"Visualizing Sound"

Description

Today's project by Philip R. Braica (aka HoshiKata) is one that's a little unusual, both compared with what we usually highlight here and in and of itself. I thought it was kind of cool and neat, plus educational too...

Visualizing Sound

image

Introduction

I always wanted to do some basic sound processing and to be able to visualize the sound, see what speech looks like as a frequency spread, or how music looks.

In order to do that, several simple problems need to be solved:

  • DirectX sound playback -> raw values
  • DirectX listen -> raw values
  • Values -> FFT
  • Graph of an FFT
  • Way to see how the FFT evolves over time.

The next part goes into a good bit of detail, information and math (which hurts my brain on a Saturday...).

Background 

Demystifying Sound and Frequency Processing

When processing sound, the data is usually compressed, but at some point it becomes just a set of samples in time or, more simply, speaker position over time.

y(t) = F(t)

A frequency is a vibration back and forth at a given rate, so:

  • When does a frequency start?
  • When does it end?

The answer is that a frequency occurs over a span of time; that span is its "support".

To measure it, you compare it against a known frequency, so the time span over which the comparison is made matters. If the span is too short, only a few digitized samples are recorded and only a few possible frequencies can be measured. If it is too long, the sense of when the frequency or musical note started and ended is lost.
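To put rough numbers on that trade-off, here is a minimal Python sketch (Python is used purely for illustration and is not the project's language). It assumes a 44,100 Hz sample rate, which the article does not state, and uses a hypothetical helper name, fft_resolution. The spacing between measurable frequencies is the sample rate divided by the window size, and the time span covered is the window size divided by the sample rate.

```python
def fft_resolution(sample_rate_hz, window_size):
    """Return (bin_width_hz, window_duration_s) for a window of window_size samples."""
    bin_width_hz = sample_rate_hz / window_size        # spacing between measurable frequencies
    window_duration_s = window_size / sample_rate_hz   # how much time one window covers
    return bin_width_hz, window_duration_s


# 44100 Hz is an assumed sample rate, chosen only for illustration.
for n in (256, 1024, 8192):
    width, duration = fft_resolution(44100, n)
    print(f"{n:5d} samples: {width:7.2f} Hz per bin, {duration * 1000:7.2f} ms window")
```

A short window pins down "when" well but lumps nearby notes into the same bin; a long window separates the notes but blurs their start and end.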

To provide more background I'll add a bit more math: suppose we hear a chord (3 notes) in the sound, then we could write the generating function as three sine waves:

image
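As a rough stand-in for the idea, the sketch below (Python, for illustration only) builds a chord as a sum of three sine waves and then "compares it against a known frequency" by correlating the samples with a probe frequency. Everything specific here is an assumption and not taken from the article: the 8,000 Hz sample rate, the 800-sample window, the round-number note frequencies (260, 330 and 390 Hz), the amplitudes, and the helper names chord_samples and amplitude_at.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed for illustration)
NUM_SAMPLES = 800    # 0.1 s window, so whole cycles of each probe frequency fit exactly


def chord_samples(freqs_hz, amplitudes, sample_rate=SAMPLE_RATE, n=NUM_SAMPLES):
    """y(t) = sum_i A_i * sin(2*pi*f_i*t), sampled at t = k / sample_rate."""
    return [
        sum(a * math.sin(2 * math.pi * f * k / sample_rate)
            for f, a in zip(freqs_hz, amplitudes))
        for k in range(n)
    ]


def amplitude_at(samples, freq_hz, sample_rate=SAMPLE_RATE):
    """Correlate the signal with one known frequency and estimate its amplitude."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate)
              for k, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)


# Three notes with different loudness (hypothetical values).
chord = chord_samples([260, 330, 390], [1.0, 0.5, 0.25])
for probe in (260, 300, 330, 390):
    print(probe, "Hz ->", round(amplitude_at(chord, probe), 3))
```

Probing at one of the three notes recovers roughly that note's amplitude, while probing at 300 Hz, which is not in the chord, comes back close to zero. An FFT does this comparison for a whole grid of frequencies at once.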

Next come DirectX and FFT visualization:

Buffering

There are numerous articles that describe how DirectX can play back audio or listen to a microphone. The code provides a decent example of how to do that. What is interesting is how threading is used in this example. Data is pulled from the file, then the buffer is added to a list of buffers to work on. A worker thread then pulls data off that list and performs as many FFTs as needed.

For example, suppose the data-arrived event fires twice, producing 44,000 samples and then 22,000 samples. This results in two buffers pushed onto the work queue (44k, then 22k). The FFT size is 1024. The worker thread can then asynchronously process chunks of data and publish each result (via an event) so it can be viewed.
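The producer/worker pattern described above can be sketched with Python's standard queue and threading modules. This is an illustration under stated assumptions, not the project's DirectX code: the names worker, run_demo and on_block are hypothetical, the None sentinel is just one way to stop the thread, and carrying leftover samples from one buffer into the next is a design choice the article does not confirm.

```python
import queue
import threading

FFT_SIZE = 1024


def worker(work_queue, on_block):
    """Pull buffers off the queue and emit FFT_SIZE-sample blocks until None arrives."""
    leftover = []                      # samples not yet part of a full block (assumed behavior)
    while True:
        buf = work_queue.get()
        if buf is None:                # sentinel: no more data
            break
        leftover.extend(buf)
        while len(leftover) >= FFT_SIZE:
            block = leftover[:FFT_SIZE]
            del leftover[:FFT_SIZE]
            on_block(block)            # stand-in for "run the FFT and publish the result"


def run_demo():
    """Queue a 44,000-sample buffer and a 22,000-sample buffer; return blocks processed."""
    blocks = []
    work_queue = queue.Queue()
    thread = threading.Thread(target=worker, args=(work_queue, blocks.append))
    thread.start()
    work_queue.put([0.0] * 44000)      # first "data arrived" event
    work_queue.put([0.0] * 22000)      # second "data arrived" event
    work_queue.put(None)
    thread.join()
    return len(blocks)


print(run_demo(), "blocks of", FFT_SIZE, "samples")
```

The capture side only ever calls put, so it never waits on the FFT work; the worker drains the queue at its own pace.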

Overlapped FFT

An FFT converts time-domain data into frequency-domain data, so if a rhythm starts or ends near the edge of one data set, that edge will cause some ringing. To avoid this and create a better sense of "locality" (establishing the "when" of a given frequency starting or stopping), the data set is broken into overlapping regions that can each be FFT'd. If there were 22k data points and the FFT size was 1024, overlapped in fourths, the first FFT starts at 0, the next at 1024/4, then 1024/2, 1024*3/4, 1024, 1024 + 1024/4, ...
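The start offsets above follow directly from a hop of one quarter of the FFT size. A small Python sketch (for illustration only; overlapped_starts is a hypothetical helper, and "22k" is taken to mean exactly 22,000 samples) lists them:

```python
def overlapped_starts(num_samples, fft_size=1024, overlap_parts=4):
    """Start index of every full fft_size window, hopping by fft_size / overlap_parts."""
    hop = fft_size // overlap_parts
    # Only windows that fit entirely inside the data are kept.
    return list(range(0, num_samples - fft_size + 1, hop))


starts = overlapped_starts(22000)
print(starts[:6])    # first few offsets: 0, 1024/4, 1024/2, ...
print(len(starts))   # number of overlapped FFTs that fit in the data
```

With a hop of 256, each sample (away from the edges) is covered by four windows, which is what gives the finer sense of when a frequency starts or stops.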

image

Finally, he delves into the code:

Using the code

There are a few cool things in the code provided:

  • DirectX sound capture and playback, an FFT, and some graphics widgets for future use.
  • Each is more or less standalone and usable in a future product.
  • Code is (or at least was meant to be) easy to read / understand.

FFT

There are a lot of standard things you do with an FFT depending on the application. Input/output data can be real, complex, magnitudes, interleaved or separate complex, integers, floats or doubles, or some mixture. Buffers are sometimes reused for output and sometimes kept separate.

The API I provide for the FFT and its inverse generally takes these forms:

Complex -> (I)FFT -> Complex, Real -> (I)FFT -> Complex, Real -> (I)FFT -> Magnitude
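To make those shapes concrete, here is a small pure-Python sketch of a radix-2 FFT with a Real -> FFT -> Magnitude wrapper. It is not the project's API: the function names fft, ifft and real_to_magnitude are hypothetical, the recursive Cooley-Tukey form is just one common way to write an FFT, and the 64-sample test tone is made up for the example.

```python
import cmath
import math


def fft(values):
    """Complex (or real) -> FFT -> Complex. Length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum):
    """Complex -> IFFT -> Complex, using ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]


def real_to_magnitude(samples):
    """Real -> FFT -> Magnitude, keeping only the non-redundant first half."""
    return [abs(c) for c in fft(samples)[: len(samples) // 2]]


# A sine that completes exactly 8 cycles in 64 samples (made-up test tone).
samples = [math.sin(2 * math.pi * 8 * k / 64) for k in range(64)]
mags = real_to_magnitude(samples)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print("peak bin:", peak_bin)
```

For real input the second half of the spectrum mirrors the first, which is why a magnitude output only needs half as many values as there are input samples.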

Here's a snapshot of the Solution.

image

As you can see, it's nicely broken into reusable chunks, from the SourceSource class where all the sound work is done (I know, imagine that) to the two controls used to visualize it.

Practical use for most of us? I have no clue, but flashy, blinky, wavy displays like this are just cool. There's got to be some way we can use this to create something cool, right? Well, if not, it's still kind of fun. Plus the code is an interesting read in and of itself...
