Wait, isn't that MP3, THAT MP3? Fingerprinting your music/audio...


We've recently been hearing a good bit about music fingerprinting, where some service or other looks at the fingerprints of your music and then does something with that data. If you're like me, you've wondered just how that fingerprinting is done and how you could do something like that too. Yet I hadn't seen any technical explanations or code examples. That is, until...

Duplicate songs detector via audio fingerprinting

As a software engineer, I was always interested in how a computer can be taught to behave intelligently, at least on some simple tasks that we (homo sapiens) can easily solve within a matter of seconds. One of them is audio recognition, which has been studied thoroughly in recent years. For this reason, in this article you will be introduced to one of the complex tasks that arise in the field of computer science: the efficient comparison and recognition of analog signals in digital format...

In simple terms, if you want to compare audio files by their perceptual equality, you should create so-called "fingerprints" (similar to human fingerprints, which uniquely describe a person's identity) and see whether the sets of these objects, gathered from different audio items, match or not. Logically, similar audio objects should generate similar fingerprints, whereas different files should produce dissimilar signatures. One of the requirements for these fingerprints is that they should act as "forgiving hashes", in order to cope with format differences, noise, "loudness", etc. The simplified concept of audio fingerprinting can be visualized below.
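To make the "forgiving hash" idea concrete, here is a minimal sketch, assuming fingerprints are bool[] bit vectors (as the CreateFingerprints snippet further down returns). Two fingerprints count as a match when most of their bits agree, rather than only when they are identical. The class name and the 0.9 threshold are hypothetical illustrations, not the article's matching scheme.

```csharp
using System;

// Hypothetical sketch: compare binary fingerprints by how many bits agree,
// so small differences (noise, re-encoding) do not break the match.
public static class FingerprintSimilarity
{
    // Number of bit positions where the two fingerprints differ.
    public static int HammingDistance(bool[] a, bool[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Fingerprints must have the same length.");

        int distance = 0;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i])
                distance++;
        }
        return distance;
    }

    // Fraction of matching bits, between 0.0 and 1.0.
    public static double Similarity(bool[] a, bool[] b)
    {
        int distance = HammingDistance(a, b);
        return a.Length == 0 ? 1.0 : 1.0 - (double)distance / a.Length;
    }

    // Assumed threshold: treat fingerprints as "the same" at >= 90% agreement.
    public static bool IsMatch(bool[] a, bool[] b, double threshold = 0.9)
    {
        return Similarity(a, b) >= threshold;
    }
}
```

An exact hash is the special case threshold = 1.0; lowering the threshold is what makes the comparison "forgiving".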



General schema (fingerprint creation)

The framework that is going to be built (on which the entire system will rest) will consist of several different conceptual parts. In the next figure, you can see the activity diagram, which abstractly describes the logical flow of fingerprint creation (this flow matches the theoretical abstractions described in Content Fingerprinting Using Wavelets, the paper upon which the algorithm is built). Below, I describe in more detail each activity involved and which component is responsible for it.


Preprocessing the input signal




There are many steps required in building the fingerprints, so it's not uncommon to lose track of the connections between them. To simplify the explanation, the generalized image below should help you visualize them all together. It is a full example of processing a 44100 Hz, stereo .mp3 file (Prodigy - No Good). Specifically, the activity flow is as follows:
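As a rough illustration of what preprocessing a 44100 Hz stereo stream can involve, here is a minimal sketch, assuming (as is common in fingerprinting pipelines) that it means downmixing to mono and lowering the sample rate. The project itself relies on the BASS library (Bass.Net) for decoding; the class and method names below are hypothetical, and the box-average decimation is a crude stand-in for a proper resampler.

```csharp
using System;

// Hypothetical preprocessing sketch: stereo -> mono, then integer-factor decimation.
public static class Preprocessing
{
    // Averages interleaved stereo samples (L, R, L, R, ...) into one mono channel.
    public static float[] StereoToMono(float[] interleaved)
    {
        if (interleaved.Length % 2 != 0)
            throw new ArgumentException("Interleaved stereo data must have an even length.");

        var mono = new float[interleaved.Length / 2];
        for (int i = 0; i < mono.Length; i++)
            mono[i] = 0.5f * (interleaved[2 * i] + interleaved[2 * i + 1]);
        return mono;
    }

    // Lowers the sample rate by an integer factor by averaging each block of
    // `factor` samples. Box averaging is only a rough low-pass filter; any
    // incomplete trailing block is dropped.
    public static float[] Decimate(float[] samples, int factor)
    {
        if (factor < 1)
            throw new ArgumentOutOfRangeException(nameof(factor));

        var output = new float[samples.Length / factor];
        for (int i = 0; i < output.Length; i++)
        {
            float sum = 0f;
            for (int j = 0; j < factor; j++)
                sum += samples[i * factor + j];
            output[i] = sum / factor;
        }
        return output;
    }
}
```

For example, decimating a 44100 Hz mono signal by a factor of 8 gives 5512.5 Hz. Lowering the rate discards the upper frequencies and shrinks the amount of data the later spectrogram and wavelet steps have to process.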




The application, which will use the described algorithm, is going to detect duplicate files on your local machine. The general task is very simple. First, it will process all the audio files from the selected folders, and then it will try to detect which of them match the same signature and are therefore duplicates of one another. It will be built using the WPF framework, and specifically the MVVM pattern, which is becoming more popular as WPF spreads. The Model-View-ViewModel (MVVM) is an architectural pattern used in software engineering that originated from Microsoft as a specialization of the Presentation Model design pattern introduced by Martin Fowler. MVVM was designed to make use of specific functions in WPF to better facilitate the separation of View layer development from the rest of the pattern by removing virtually all "code behind" from the View layer. Elements of the MVVM pattern include:


And that's only a little bit. There are 18 printed pages to this article... (Yeah, wow.)

So let's see it in action.

Note: When you download the Sources.zip, also grab the Binaries.zip.

There's one DLL in the Binaries.zip that you'll need that's not in the Sources.zip (Bass.Net.DLL). So unzip the Sources, grab Bass.Net.DLL from the Binaries.zip, put it in the DuplicateTracks\NativeLibs\ folder and, if need be, fix up the project to point to that DLL. Once you do that, the project compiles and runs just fine (at least on my machine).

When you launch the app, you pick a folder (or files) and hit Start.


It will process the files and give you the results.


So big deal, you say; it's just comparing the file sizes or the files' MD5 hashes? Let's look at the properties for those two files...


Given that the file sizes are different, there's an infinitesimal chance that a standard MD5 (or other) hash would be the same. In short, it seems like the project is working just as advertised! And we have all the source to it... :)
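A quick way to see why an exact hash can't find these duplicates: a cryptographic hash such as MD5 changes completely when even one byte of the input changes, and two encodings of the same song differ in far more than one byte. The sketch below uses the standard .NET MD5 class; the HashDemo wrapper is just an illustrative name.

```csharp
using System;
using System.Security.Cryptography;

// Illustrative helper: lowercase hex MD5 digest of a byte array.
public static class HashDemo
{
    public static string Md5Hex(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(data);
            // BitConverter gives "D4-1D-8C-..."; strip dashes and lowercase it.
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Calling Md5Hex on { 1, 2, 3 } and on { 1, 2, 4 } yields two unrelated digests, which is exactly the property that makes MD5 good for integrity checks and useless for perceptual matching.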


As you can see, there's a good bit of code in the project. Not only is it related to the audio processing, but the author also "went all the way" and MVVM'ed the solution. Some might call that overkill for something like this, but I still thought it pretty cool how there's no code-behind. Love that...
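For readers new to MVVM, here is a minimal sketch of what "no code-behind" looks like in practice: the View binds to properties and commands exposed by a ViewModel, which raises change notifications. The class and property names are hypothetical and are not taken from the DuplicateTracks project; only INotifyPropertyChanged and ICommand are the real .NET interfaces.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.CompilerServices;
using System.Windows.Input;

// Simple ICommand that forwards to delegates (a common MVVM helper).
public sealed class RelayCommand : ICommand
{
    private readonly Action _execute;
    private readonly Func<bool> _canExecute;

    public RelayCommand(Action execute, Func<bool> canExecute = null)
    {
        _execute = execute ?? throw new ArgumentNullException(nameof(execute));
        _canExecute = canExecute;
    }

    public event EventHandler CanExecuteChanged;

    public bool CanExecute(object parameter) => _canExecute == null || _canExecute();

    public void Execute(object parameter) => _execute();

    public void RaiseCanExecuteChanged() => CanExecuteChanged?.Invoke(this, EventArgs.Empty);
}

// Hypothetical ViewModel: the View binds to SelectedFolder, Status and StartCommand.
public sealed class MainViewModel : INotifyPropertyChanged
{
    private string _selectedFolder = "";
    private string _status = "Idle";

    public MainViewModel()
    {
        // Start is only enabled once a folder has been chosen.
        StartCommand = new RelayCommand(Start, () => _selectedFolder.Length > 0);
    }

    public event PropertyChangedEventHandler PropertyChanged;

    public RelayCommand StartCommand { get; }

    public string SelectedFolder
    {
        get => _selectedFolder;
        set
        {
            if (_selectedFolder == value) return;
            _selectedFolder = value ?? "";
            OnPropertyChanged();
            StartCommand.RaiseCanExecuteChanged();
        }
    }

    public string Status
    {
        get => _status;
        private set
        {
            if (_status == value) return;
            _status = value;
            OnPropertyChanged();
        }
    }

    private void Start()
    {
        // A real app would start fingerprinting on a background thread here.
        Status = "Processing " + _selectedFolder;
    }

    private void OnPropertyChanged([CallerMemberName] string propertyName = null) =>
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
}
```

In XAML, a button would then use Command="{Binding StartCommand}" and a text block Text="{Binding Status}", so the window's .xaml.cs file stays empty.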


Here's a meat-and-potatoes code snip from the project. If it hurts your brain, make sure you read the full article, as it does a great job of explaining the process, providing additional code snips, etc.

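// Method excerpt from the project's fingerprinting class (requires: using System; using System.Collections.Generic;)
public List<bool[]> CreateFingerprints(IAudio proxy, string filename, IStride stride, int milliseconds, int startmilliseconds)
{
    // Log-frequency spectrogram: one float[] of LogBins values per frame
    float[][] spectrum = CreateLogSpectrogram(proxy, filename, milliseconds, startmilliseconds);
    int fingerprintLength = FingerprintLength;   // frames per fingerprint "image"
    int overlap = Overlap;                       // hop between spectrogram frames
    int logbins = LogBins;                       // logarithmic frequency bands per frame
    int start = stride.GetFirstStride() / overlap;  // convert the initial stride to frames
    List<bool[]> fingerprints = new List<bool[]>();

    int width = spectrum.Length;  // number of spectrogram frames
    while (start + fingerprintLength < width)
    {
        // Cut a fingerprintLength x logbins sub-image out of the spectrogram
        float[][] frames = new float[fingerprintLength][];
        for (int i = 0; i < fingerprintLength; i++)
        {
            frames[i] = new float[logbins];
            Array.Copy(spectrum[start + i], frames[i], logbins);
        }
        // Move past this image, plus the stride converted to frames
        start += fingerprintLength + stride.GetStride() / overlap;
        WaveletDecomposition.DecomposeImageInPlace(frames); /*Compute wavelets*/
        bool[] image = ExtractTopWavelets(frames);
        fingerprints.Add(image);  // keep the binary fingerprint for this image
    }
    return fingerprints;
}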
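The two calls at the end of the loop do the heavy lifting. Here is a hypothetical sketch of what they could look like, assuming a standard 2D Haar decomposition (every row fully transformed, then every column) and a sign-only encoding of the largest coefficients, in the spirit of the Content Fingerprinting Using Wavelets paper. The WaveletSketch class, its power-of-two size requirement, and the topWavelets parameter are my assumptions, not the project's code.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of wavelet decomposition plus top-wavelet sign encoding.
public static class WaveletSketch
{
    private static readonly float InvSqrt2 = (float)(1.0 / Math.Sqrt(2.0));

    // Full orthonormal 1D Haar transform in place; length must be a power of two.
    public static void Haar1D(float[] data)
    {
        int n = data.Length;
        if (n == 0 || (n & (n - 1)) != 0)
            throw new ArgumentException("Length must be a power of two.");

        var temp = new float[n];
        for (int h = n; h > 1; h /= 2)
        {
            int half = h / 2;
            for (int i = 0; i < half; i++)
            {
                float a = data[2 * i];
                float b = data[2 * i + 1];
                temp[i] = (a + b) * InvSqrt2;         // average (low-pass) part
                temp[half + i] = (a - b) * InvSqrt2;  // detail (high-pass) part
            }
            Array.Copy(temp, data, h);  // only the first h entries change at this level
        }
    }

    // Standard 2D decomposition: transform every row fully, then every column.
    public static void DecomposeImageInPlace(float[][] image)
    {
        int rows = image.Length;
        int cols = image[0].Length;
        foreach (float[] row in image)
        {
            if (row.Length != cols)
                throw new ArgumentException("All rows must have the same length.");
            Haar1D(row);
        }

        var column = new float[rows];
        for (int c = 0; c < cols; c++)
        {
            for (int r = 0; r < rows; r++) column[r] = image[r][c];
            Haar1D(column);
            for (int r = 0; r < rows; r++) image[r][c] = column[r];
        }
    }

    // Keeps the topWavelets largest-magnitude coefficients and encodes only their sign:
    // positive -> (true, false), negative -> (false, true), everything else -> (false, false).
    public static bool[] ExtractTopWavelets(float[][] image, int topWavelets)
    {
        int rows = image.Length;
        int cols = image[0].Length;
        var flat = new float[rows * cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                flat[r * cols + c] = image[r][c];

        var result = new bool[2 * flat.Length];
        var top = Enumerable.Range(0, flat.Length)
                            .OrderByDescending(i => Math.Abs(flat[i]))
                            .Take(topWavelets);
        foreach (int i in top)
        {
            if (flat[i] > 0) result[2 * i] = true;
            else if (flat[i] < 0) result[2 * i + 1] = true;
        }
        return result;
    }
}
```

Keeping only signs and ranks is one reason this kind of encoding is "forgiving": multiplying the whole image by a positive constant changes neither which coefficients are largest nor their signs, so the resulting bits are identical.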

If audio is your thing (audio processing, MP3 file management), or you just think audio fingerprinting sounds cool and you'd like to see how it's done, then this article and its code are a must-read.


Page image, Fingerprints, courtesy of didbygraham

The Discussion


    Interesting, although it strikes me that a lot of these fingerprinting algorithms are targeted at detecting copyright violations. Trouble is, it is often far too easy to game these algorithms by pitch or tempo shifting, adding silence to the start or end, subtle reverb, etc.

    From a quick look, it seems that the algorithm should be safe against inverting the waveform, changing the sample rate, swapping left and right channels, adding a DC offset (probably), and applying notch filters high up in the frequency spectrum. However, a high-pass filter (HPF) with a low cutoff might change the fingerprint enough to defeat it.

    Still, it's an impressive article, and it's nice to see such a clear explanation.
