We've recently been hearing a good bit about music fingerprinting, where some service or other looks at the fingerprints of your music and then does something with that data. If you're like me, you've wondered just how that fingerprinting is done and how you could do something like that too. Yet I'd not seen any technical explanations or code examples. That is, until...
As a software engineer, I have always been interested in how a computer can be taught to behave intelligently, at least on some simple tasks that we (Homo sapiens) can easily solve within seconds. One of them is audio recognition, which has been analyzed thoroughly in recent years. For this reason, in this article you will be introduced to one of the complex tasks which arise in the field of computer science: the efficient comparison and recognition of analog signals in digital format...
In simple terms, if you want to compare audio files by their perceptual equality, you should create so-called "fingerprints" (similar to human fingerprints, which uniquely describe a person's identity), and see whether the sets of these objects, gathered from different audio items, match or not. Logically, similar audio objects should generate similar fingerprints, whereas different files should produce dissimilar signatures. One of the requirements for these fingerprints is that they should act as "forgiving hashes", in order to cope with format differences, noise, "loudness", etc. The simplified concept of audio fingerprinting can be visualized below.
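To make the "forgiving hash" idea a little more concrete, here is a minimal sketch in C#. It assumes fingerprints are equal-length `bool[]` arrays and calls two of them a match when the fraction of agreeing bits reaches a threshold. The class name, method names, and the threshold are hypothetical, chosen only for illustration; this is not necessarily how the project itself compares fingerprints.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the project's source.
public static class FingerprintComparison
{
    // Returns the fraction of positions at which two equal-length
    // fingerprints agree (1.0 means identical, 0.0 means every bit differs).
    public static double Similarity(bool[] a, bool[] b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));
        if (a.Length != b.Length)
            throw new ArgumentException("Fingerprints must have the same length.");
        if (a.Length == 0) return 1.0;

        int matches = 0;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] == b[i]) matches++;
        }
        return (double)matches / a.Length;
    }

    // "Forgiving" comparison: a few differing bits (noise, re-encoding)
    // still count as a match as long as similarity reaches the threshold.
    public static bool IsMatch(bool[] a, bool[] b, double threshold)
    {
        return Similarity(a, b) >= threshold;
    }
}
```

For example, two four-bit fingerprints that differ in one position have a similarity of 0.75, so they match at a threshold of 0.7 but not at 0.8. An exact hash has no such tolerance.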
The framework which is going to be built (upon which the entire system will reside) will consist of several different conceptual parts. In the next figure, you can see the activity diagram, which abstractly describes the logical flow of the fingerprint creation (this flow matches the theoretical abstractions described in Content Fingerprinting Using Wavelets, the paper upon which the algorithm is built). Following that, I will describe in more detail each activity involved and which component is responsible for it.
There are many steps required in building the fingerprints, so it's not uncommon to lose track of the connections between all of them. In order to simplify the explanation, next you can see a generalized image that will help you in visualizing them all together. It is a full example of processing a 44100Hz, Stereo, .mp3 file (Prodigy - No Good). Specifically, the activity flow is as follows:
The application, which will use the described algorithm, is going to detect duplicate files on your local machine. The general task is very simple. First, it will process all the audio files from the selected folders, and then it will try to detect which of them match the same signature, thus being duplicates of one another. It will be built using the WPF framework, and specifically the MVVM pattern, which is becoming more popular as WPF adoption grows. Model-View-ViewModel (MVVM) is an architectural pattern used in software engineering that originated from Microsoft as a specialization of the Presentation Model design pattern introduced by Martin Fowler. MVVM was designed to make use of specific functions in WPF to better facilitate the separation of View layer development from the rest of the pattern by removing virtually all "code behind" from the View layer. Elements of the...
And that's only a little bit. There are 18 printed pages to this article... (Yeah, wow)
So let's see it in action.
Note: When you download the Sources.zip, also grab the Binaries.zip.
There's one DLL in the Binaries.zip that you'll need that's not in the Sources.zip (Bass.Net.DLL). So unzip the Sources, grab the Bass.Net.DLL from the Binaries.zip, put it in the DuplicateTracks\NativeLibs\ folder and, if need be, fix up the Project to point to that DLL. Once you do that, the Project compiles and runs just fine (at least on my machine).
When you launch the app, you pick a folder (or files) and hit start.
It will process the files and give you the results.
So, big deal, you say; isn't it just using the file size or the file's MD5? Let's look at the properties for those two files...
Given that the file sizes are different, there's only an infinitesimal chance that a standard MD5, or other, hash would be the same. In short, it seems like the project is working just as advertised! And we have all the source to it...
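To see why an exact hash can't do this job, here is a small sketch using the standard .NET `System.Security.Cryptography.MD5` class. The `ExactHashDemo` class and `Md5Hex` method are hypothetical names made up for this illustration; they are not from the project.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper for illustration only; not part of the project's source.
public static class ExactHashDemo
{
    // Returns the MD5 digest of the given bytes as an uppercase hex string.
    public static string Md5Hex(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(data);
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }
}
```

Feed it two byte arrays that differ in a single byte, say `{ 1, 2, 3, 4 }` and `{ 1, 2, 3, 5 }`, and you get two unrelated-looking digests. The same song saved at two bitrates differs in far more than one byte, so an exact hash alone would never flag the pair as duplicates.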
As you can see, there's a good bit of code in the project. It's not only related to the audio processing; the author also "went all the way" and MVVM'ed the solution. Some might call that overkill for something like this, but I still thought it pretty cool how there's no code-behind. Love that...
Here's a meat-and-potatoes code snip from the Project. If that hurts your brain, then make sure you read the full article, as it does a great job of explaining the process, providing additional code snips, etc.
    public List<bool[]> CreateFingerprints(IAudio proxy, string filename, IStride stride, int milliseconds, int startmilliseconds)
    {
        // Log-spaced spectrogram: one row per time frame, one column per frequency bin
        float[][] spectrum = CreateLogSpectrogram(proxy, filename, milliseconds, startmilliseconds);
        int fingerprintLength = FingerprintLength;
        int overlap = Overlap;
        int logbins = LogBins;
        int start = stride.GetFirstStride()/overlap;
        List<bool[]> fingerprints = new List<bool[]>();
        int width = spectrum.GetLength(0);
        while (start + fingerprintLength < width)
        {
            // Copy the next window of frames out of the spectrogram
            float[][] frames = new float[fingerprintLength][];
            for (int i = 0; i < fingerprintLength; i++)
            {
                frames[i] = new float[logbins];
                Array.Copy(spectrum[start + i], frames[i], logbins);
            }
            start += fingerprintLength + stride.GetStride()/overlap;
            WaveletDecomposition.DecomposeImageInPlace(frames); /*Compute wavelets*/
            bool[] image = ExtractTopWavelets(frames);
            fingerprints.Add(image);
        }
        return fingerprints;
    }
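The `ExtractTopWavelets` call above is where a block of wavelet coefficients becomes a `bool[]` fingerprint. Its body isn't shown here, so the following is only a rough sketch of one way such a step could work. It assumes that just the `topCount` coefficients with the largest magnitude are kept, and that each coefficient takes two bits (one for "kept and positive", one for "kept and negative"). `TopWaveletSketch`, `EncodeTopCoefficients`, and `topCount` are hypothetical names, and the project's real method may well differ.

```csharp
using System;
using System.Linq;

// Hypothetical sketch for illustration only; not the project's ExtractTopWavelets.
public static class TopWaveletSketch
{
    // Flattens the coefficient matrix row by row, keeps the topCount
    // coefficients with the largest absolute value, and encodes each
    // coefficient in two bits:
    //   bits[2*i]     is true when coefficient i is kept and positive
    //   bits[2*i + 1] is true when coefficient i is kept and negative
    // Every coefficient that is not kept (or is exactly zero) stays false/false.
    public static bool[] EncodeTopCoefficients(float[][] frames, int topCount)
    {
        if (frames == null) throw new ArgumentNullException(nameof(frames));
        if (topCount < 0) throw new ArgumentOutOfRangeException(nameof(topCount));

        float[] flat = frames.SelectMany(row => row).ToArray();
        bool[] bits = new bool[flat.Length * 2];

        var topIndexes = Enumerable.Range(0, flat.Length)
            .OrderByDescending(i => Math.Abs(flat[i]))
            .Take(topCount);

        foreach (int i in topIndexes)
        {
            if (flat[i] > 0) bits[2 * i] = true;
            else if (flat[i] < 0) bits[2 * i + 1] = true;
        }
        return bits;
    }
}
```

Because only the sign and position of the strongest coefficients survive, small changes in amplitude leave most of the bits untouched, which fits the "forgiving" behavior the article asks of a fingerprint.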
If audio is your thing, audio processing, MP3 file management, or you just think audio fingerprinting sounds cool and you'd like to see how it's done, then this article, and code, is a must read.