<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" media="screen" href="/styles/xslt/rss.xslt"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:c9="http://channel9.msdn.com">
<channel>
	<title>Channel 9</title>
    <atom:link rel="self" type="application/rss+xml" href="http://channel9.msdn.com/Niners/markheath/Posts/RSS"></atom:link>
    <itunes:summary></itunes:summary>
    <itunes:author>Microsoft</itunes:author>
    <itunes:subtitle></itunes:subtitle>
    <image>
      <url>http://mschnlnine.vo.llnwd.net/d1/Dev/App_Themes/C9/images/feedimage.png</url>
      <title>Channel 9</title>
      <link>http://channel9.msdn.com/Niners/markheath/Posts</link>
    </image>
    <itunes:image href=""></itunes:image>
    <itunes:category text="Technology"></itunes:category>
    <description>Channel 9 keeps you up to date with the latest news and behind the scenes info from Microsoft that developers love to keep up with. From LINQ to SilverLight – Watch videos and hear about all the cool technologies coming and the people behind them.</description>
    <link>http://channel9.msdn.com/Niners/markheath/Posts</link>
    <language>en</language>
    <pubDate>Fri, 24 May 2013 02:48:18 GMT</pubDate>
    <lastBuildDate>Fri, 24 May 2013 02:48:18 GMT</lastBuildDate>
    <generator>Rev9</generator>
    <c9:totalResults>2</c9:totalResults>
    <c9:pageCount>1</c9:pageCount>
    <c9:pageSize>25</c9:pageSize>
  <item>
      <title>Autotune.NET</title>
      <description><![CDATA[<p>We've all cringed as a hopelessly out of tune contestant appears on the latest episode of “American Idol.” Occasionally, there's a contestant who manages to be pitch perfect all the way through—right until they flub the final note. And in the cutthroat world
 of televised auditions, sing one slightly flat note and you're out. </p>
<p>So what takes care of a bad-pitch day? Autotune—an effect that corrects the pitch of your voice so you'll never again sing out of tune. And now, with the power of modern microprocessors, autotune is possible in real-time, allowing singers to benefit from
 its almost magical powers during live concerts.</p>
<p>The company most famous for its autotune effect is Antares. <a href="http://www.antarestech.com/products/auto-tune-evo.shtml">
Antares Auto-Tune</a> currently retails for $249, and a stripped down version is available for $100. In addition to simply improving the pitch of a dodgy singer, autotune can be used to create unique robotic sounding vocal effects, a technique massively popular
 in recent years thanks to its use by artists such as T-Pain and the group behind the “<a href="http://www.youtube.com/show/autotunethenews">Auto-Tune the News</a>” YouTube videos. In 1998, when the effect was first used on
<a href="http://en.wikipedia.org/wiki/Believe_%28Cher_song%29">Cher's “Believe” single</a>, the producer used such extreme settings that instead of subtly adjusting the pitch, autotune “snapped” instantaneously to the nearest “correct” note.
</p>
<h2>How does Autotune work?</h2>
<p>An autotune effect has two parts. The first is <b>pitch detection</b>, which calculates the dominant frequency of the incoming signal, and is the reason autotune is normally used on monophonic audio sources (i.e. playing one note at a time, not whole chords).
 So, if your guitar is out of tune, you're out of luck (<a href="http://www.celemony.com/cms/">Celemony's Melodyne</a> product, however, features some incredible capabilities for pitch-shifting polyphonic audio).</p>
<p>The second stage is <b>pitch shifting</b>, or “correcting” a given note. However, the bigger the pitch shift required, the more artificial the end result will be, and it is worth noting that absolutely perfect pitch is not always desirable. Sometimes the
 blended notes resulting from vibrato, for example, are an important part of the performance, and eliminating them would be detrimental.</p>
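<p>The “snapping” to the nearest correct note can be sketched in a few lines. This is a minimal illustration rather than code from the project: the <b>NoteSnapper</b> class and its method names are made up for this example, and it assumes standard equal temperament with A4 at 440Hz, which puts MIDI note 0 at roughly 8.176Hz.</p>
<pre class="csharpcode">using System;

static class NoteSnapper
{
    // Frequency of MIDI note 0 under the A4 = 440Hz assumption
    const double MidiZeroHz = 8.1757989156;

    // Nearest equal-tempered MIDI note for a detected frequency
    public static int NearestMidiNote(double frequencyHz)
    {
        return (int)Math.Round(12.0 * Math.Log(frequencyHz / MidiZeroHz, 2.0));
    }

    // Frequency of a MIDI note: each semitone is a factor of 2^(1/12)
    public static double MidiNoteToFrequency(int midiNote)
    {
        return MidiZeroHz * Math.Pow(2.0, midiNote / 12.0);
    }

    // The "corrected" frequency for a detected pitch
    public static double Snap(double frequencyHz)
    {
        return MidiNoteToFrequency(NearestMidiNote(frequencyHz));
    }
}</pre>
<p>Under these assumptions, a slightly flat 430Hz maps to MIDI note 69, so <b>Snap</b> returns approximately 440Hz.</p>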
<h2>Creating a .NET Autotune Algorithm</h2>
<p>For this project, we will be creating an autotune effect for .NET. Serious recording enthusiasts should just go out and buy a decent autotune effect, but to have a little fun, we'll see if we can make an autotune that can give us a poor man's Cher effect
 (or T-Pain effect, if you prefer).</p>
<p>To get started, I searched to see if there were some pre-existing open source autotune implementations, which brought me to
<a href="http://decabear.com/awesomebox.html">awesomebox</a>, a project created by Ravi Parikh and Keegan Poppen while they were students at Stanford University. They kindly gave me permission to make use of their code, which uses an auto-correlator for pitch
 detection and an open source pitch-shifting algorithm from audio DSP expert <a href="http://www.dspdimension.com">
Stephan M. Bernsee</a>.</p>
<h3>Porting C&#43;&#43; to C#</h3>
<p>Although porting from C/C&#43;&#43; to C# is not exactly fun, the syntax is similar enough that the port can be completed without too many changes. You need to remember, however, that a
<b>long</b> in C code compiled for Windows is 32 bits wide, so it maps to an <b>int</b> in C# rather than to the 64 bit C# <b>long</b>.</p>
<p>Additionally, the C# compiler is fussier than C/C&#43;&#43; when it comes to casting between floats, doubles, and ints. Putting the “f” suffix on numeric literals sorts out most of these compiler errors.
</p>
<p>Pointers can be a pain. I tend to replace them with integer variables used to index into an array. You can of course use unsafe code, but that limits your options if you plan to port to Silverlight or Windows Phone 7 at a later date, neither of which allows
 unsafe code or interop into unmanaged code. The necessary mathematical functions are available in the System.Math class.</p>
<p>To see an example, compare this pitch shifting <a href="http://downloads.dspdimension.com/smbPitchShift.cpp">
C&#43;&#43; source file</a> with my <a href="http://voicerecorder.codeplex.com/SourceControl/changeset/view/f067d3f5a443#VoiceRecorder.Audio%2fSmbPitchShift.cs">
C# conversion</a> of it.</p>
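<p>As a small illustration of these porting rules, here is a hypothetical C routine (shown in the comment) next to a C# equivalent. Neither comes from the pitch shifter source; the <b>PortingExample</b> class and <b>Scale</b> method are invented for this sketch.</p>
<pre class="csharpcode">static class PortingExample
{
    // C original, for comparison:
    //   void scale(float *data, long count, float gain) {
    //       float *p = data;
    //       while (count--) { *p = *p * gain * 0.5; p&#43;&#43;; }
    //   }
    public static void Scale(float[] data, int count, float gain)
    {
        int p = 0; // an integer index replaces the float* pointer
        while (count-- &gt; 0) // C's long count becomes a C# int
        {
            // the "f" suffix keeps the expression in float; a bare 0.5
            // would make it a double and the assignment would not compile
            data[p] = data[p] * gain * 0.5f;
            p&#43;&#43;;
        }
    }
}</pre>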
<h2>Capturing Audio with NAudio</h2>
<p>NAudio's interop wrappers for the Windows WaveIn APIs capture the audio. Here is the code used to start recording:</p>
<p><strong>c#:</strong></p>
<pre class="csharpcode">waveIn = <span class="kwrd">new</span> WaveIn();
waveIn.DeviceNumber = recordingDevice;
waveIn.DataAvailable &#43;= waveIn_DataAvailable;
waveIn.RecordingStopped &#43;= <span class="kwrd">new</span> EventHandler(waveIn_RecordingStopped);
waveIn.WaveFormat = recordingFormat;
waveIn.StartRecording();</pre>
<p><strong>VB.Net:</strong></p>
<pre class="csharpcode">waveIn = <span class="kwrd">New</span> WaveIn
waveIn.DeviceNumber = recordingDevice
<span class="kwrd">AddHandler</span> waveIn.DataAvailable, <span class="kwrd">AddressOf</span> waveIn_DataAvailable
<span class="kwrd">AddHandler</span> waveIn.RecordingStopped, <span class="kwrd">AddressOf</span> waveIn_RecordingStopped
waveIn.WaveFormat = _recordingFormat
waveIn.StartRecording()</pre>
<p>The steps are:</p>
<ol>
<li>Create a new <b>WaveIn</b> device </li><li>[Optional] set up the device number (0 for default recording device) </li><li>Add a handler to the <b>DataAvailable</b> event—this is where we will receive the raw audio data
</li><li>Add a handler for the <b>RecordingStopped</b> event. This allows us to close the temporary WAV file we created
</li><li>Set up the recording format. For this project we are going to record in mono (i.e. one channel), 16 bit, 44.1kHz audio—the default setting for most microphones
</li><li>Call the <b>StartRecording</b> method </li></ol>
<p>Whenever the soundcard reports a new buffer of recorded audio, we receive it in the
<b>DataAvailable</b> event handler:</p>
<p><strong>c#:</strong></p>
<pre class="csharpcode"><span class="kwrd">void</span> waveIn_DataAvailable(<span class="kwrd">object</span> sender, WaveInEventArgs e)
{
    <span class="kwrd">byte</span>[] buffer = e.Buffer;
    <span class="kwrd">int</span> bytesRecorded = e.BytesRecorded;
    WriteToFile(buffer, bytesRecorded);

    <span class="kwrd">for</span> (<span class="kwrd">int</span> index = 0; index &lt; e.BytesRecorded; index &#43;= 2)
    {
        <span class="kwrd">short</span> sample = (<span class="kwrd">short</span>)((buffer[index &#43; 1] &lt;&lt; 8) |
                                buffer[index &#43; 0]);
        <span class="kwrd">float</span> sample32 = sample / 32768f;
        sampleAggregator.Add(sample32);
    }
}</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode"><span class="kwrd">Private</span> <span class="kwrd">Sub</span> waveIn_DataAvailable(<span class="kwrd">ByVal</span> sender <span class="kwrd">As</span> <span class="kwrd">Object</span>, <span class="kwrd">ByVal</span> e <span class="kwrd">As</span> WaveInEventArgs)
    <span class="kwrd">Dim</span> buffer() = e.Buffer
    <span class="kwrd">Dim</span> bytesRecorded = e.BytesRecorded
    WriteToFile(buffer, bytesRecorded)

    <span class="kwrd">For</span> index = 0 <span class="kwrd">To</span> e.BytesRecorded - 1 <span class="kwrd">Step</span> 2
        <span class="kwrd">Dim</span> sample = <span class="kwrd">CShort</span>(buffer(index &#43; 1)) &lt;&lt; 8 <span class="kwrd">Or</span> <span class="kwrd">CShort</span>(buffer(index &#43; 0))
        <span class="kwrd">Dim</span> sample32 = sample / 32768.0F
        _sampleAggregator.Add(sample32)
    <span class="kwrd">Next</span> index
<span class="kwrd">End</span> Sub</pre>
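<p>The byte-to-sample conversion inside the handler can also be pulled out into a helper. This is a minimal sketch, assuming little-endian 16 bit mono PCM as in the handler above; <b>PcmConverter</b> and <b>ToFloats</b> are hypothetical names and are not part of NAudio.</p>
<pre class="csharpcode">static class PcmConverter
{
    // Converts little-endian 16 bit PCM bytes into floats in the range -1.0 to just under 1.0
    public static float[] ToFloats(byte[] buffer, int bytesRecorded)
    {
        float[] samples = new float[bytesRecorded / 2];
        for (int i = 0; i &lt; samples.Length; i&#43;&#43;)
        {
            // low byte first, then high byte
            short sample = (short)((buffer[2 * i &#43; 1] &lt;&lt; 8) | buffer[2 * i]);
            samples[i] = sample / 32768f;
        }
        return samples;
    }
}</pre>
<p>The byte pair 0x00, 0x80 is the most negative sample (-32768) and converts to -1.0f, while 0x00, 0x40 (16384) converts to 0.5f.</p>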
<p>The <b>WaveInEventArgs</b> contains the number of bytes recorded (<b>e.BytesRecorded</b>) and a reference to the buffer containing those bytes (<b>e.Buffer</b>). The handler does two things with the recorded data. It converts each pair of bytes into a 16 bit sample, scales it to a floating point value, and passes it to the <b>sampleAggregator</b>. Before that, it calls WriteToFile, which uses the
<b>WaveFileWriter</b> class from NAudio to write the data to disk:</p>
<p><strong>c#:</strong></p>
<pre class="csharpcode"><span class="rem">// before we start recording, set up a WaveFileWriter...</span>
writer = <span class="kwrd">new</span> WaveFileWriter(waveFileName, recordingFormat);

<span class="rem">// ... every block we receive we write it to the WaveFileWriter:</span>
writer.WriteData(buffer, 0, bytesRecorded);

<span class="rem">// ... and when recording stops we must call Dispose to finalize the</span>
<span class="rem">// .WAV file properly</span>
writer.Dispose();</pre>
<p><strong>VB.Net:</strong></p>
<pre class="csharpcode">writer = <span class="kwrd">New</span> WaveFileWriter(waveFileName, _recordingFormat)
writer.WriteData(buffer, 0, bytesRecorded)
writer.Dispose()</pre>
<h3>Converting Audio to Floating Point and Back Again</h3>
<p>Once recording has completed, we have a WAV file on which to perform our autotune effect. However, our WAV file consists of 16 bit samples (i.e.
<b>System.Int16</b> aka <b>short</b>). In other words, we have a sequence of byte pairs, each of which represents a number in the range -32768 to 32767. For the digital signal processing we will be performing, it's best to have a sequence of floating point numbers
 (<b>System.Single</b> or <b>float</b>) in the range -1.0f to 1.0f. This is a common requirement, so NAudio provides a utility class called
<b>Wave16ToFloatProvider</b> that converts audio from short to float. Here's the code that takes a WAV file and applies the autotune algorithm to it:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode"><span class="kwrd">public</span> <span class="kwrd">static</span> <span class="kwrd">void</span> ApplyAutoTune(<span class="kwrd">string</span> fileToProcess, <span class="kwrd">string</span> tempFile, AutoTuneSettings autotuneSettings)
{
    <span class="kwrd">using</span> (WaveFileReader reader = <span class="kwrd">new</span> WaveFileReader(fileToProcess))
    {
        IWaveProvider stream32 = <span class="kwrd">new</span> Wave16ToFloatProvider(reader);
        IWaveProvider streamEffect = <span class="kwrd">new</span> AutoTuneWaveProvider(stream32, autotuneSettings);
        IWaveProvider stream16 = <span class="kwrd">new</span> WaveFloatTo16Provider(streamEffect);
        <span class="kwrd">using</span> (WaveFileWriter converted = <span class="kwrd">new</span> WaveFileWriter(tempFile, stream16.WaveFormat))
        {
            <span class="rem">// buffer length needs to be a power of 2 for FFT to work nicely</span>
            <span class="rem">// however, make the buffer too long and pitches aren't detected fast enough</span>
            <span class="rem">// successful buffer sizes: 8192, 4096, 2048, 1024</span>
            <span class="rem">// (some pitch detection algorithms need at least 2048)</span>
            <span class="kwrd">byte</span>[] buffer = <span class="kwrd">new</span> <span class="kwrd">byte</span>[8192]; 
            <span class="kwrd">int</span> bytesRead;
            <span class="kwrd">do</span>
            {
                bytesRead = stream16.Read(buffer, 0, buffer.Length);
                converted.WriteData(buffer, 0, bytesRead);
            } <span class="kwrd">while</span> (bytesRead != 0 &amp;&amp; converted.Length &lt; reader.Length);
        }
    }
}</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode"><span class="kwrd">Public</span> <span class="kwrd">Shared</span> <span class="kwrd">Sub</span> ApplyAutoTune(<span class="kwrd">ByVal</span> fileToProcess <span class="kwrd">As</span> <span class="kwrd">String</span>,
    <span class="kwrd">ByVal</span> tempFile <span class="kwrd">As</span> <span class="kwrd">String</span>,
    <span class="kwrd">ByVal</span> autotuneSettings <span class="kwrd">As</span> AutoTuneSettings)
    Using reader <span class="kwrd">As</span> <span class="kwrd">New</span> WaveFileReader(fileToProcess)
        <span class="kwrd">Dim</span> stream32 <span class="kwrd">As</span> IWaveProvider = <span class="kwrd">New</span> Wave16ToFloatProvider(reader)
        <span class="kwrd">Dim</span> streamEffect <span class="kwrd">As</span> IWaveProvider = <span class="kwrd">New</span> AutoTuneWaveProvider(stream32, autotuneSettings)
        <span class="kwrd">Dim</span> stream16 <span class="kwrd">As</span> IWaveProvider = <span class="kwrd">New</span> WaveFloatTo16Provider(streamEffect)
        Using converted <span class="kwrd">As</span> <span class="kwrd">New</span> WaveFileWriter(tempFile, stream16.WaveFormat)
            <span class="rem">' buffer length needs to be a power of 2 for FFT to work nicely</span>
            <span class="rem">' however, make the buffer too long and pitches aren't detected fast enough</span>
            <span class="rem">' successful buffer sizes: 8192, 4096, 2048, 1024</span>
            <span class="rem">' (some pitch detection algorithms need at least 2048)</span>
            <span class="kwrd">Dim</span> buffer(8191) <span class="kwrd">As</span> <span class="kwrd">Byte</span>
            <span class="kwrd">Dim</span> bytesRead <span class="kwrd">As</span> <span class="kwrd">Integer</span>
            <span class="kwrd">Do</span>
                bytesRead = stream16.Read(buffer, 0, buffer.Length)
                converted.WriteData(buffer, 0, bytesRead)
            <span class="kwrd">Loop</span> <span class="kwrd">While</span> bytesRead &lt;&gt; 0 <span class="kwrd">AndAlso</span> converted.Length &lt; reader.Length
        <span class="kwrd">End</span> Using
    <span class="kwrd">End</span> Using
<span class="kwrd">End</span> Sub</pre>
<p>Here's how it works:</p>
<ol>
<li>First we use a <b>WaveFileReader</b> to open the file that we've just created, which contains 16 bit audio samples
</li><li>Then we use the <b>Wave16ToFloatProvider</b> to perform the conversion to floating point samples
</li><li>Next we pass it through our autotune effect (the <b>AutoTuneWaveProvider</b>). We'll explain how this works later
</li><li>Then we use the <b>WaveFloatTo16Provider</b> to convert back to 16 bit samples ready for saving to WAV (we could save a 32 bit WAV, but it would be rather wasteful of disk space)
</li><li>Having set up the audio pipeline, we can read from the WaveFloatTo16Provider and pull audio right through from the WAV file. We need to read in block sizes that are a power of 2, since we are passing the data through FFTs. If we want to read arbitrary block
 sizes, we need to introduce another element into our pipeline to buffer up enough data to pass through FFTs
</li><li>Finally, we use the <b>WaveFileWriter </b>to write the data we have read into a WAV file
</li></ol>
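<p>For illustration, converting back to 16 bit samples amounts to clipping and scaling each float. The sketch below is a simplified stand-in and not NAudio's actual <b>WaveFloatTo16Provider</b> code; the <b>FloatToPcm</b> class is hypothetical, and scaling by 32767 is just one common choice.</p>
<pre class="csharpcode">using System;

static class FloatToPcm
{
    // Converts floats in the range -1.0 to 1.0 into little-endian 16 bit PCM bytes
    public static byte[] ToBytes(float[] samples)
    {
        byte[] bytes = new byte[samples.Length * 2];
        for (int i = 0; i &lt; samples.Length; i&#43;&#43;)
        {
            // clip first so out-of-range values do not wrap around
            float clamped = Math.Max(-1.0f, Math.Min(1.0f, samples[i]));
            short s = (short)(clamped * 32767f);
            bytes[2 * i] = (byte)(s &amp; 0xFF);            // low byte
            bytes[2 * i &#43; 1] = (byte)((s &gt;&gt; 8) &amp; 0xFF); // high byte
        }
        return bytes;
    }
}</pre>
<p>Clipping matters here because a pitch shifter can push peaks slightly beyond the -1.0 to 1.0 range, and an unclipped cast to short would turn a loud peak into a loud click.</p>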
<h2>The AutoTuneWaveProvider</h2>
<p>As we saw in the last code snippet, the <b>AutoTuneWaveProvider</b> is the piece in our audio pipeline that actually performs the autotune effect. It implements the NAudio
<b>IWaveProvider</b> interface, which allows it to be used in the pipeline for real-time playback if necessary, even though our example code is not doing this (see the section on performance later). Here's the AutoTuneWaveProvider constructor:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode"><span class="kwrd">public</span> AutoTuneWaveProvider(IWaveProvider source, AutoTuneSettings autoTuneSettings)
{
    <span class="kwrd">this</span>.autoTuneSettings = autoTuneSettings;
    <span class="kwrd">if</span> (source.WaveFormat.SampleRate != 44100)
        <span class="kwrd">throw</span> <span class="kwrd">new</span> ArgumentException(<span class="str">&quot;AutoTune only works at 44.1kHz&quot;</span>);
    <span class="kwrd">if</span> (source.WaveFormat.Encoding != WaveFormatEncoding.IeeeFloat)
        <span class="kwrd">throw</span> <span class="kwrd">new</span> ArgumentException(<span class="str">&quot;AutoTune only works on IEEE floating point audio data&quot;</span>);
    <span class="kwrd">if</span> (source.WaveFormat.Channels != 1)
        <span class="kwrd">throw</span> <span class="kwrd">new</span> ArgumentException(<span class="str">&quot;AutoTune only works on mono input sources&quot;</span>);

    <span class="kwrd">this</span>.source = source;
    <span class="kwrd">this</span>.pitchDetector = <span class="kwrd">new</span> AutoCorrelator(source.WaveFormat.SampleRate);
    <span class="kwrd">this</span>.pitchShifter = <span class="kwrd">new</span> SmbPitchShifter(Settings);
    <span class="kwrd">this</span>.waveBuffer = <span class="kwrd">new</span> WaveBuffer(8192);
}</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode">Public Sub New(ByVal source As IWaveProvider, ByVal autoTuneSettings As AutoTuneSettings)
    Me.autoTuneSettings = autoTuneSettings
    If source.WaveFormat.SampleRate &lt;&gt; 44100 Then
        Throw New ArgumentException(<span class="str">&quot;AutoTune only works at 44.1kHz&quot;</span>)
    End If
    If source.WaveFormat.Encoding &lt;&gt; WaveFormatEncoding.IeeeFloat Then
        Throw New ArgumentException(<span class="str">&quot;AutoTune only works on IEEE floating point audio data&quot;</span>)
    End If
    If source.WaveFormat.Channels &lt;&gt; 1 Then
        Throw New ArgumentException(<span class="str">&quot;AutoTune only works on mono input sources&quot;</span>)
    End If

    Me.source = source
    Me.pitchDetector = New AutoCorrelator(source.WaveFormat.SampleRate)
    <span class="rem">' alternative pitch detector:</span>
    <span class="rem">' Me.pitchDetector = New FftPitchDetector(source.WaveFormat.SampleRate)</span>
    Me.pitchShifter = New SmbPitchShifter(Settings, source.WaveFormat.SampleRate)
    Me.waveBuffer = New WaveBuffer(8192)
End Sub</pre>
<p>Some points to notice:</p>
<ol>
<li>We pass in a source <b>IWaveProvider</b>—this is where the data will be coming from
</li><li>We check that the source is in the right format: 44.1kHz, IEEE floating point, mono input </li><li>We also pass in an <b>AutoTuneSettings</b> object. This not only encapsulates the settings for autotune, it also makes it possible to adjust the settings in real-time while the effect is running
</li><li>We then create the two key components of our autotune effect: a pitch detector (which uses an autocorrelator), and a pitch shifter
</li><li>Finally we create a buffer to use for audio processing. This can be a byte[] array, but we use
<b>WaveBuffer </b>from NAudio because it uses <a href="http://mark-dot-net.blogspot.com/2008/06/wavebuffer-casting-byte-arrays-to-float.html">
a clever trick</a> that allows us to cast a byte[] into a float[] without using unsafe code or having to copy all of the data
</li></ol>
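<p>The trick relies on an explicit struct layout in which a byte[] field and a float[] field share the same offset. Here is a stripped-down sketch of the idea, not the NAudio <b>WaveBuffer</b> source; <b>ByteFloatUnion</b> and <b>UnionDemo</b> are hypothetical names, and the example assumes a little-endian machine.</p>
<pre class="csharpcode">using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
struct ByteFloatUnion
{
    // Both fields occupy the same slot, so they refer to the same array object
    [FieldOffset(0)] public byte[] Bytes;
    [FieldOffset(0)] public float[] Floats;
}

static class UnionDemo
{
    // Reads the first four bytes of 'data' as a float without copying
    public static float FirstFloat(byte[] data)
    {
        ByteFloatUnion u = new ByteFloatUnion();
        u.Bytes = data;
        return u.Floats[0];
    }
}</pre>
<p>One caveat with this approach: the <b>Length</b> of the float view still reports the number of bytes, so the count of valid floats has to be tracked separately.</p>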
<p>The key method on any implementation of <b>IWaveProvider</b> is its <b>Read</b> method. This is where the audio consumer, usually the sound card or a WaveFileWriter, asks for data. The data must be supplied as a byte array, and if at all possible you should
 return exactly the number of bytes you were asked for (if you can't, an extra layer of buffering is usually required, or audio playback will be choppy). Here's our implementation of the Read method:</p>
<p><strong>c#:</strong></p>
<pre class="csharpcode"><span class="kwrd">public</span> <span class="kwrd">int</span> Read(<span class="kwrd">byte</span>[] buffer, <span class="kwrd">int</span> offset, <span class="kwrd">int</span> count)
{
    <span class="kwrd">if</span> (waveBuffer == <span class="kwrd">null</span> || waveBuffer.MaxSize &lt; count)
    {
        waveBuffer = <span class="kwrd">new</span> WaveBuffer(count);
    }

    <span class="kwrd">int</span> bytesRead = source.Read(waveBuffer, 0, count);

    <span class="rem">// the last bit sometimes needs to be rounded up:</span>
    <span class="kwrd">if</span> (bytesRead &gt; 0) bytesRead = count;

    <span class="kwrd">int</span> frames = bytesRead / <span class="kwrd">sizeof</span>(<span class="kwrd">float</span>); 
    <span class="kwrd">float</span> pitch = pitchDetector.DetectPitch(waveBuffer.FloatBuffer, frames);
        
    <span class="rem">// an attempt to make it less &quot;warbly&quot; by holding onto the pitch </span>
    <span class="rem">// for at least one more buffer</span>
    <span class="kwrd">if</span> (pitch == 0f &amp;&amp; release &lt; maxHold)
    {
        pitch = previousPitch;
        release&#43;&#43;;
    }
    <span class="kwrd">else</span>
    {
        <span class="kwrd">this</span>.previousPitch = pitch;
        release = 0;
    }
    
    WaveBuffer outBuffer = <span class="kwrd">new</span> WaveBuffer(buffer);

    pitchShifter.ShiftPitch(waveBuffer.FloatBuffer, pitch, 0.0f, outBuffer.FloatBuffer, frames);

    <span class="kwrd">return</span> frames * 4;
}</pre>
<p><strong>VB.Net:</strong></p>
<pre class="csharpcode">Public Function Read(ByVal buffer() As Byte, ByVal offset As Integer,
                        ByVal count As Integer) As Integer Implements NAudio.Wave.IWaveProvider.Read
    If waveBuffer Is Nothing OrElse waveBuffer.MaxSize &lt; count Then
        waveBuffer = New WaveBuffer(count)
    End If

    Dim bytesRead = source.Read(waveBuffer, 0, count)
    <span class="rem">'Debug.Assert(bytesRead = count)</span>

    <span class="rem">' the last bit sometimes needs to be rounded up:</span>
    If bytesRead &gt; 0 Then
        bytesRead = count
    End If

    <span class="rem">'pitchsource-&gt;getPitches()</span>
    Dim frames = bytesRead \ Len(New Single) <span class="rem">' MRH: was count</span>
    Dim pitch = pitchDetector.DetectPitch(waveBuffer.FloatBuffer, frames)

    <span class="rem">' MRH: an attempt to make it less &quot;warbly&quot; by holding onto the pitch for at least one more buffer</span>
    If pitch = 0.0F AndAlso release &lt; maxHold Then
        pitch = previousPitch
        release &#43;= 1
    Else
        Me.previousPitch = pitch
        release = 0
    End If

    Dim midiNoteNumber = 40
    Dim targetPitch = CSng(8.175 * Math.Pow(1.05946309, midiNoteNumber))

    Dim outBuffer As New WaveBuffer(buffer)

    pitchShifter.ShiftPitch(waveBuffer.FloatBuffer, pitch, targetPitch, outBuffer.FloatBuffer, frames)

    Return frames * 4
End Function</pre>
<p>Here's what's going on:</p>
<ol>
<li>First we need to read from our source (in our case, a WAV file converted to floating point samples)
</li><li>If we get less data than we were expecting, we know that means we're at the end of the file, so we'll just pretend we got a full buffer
</li><li>We then work out how many audio “frames” are present (which is the same as the number of samples since this is mono audio). It's floating point audio, so the frame count equals the byte count divided by four
</li><li>We then pass the data through our pitch detector algorithm (see below) </li><li>Next we use some experimental code to stabilize pitch detection by reporting the previous frequency when no pitch is picked up
</li><li>Finally we pass the data into our pitch shifter, including details of the detected pitch
</li></ol>
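<p>The pitch-holding step can be looked at in isolation. The following sketch wraps the same logic as the Read method in a small class; <b>PitchHold</b> and <b>Next</b> are hypothetical names introduced only for this illustration.</p>
<pre class="csharpcode">class PitchHold
{
    private readonly int maxHold;   // how many silent buffers to bridge
    private float previousPitch;
    private int release;

    public PitchHold(int maxHold)
    {
        this.maxHold = maxHold;
    }

    // Returns the pitch to use for this buffer, given the detector's result
    // (0 means no pitch was detected)
    public float Next(float detectedPitch)
    {
        if (detectedPitch == 0f &amp;&amp; release &lt; maxHold)
        {
            release&#43;&#43;;
            return previousPitch; // reuse the last known pitch
        }
        previousPitch = detectedPitch;
        release = 0;
        return detectedPitch;
    }
}</pre>
<p>With maxHold set to 1, the detector results 220, 0, 0, 230 come out as 220, 220, 0, 230: a single dropout is bridged, but a second consecutive one is passed through as silence.</p>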
<p>Now that we've seen the big picture of the <b>AutoTuneWaveProvider</b>, let's drill down into its two main components: the pitch detector and the pitch shifter.</p>
<h2>Pitch Detection with Autocorrelation</h2>
<p>The pitch detection part of autotune is vital to getting good results. If it can't accurately detect the input pitch, it will incorrectly calculate how much the pitch needs to be adjusted. However, high quality pitch detection is quite difficult to get right.
 First of all, the microphone may well pick up background noise. Second, when you sing into a microphone, the signal consists not only of a single frequency, but also “harmonics” at different frequencies.</p>
<p>The good news is that we need to detect only the primary pitch. </p>
<p>The awesomebox algorithm makes use of “<a href="http://en.wikipedia.org/wiki/Autocorrelation">autocorrelation</a>” for its pitch detection, but I made a few small tweaks to how the algorithm is implemented in an attempt to improve its accuracy. Autocorrelation
 has the advantage of being a relatively quick process. The basic principle is that if a signal is periodic, it will “correlate” well with itself when shifted forward (or backwards) one cycle.
</p>
<p>Let's say we are looking to see if the note “Middle C” is being sung. The frequency of Middle C is around 262Hz. If we are sampling at 44.1kHz (which is standard for CD quality audio), then we will expect the signal to repeat approximately every 168 samples
 (44100/262). Accordingly, for every sample in the buffer, we multiply that sample by the sample 168 samples earlier and add up the products. We do this for every possible offset that corresponds to a frequency in the range we want to detect (I am using 85Hz to
 300Hz, which is adequate for pitch detecting vocals). The offset with the highest score corresponds to the most likely frequency.</p>
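<p>To make the principle concrete, here is a minimal sketch that correlates a synthetic sine wave with a delayed copy of itself. It is a simplified illustration and not the project's detector: the <b>AutocorrelationDemo</b> class is hypothetical, and it works on a single buffer rather than keeping the previous one.</p>
<pre class="csharpcode">using System;

static class AutocorrelationDemo
{
    // Generates 'count' samples of a sine wave at the given frequency
    public static float[] Sine(double frequencyHz, int sampleRate, int count)
    {
        float[] samples = new float[count];
        for (int i = 0; i &lt; count; i&#43;&#43;)
        {
            samples[i] = (float)Math.Sin(2.0 * Math.PI * frequencyHz * i / sampleRate);
        }
        return samples;
    }

    // Sum of the products of each sample with the sample 'lag' positions earlier
    public static double Correlate(float[] buffer, int lag)
    {
        double sum = 0.0;
        for (int i = lag; i &lt; buffer.Length; i&#43;&#43;)
        {
            sum &#43;= buffer[i] * buffer[i - lag];
        }
        return sum;
    }
}</pre>
<p>For a 262Hz sine wave sampled at 44.1kHz, the score at a lag of 168 samples (about one full cycle) is far larger than the score at a lag of 126 samples (about three quarters of a cycle), where the signal and its delayed copy largely cancel out.</p>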
<p>Let's have a look at the code for an autocorrelation algorithm, starting with the constructor for the
<b>AutoCorrelator</b> class:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode"><span class="kwrd">public</span> AutoCorrelator(<span class="kwrd">int</span> sampleRate)
{
    <span class="kwrd">this</span>.sampleRate = (<span class="kwrd">float</span>)sampleRate;
    <span class="kwrd">int</span> minFreq = 85;
    <span class="kwrd">int</span> maxFreq = 255;

    <span class="kwrd">this</span>.maxOffset = sampleRate / minFreq;
    <span class="kwrd">this</span>.minOffset = sampleRate / maxFreq;
}</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode"><span class="kwrd">Public</span> <span class="kwrd">Sub</span> <span class="kwrd">New</span>(<span class="kwrd">ByVal</span> sampleRate <span class="kwrd">As</span> <span class="kwrd">Integer</span>)
    <span class="kwrd">Me</span>.sampleRate = <span class="kwrd">CSng</span>(sampleRate)
    <span class="kwrd">Dim</span> minFreq = 85
    <span class="kwrd">Dim</span> maxFreq = 255

    <span class="kwrd">Me</span>.maxOffset = sampleRate \ minFreq
    <span class="kwrd">Me</span>.minOffset = sampleRate \ maxFreq
<span class="kwrd">End</span> Sub</pre>
<p>First of all, we pre-calculate some values based on the minimum and maximum frequencies we are looking for. Remember that lower frequencies are harder to detect than higher frequencies, so don't set minFreq too low. The maxOffset and minOffset fields are the maximum
 and minimum backwards distances we will be seeking while looking for a match.</p>
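<p>The arithmetic behind these offsets is easy to check by hand. The helper below is a sketch for illustration only (<b>LagMath</b> is a hypothetical name); it uses the same integer division as the constructor above.</p>
<pre class="csharpcode">static class LagMath
{
    // Number of samples in one cycle of the given frequency (rounded down)
    public static int OffsetForFrequency(int sampleRate, int frequencyHz)
    {
        return sampleRate / frequencyHz;
    }

    // Frequency represented by a whole-sample offset
    public static float FrequencyForOffset(int sampleRate, int offset)
    {
        return (float)sampleRate / offset;
    }
}</pre>
<p>At 44.1kHz, a minFreq of 85 gives a maxOffset of 518 and a maxFreq of 255 gives a minOffset of 172. Because only whole-sample offsets are tested, the detectable frequencies are spaced unevenly: offsets 172 and 173 correspond to roughly 256.4Hz and 254.9Hz, whereas offsets 517 and 518 correspond to roughly 85.3Hz and 85.1Hz.</p>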
<p><strong>c#:</strong> </p>
<pre class="csharpcode"><span class="kwrd">public</span> <span class="kwrd">float</span> DetectPitch(<span class="kwrd">float</span>[] buffer, <span class="kwrd">int</span> frames)
{
    <span class="kwrd">if</span> (prevBuffer == <span class="kwrd">null</span>)
    {
        prevBuffer = <span class="kwrd">new</span> <span class="kwrd">float</span>[frames];
    }

    <span class="kwrd">float</span> maxCorr = 0;
    <span class="kwrd">int</span> maxLag = 0;

    <span class="rem">// starting with low frequencies, working to higher</span>
    <span class="kwrd">for</span> (<span class="kwrd">int</span> lag = maxOffset; lag &gt;= minOffset; lag--)
    {
        <span class="kwrd">float</span> corr = 0; <span class="rem">// sum of products of each sample with the lagged sample</span>
        <span class="kwrd">for</span> (<span class="kwrd">int</span> i = 0; i &lt; frames; i&#43;&#43;)
        {
            <span class="kwrd">int</span> oldIndex = i - lag;
            <span class="kwrd">float</span> sample = ((oldIndex &lt; 0) ? prevBuffer[frames &#43; oldIndex] : buffer[oldIndex]);
            corr &#43;= (sample * buffer[i]);
        }
        <span class="kwrd">if</span> (corr &gt; maxCorr)
        {
            maxCorr = corr;
            maxLag = lag;
        }

    }
    <span class="kwrd">for</span> (<span class="kwrd">int</span> n = 0; n &lt; frames; n&#43;&#43;)
    { 
        prevBuffer[n] = buffer[n]; 
    }
    <span class="kwrd">float</span> noiseThreshold = frames / 1000f;

    <span class="kwrd">if</span> (maxCorr &lt; noiseThreshold || maxLag == 0) <span class="kwrd">return</span> 0.0f;
    <span class="kwrd">return</span> <span class="kwrd">this</span>.sampleRate / maxLag;
}</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode"><span class="kwrd">Public</span> <span class="kwrd">Function</span> DetectPitch(<span class="kwrd">ByVal</span> buffer() <span class="kwrd">As</span> <span class="kwrd">Single</span>, <span class="kwrd">ByVal</span> frames <span class="kwrd">As</span> <span class="kwrd">Integer</span>) <span class="kwrd">As</span> <span class="kwrd">Single</span> <span class="kwrd">Implements</span> IPitchDetector.DetectPitch
    <span class="kwrd">If</span> prevBuffer <span class="kwrd">Is</span> <span class="kwrd">Nothing</span> <span class="kwrd">Then</span>
        prevBuffer = <span class="kwrd">New</span> <span class="kwrd">Single</span>(frames - 1){}
    <span class="kwrd">End</span> <span class="kwrd">If</span>
    <span class="kwrd">Dim</span> secCor <span class="kwrd">As</span> <span class="kwrd">Single</span> = 0
    <span class="kwrd">Dim</span> secLag = 0

    <span class="kwrd">Dim</span> maxCorr <span class="kwrd">As</span> <span class="kwrd">Single</span> = 0
    <span class="kwrd">Dim</span> maxLag = 0

    <span class="rem">' starting with low frequencies, working to higher</span>
    <span class="kwrd">For</span> lag = maxOffset <span class="kwrd">To</span> minOffset <span class="kwrd">Step</span> -1
        <span class="kwrd">Dim</span> corr <span class="kwrd">As</span> <span class="kwrd">Single</span> = 0 <span class="rem">' sum of products of each sample with the lagged sample</span>
        <span class="kwrd">For</span> i = 0 <span class="kwrd">To</span> frames - 1
            <span class="kwrd">Dim</span> oldIndex = i - lag
            <span class="kwrd">Dim</span> sample = (<span class="kwrd">If</span>(oldIndex &lt; 0, prevBuffer(frames &#43; oldIndex), buffer(oldIndex)))
            corr &#43;= (sample * buffer(i))
        <span class="kwrd">Next</span> i
        <span class="kwrd">If</span> corr &gt; maxCorr <span class="kwrd">Then</span>
            maxCorr = corr
            maxLag = lag
        <span class="kwrd">End</span> <span class="kwrd">If</span>
        <span class="kwrd">If</span> corr &gt;= 0.9 * maxCorr <span class="kwrd">Then</span>
            secCor = corr
            secLag = lag
        <span class="kwrd">End</span> <span class="kwrd">If</span>
    <span class="kwrd">Next</span> lag
    <span class="kwrd">For</span> n = 0 <span class="kwrd">To</span> frames - 1
        prevBuffer(n) = buffer(n)
    <span class="kwrd">Next</span> n
    <span class="kwrd">Dim</span> noiseThreshold = frames / 1000.0F
    <span class="rem">'Debug.WriteLine(String.Format(&quot;Max Corr: {0} ({1}), Sec Corr: {2} ({3})&quot;, Me.sampleRate / maxLag, maxCorr, Me.sampleRate / secLag, secCor))</span>
    <span class="kwrd">If</span> maxCorr &lt; noiseThreshold <span class="kwrd">OrElse</span> maxLag = 0 <span class="kwrd">Then</span>
        <span class="kwrd">Return</span> 0.0F
    <span class="kwrd">End</span> <span class="kwrd">If</span>
    <span class="rem">'Return 44100.0f / secLag '--works better for singing</span>
    <span class="kwrd">Return</span> <span class="kwrd">Me</span>.sampleRate / maxLag
<span class="kwrd">End</span> Function</pre>
<p>A few things to notice:</p>
<ol>
<li>The audio comes in as an array of floating-point numbers. NAudio performs this conversion from 16-bit audio for us by using
<b>Wave16ToFloatProvider</b>. </li><li>We store the previous buffer. This allows us to look backwards for correlation.
</li><li>We then work through every possible integer offset within our range, and calculate a correlation value for each.
</li><li>The correlation is calculated as the sum of the products of each sample with the sample “lag” positions before it. </li><li>If it is the largest so far, we store the “lag” value (i.e. the number of samples back that we correlated with).
</li><li>We return 0 (i.e. no frequency detected) if we don't find a strong frequency. This noise threshold may need to be tweaked depending on your input.
</li><li>Finally, we convert the lag into a frequency with the formula <b>sampleRate / maxLag</b>.
</li></ol>
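<p>Because the lag is a whole number of samples, only frequencies of the form sampleRate / lag can ever be reported. The following minimal sketch (<b>LagResolution</b> is an illustrative name, not part of the project) shows how far apart neighbouring lags are at both ends of our range:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode">using System;

static class LagResolution
{
    // Frequency implied by a whole-sample lag, as in the detector's final step.
    public static float LagToFrequency(float sampleRate, int lag)
    {
        return sampleRate / lag;
    }

    // Gap in Hz between the frequencies reported for lag and lag + 1.
    public static float StepBelow(float sampleRate, int lag)
    {
        return LagToFrequency(sampleRate, lag) - LagToFrequency(sampleRate, lag + 1);
    }

    static void Main()
    {
        const float sampleRate = 44100f;
        Console.WriteLine(StepBelow(sampleRate, 400)); // near 110Hz: about 0.27Hz
        Console.WriteLine(StepBelow(sampleRate, 147)); // near 300Hz: about 2Hz
    }
}</pre>
<p>At 44.1kHz a lag of 400 samples corresponds to 110.25Hz and a lag of 401 to roughly 109.98Hz, whereas lags of 147 and 148 correspond to 300Hz and roughly 297.97Hz, so the steps get coarser as the pitch rises.</p>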
<p>I wrote some unit tests to measure the accuracy of detection with sine waves (which admittedly are the easiest to detect). Here are the results for audio sampled at 44.1kHz:</p>
<p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="top" width="140">
<p><b>Test Frequency</b></p>
</td>
<td valign="top" width="132">
<p><b>Detected Pitch</b></p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>109.99Hz</p>
</td>
<td valign="top" width="132">
<p>108.35Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>116.53Hz</p>
</td>
<td valign="top" width="132">
<p>118.23Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>123.46Hz</p>
</td>
<td valign="top" width="132">
<p>123.18Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>130.80Hz</p>
</td>
<td valign="top" width="132">
<p>129.71Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>138.58Hz</p>
</td>
<td valign="top" width="132">
<p>140.00Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>146.82Hz</p>
</td>
<td valign="top" width="132">
<p>148.48Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>155.55Hz</p>
</td>
<td valign="top" width="132">
<p>154.74Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>164.80Hz</p>
</td>
<td valign="top" width="132">
<p>163.33Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>174.60Hz</p>
</td>
<td valign="top" width="132">
<p>172.94Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>184.98Hz</p>
</td>
<td valign="top" width="132">
<p>183.75Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>195.98Hz</p>
</td>
<td valign="top" width="132">
<p>194.27Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>207.63Hz</p>
</td>
<td valign="top" width="132">
<p>206.07Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>219.98Hz</p>
</td>
<td valign="top" width="132">
<p>219.40Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>233.06Hz</p>
</td>
<td valign="top" width="132">
<p>234.57Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>246.92Hz</p>
</td>
<td valign="top" width="132">
<p>247.75Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>261.60Hz</p>
</td>
<td valign="top" width="132">
<p>256.40Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>277.16Hz</p>
</td>
<td valign="top" width="132">
<p>139.56Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>293.64Hz</p>
</td>
<td valign="top" width="132">
<p>146.03Hz</p>
</td>
</tr>
</tbody>
</table>
</p>
<p>Notice that the detected frequencies from the final two tests are half the correct values. This doesn't matter for our purposes, since it just means the pitch has been detected one octave below the correct note, so the note name is the same.</p>
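<p>One way to check the octave relationship is to convert the test and detected frequencies to note names. This sketch assumes equal temperament with A4 (MIDI note 69) at 440Hz; <b>NoteNames</b> is an illustrative helper, not part of the project:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode">using System;

static class NoteNames
{
    static readonly string[] Names =
        { "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B" };

    // Nearest MIDI note number, assuming A4 (note 69) = 440Hz.
    public static int NearestMidiNote(double frequency)
    {
        return (int)Math.Round(69 + 12 * Math.Log(frequency / 440.0, 2.0));
    }

    // Note name without the octave, e.g. "C#". Intended for audible
    // frequencies, where the MIDI note number is positive.
    public static string NoteName(double frequency)
    {
        return Names[NearestMidiNote(frequency) % 12];
    }
}</pre>
<p>With this helper, 277.16Hz and 139.56Hz both come out as C#, and 293.64Hz and 146.03Hz both come out as D.</p>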
<p>To improve on the accuracy of the autocorrelator's results, there are a couple of things you can do:</p>
<ul>
<li>Run a band-pass filter beforehand, to remove any frequencies outside of the desired range
</li><li>Combine the results with those obtained from a different technique, such as counting zero-crossings
</li></ul>
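<p>As a rough idea of what the zero-crossing technique involves, here is a minimal sketch. It is not part of the project, the <b>ZeroCrossingDetector</b> name is made up for illustration, and it assumes a fairly clean signal, since noise or strong harmonics add extra crossings:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode">static class ZeroCrossingDetector
{
    // A periodic signal crosses zero in the upward direction once per cycle,
    // so we count whole cycles between the first and last upward crossing.
    public static float DetectPitch(float[] buffer, int frames, float sampleRate)
    {
        int firstCrossing = -1;
        int lastCrossing = -1;
        int cycles = 0;
        for (int n = 1; n &lt; frames; n++)
        {
            if (buffer[n - 1] &lt; 0f &amp;&amp; buffer[n] &gt;= 0f)
            {
                if (firstCrossing &lt; 0) firstCrossing = n;
                else cycles++;
                lastCrossing = n;
            }
        }
        if (cycles == 0) return 0f; // fewer than two crossings: no pitch detected
        // whole cycles divided by the time they span (in samples)
        return cycles * sampleRate / (lastCrossing - firstCrossing);
    }
}</pre>
<p>Like the autocorrelator, it returns 0 when it cannot find a pitch, so the two results could be compared before deciding which to trust.</p>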
<h2>Pitch Detection with the Fast Fourier Transform</h2>
<p>I decided to implement an alternative pitch detection algorithm to see if I could get better results. A different approach is to use the
<a href="http://en.wikipedia.org/wiki/Fast_Fourier_transform">Fast Fourier Transform</a>, which converts signals from the “time domain” into the “frequency domain.”
</p>
<p>The basic approach is to take a block of samples (whose length must be a power of 2 – e.g. 1024), and run the FFT on them. The FFT takes complex numbers as inputs, which for audio signals are entirely real. The implementation I am using expects real and imaginary
 parts interleaved in the input buffer. Here's our code setting up <b>fftBuffer</b>
with interleaved samples:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode"><span class="kwrd">private</span> <span class="kwrd">float</span>[] fftBuffer;
<span class="kwrd">private</span> <span class="kwrd">float</span>[] prevBuffer;

<span class="kwrd">public</span> <span class="kwrd">float</span> DetectPitch(<span class="kwrd">float</span>[] buffer, <span class="kwrd">int</span> inFrames)
{
    Func&lt;<span class="kwrd">int</span>, <span class="kwrd">int</span>, <span class="kwrd">float</span>&gt; window = HammingWindow;
    <span class="kwrd">if</span> (prevBuffer == <span class="kwrd">null</span>)
    {
        prevBuffer = <span class="kwrd">new</span> <span class="kwrd">float</span>[inFrames];
    }
 
    <span class="rem">// double frames since we are combining present and previous buffers</span>
    <span class="kwrd">int</span> frames = inFrames * 2;
    <span class="kwrd">if</span> (fftBuffer == <span class="kwrd">null</span>)
    {
        fftBuffer = <span class="kwrd">new</span> <span class="kwrd">float</span>[frames * 2]; <span class="rem">// times 2 because it is complex input</span>
    }
 
    <span class="kwrd">for</span> (<span class="kwrd">int</span> n = 0; n &lt; frames; n&#43;&#43;)
    {
        <span class="kwrd">if</span> (n &lt; inFrames)
        {
            fftBuffer[n * 2] = prevBuffer[n] * window(n, frames);
            fftBuffer[n * 2 &#43; 1] = 0; <span class="rem">// need to clear out as fft modifies buffer</span>
        }
        <span class="kwrd">else</span>
        {
            fftBuffer[n * 2] = buffer[n-inFrames] * window(n, frames);
            fftBuffer[n * 2 &#43; 1] = 0; <span class="rem">// need to clear out as fft modifies buffer</span>
        }
    }</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode"><span class="kwrd">Private</span> fftBuffer() <span class="kwrd">As</span> <span class="kwrd">Single</span>
<span class="kwrd">Private</span> prevBuffer() <span class="kwrd">As</span> <span class="kwrd">Single</span>

<span class="kwrd">Public</span> <span class="kwrd">Function</span> DetectPitch(<span class="kwrd">ByVal</span> buffer() <span class="kwrd">As</span> <span class="kwrd">Single</span>,
                            <span class="kwrd">ByVal</span> inFrames <span class="kwrd">As</span> <span class="kwrd">Integer</span>) <span class="kwrd">As</span> <span class="kwrd">Single</span> <span class="kwrd">Implements</span> IPitchDetector.DetectPitch
    <span class="kwrd">Dim</span> window <span class="kwrd">As</span> Func(Of <span class="kwrd">Integer</span>, <span class="kwrd">Integer</span>, <span class="kwrd">Single</span>) = <span class="kwrd">AddressOf</span> HammingWindow
    <span class="kwrd">If</span> prevBuffer <span class="kwrd">Is</span> <span class="kwrd">Nothing</span> <span class="kwrd">Then</span>
        prevBuffer = <span class="kwrd">New</span> <span class="kwrd">Single</span>(inFrames - 1) {}
    <span class="kwrd">End</span> <span class="kwrd">If</span>

    <span class="rem">' double frames since we are combining present and previous buffers</span>
    <span class="kwrd">Dim</span> frames = inFrames * 2
    <span class="kwrd">If</span> fftBuffer <span class="kwrd">Is</span> <span class="kwrd">Nothing</span> <span class="kwrd">Then</span>
        fftBuffer = <span class="kwrd">New</span> <span class="kwrd">Single</span>(frames * 2 - 1) {} <span class="rem">' times 2 because it is complex input</span>
    <span class="kwrd">End</span> <span class="kwrd">If</span>

    <span class="kwrd">For</span> n = 0 <span class="kwrd">To</span> frames - 1
        <span class="kwrd">If</span> n &lt; inFrames <span class="kwrd">Then</span>
            fftBuffer(n * 2) = prevBuffer(n) * window(n, frames)
            fftBuffer(n * 2 &#43; 1) = 0 <span class="rem">' need to clear out as fft modifies buffer</span>
        <span class="kwrd">Else</span>
            fftBuffer(n * 2) = buffer(n - inFrames) * window(n, frames)
            fftBuffer(n * 2 &#43; 1) = 0 <span class="rem">' need to clear out as fft modifies buffer</span>
        <span class="kwrd">End</span> <span class="kwrd">If</span>
    <span class="kwrd">Next</span> n</pre>
<p>Notice that we prepend the previous buffer we were passed. This is a common way of increasing the accuracy and resolution of an FFT by using overlapping windows, and can be further extended to store three previous buffers, allowing us to have 75% overlapping
 windows instead of just the 50% that we have in this example. </p>
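<p>One possible way to generalise the overlap is to keep a sliding history of several blocks. This is only a sketch under my own assumptions (the <b>OverlapBuffer</b> class does not exist in the project), and the total window length still has to be a power of 2 for the FFT:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode">using System;

class OverlapBuffer
{
    private readonly float[] window;
    private readonly int blockSize;

    // blocks = 2 gives 50% overlap between successive windows, blocks = 4 gives 75%.
    public OverlapBuffer(int blockSize, int blocks)
    {
        this.blockSize = blockSize;
        this.window = new float[blockSize * blocks];
    }

    // Drops the oldest block, appends the newest, and returns the whole window
    // (oldest samples first), ready to be windowed and passed to the FFT.
    public float[] Push(float[] block)
    {
        Array.Copy(window, blockSize, window, 0, window.Length - blockSize);
        Array.Copy(block, 0, window, window.Length - blockSize, blockSize);
        return window;
    }
}</pre>
<p>With 4096-sample blocks, <b>new OverlapBuffer(4096, 4)</b> would produce 16384-sample windows in which three quarters of the samples were also present in the previous window.</p>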
<p>For better peak frequency detection, the signal that is passed into the FFT is best pre-processed with a
<a href="http://en.wikipedia.org/wiki/Window_function">“windowing” function</a>. There are several to choose from, each with its own strengths and weaknesses. I used the Hamming window, which is a fairly common choice:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode"><span class="kwrd">private</span> <span class="kwrd">float</span> HammingWindow(<span class="kwrd">int</span> n, <span class="kwrd">int</span> N) 
{
    <span class="kwrd">return</span> 0.54f - 0.46f * (<span class="kwrd">float</span>)Math.Cos((2 * Math.PI * n) / (N - 1));
}</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode"><span class="kwrd">Private</span> <span class="kwrd">Function</span> HammingWindow(<span class="kwrd">ByVal</span> n <span class="kwrd">As</span> <span class="kwrd">Integer</span>, <span class="kwrd">ByVal</span> _N <span class="kwrd">As</span> <span class="kwrd">Integer</span>) <span class="kwrd">As</span> <span class="kwrd">Single</span>
    <span class="kwrd">Return</span> 0.54F - 0.46F * <span class="kwrd">CSng</span>(Math.Cos((2 * Math.PI * n) / (_N - 1)))
<span class="kwrd">End</span> Function</pre>
<p>The next step is to pass our interleaved buffer to the FFT algorithm. I am using Stephan Bernsee's implementation here, though there is an alternative implementation in NAudio that I could have used. Since the same function can be used for an inverse FFT, the -1 parameter
 means (rather counter-intuitively) “do a forwards FFT.” It processes the data in place, which is fine since we don't need to keep the contents of the input buffer:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode"><span class="rem">// assuming frames is a power of 2</span>
SmbPitchShift.smbFft(fftBuffer, frames, -1);</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode"><span class="rem">' assuming frames is a power of 2</span>
SmbPitchShift.smbFft(fftBuffer, frames, -1)</pre>
<p>Once we have completed the FFT, we are ready to interpret its output. The output of the FFT consists of complex numbers (again real followed by imaginary in our buffer), which represent frequency “bins.”</p>
<p>We start off by calculating the bin size and working out which bins correspond to the range of frequencies we are interested in detecting:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode"><span class="kwrd">float</span> binSize = sampleRate / frames;
<span class="kwrd">int</span> minBin = (<span class="kwrd">int</span>)(85 / binSize);
<span class="kwrd">int</span> maxBin = (<span class="kwrd">int</span>)(300 / binSize);</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode"><span class="kwrd">Dim</span> binSize = sampleRate / frames
<span class="kwrd">Dim</span> minBin = <span class="kwrd">CInt</span>(Fix(85 / binSize))
<span class="kwrd">Dim</span> maxBin = <span class="kwrd">CInt</span>(Fix(300 / binSize))</pre>
<p>For example, if our sample rate is 44.1kHz and we analyse a block of 1024 samples, then each bin represents 43Hz, which is hardly the granularity we are looking for. To increase resolution, our options are to either sample at a higher rate or analyse a bigger
 chunk. Our approach is to use overlapping blocks of 8192 samples, as we read 4096 samples each time. This means we have a resolution of around 5Hz, which is much more acceptable.</p>
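<p>The arithmetic behind those figures can be reproduced with a few lines. This is just a worked example using the same bin calculations as above; the <b>FftResolution</b> class name is illustrative:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode">using System;

static class FftResolution
{
    // Width of one FFT bin in Hz.
    public static float BinSize(float sampleRate, int frames)
    {
        return sampleRate / frames;
    }

    static void Main()
    {
        foreach (int frames in new[] { 1024, 8192 })
        {
            float binSize = BinSize(44100f, frames);
            int minBin = (int)(85 / binSize);
            int maxBin = (int)(300 / binSize);
            Console.WriteLine("{0} samples: {1:F2}Hz per bin, bins {2} to {3}",
                              frames, binSize, minBin, maxBin);
        }
    }
}</pre>
<p>With 1024 samples the bins are about 43.07Hz wide and only bins 1 to 6 fall inside our 85Hz to 300Hz range. With 8192 samples the bins are about 5.38Hz wide and the range covers bins 15 to 55.</p>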
<p>Now we can calculate the magnitude or “intensity” for each frequency bin by summing the squares of its real and imaginary parts (strictly we should then take the square root, but we don't need to since we are just looking for the largest value):</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode"><span class="kwrd">float</span> maxIntensity = 0f;
<span class="kwrd">int</span> maxBinIndex = 0;

<span class="kwrd">for</span> (<span class="kwrd">int</span> bin = minBin; bin &lt;= maxBin; bin&#43;&#43;)
{
    <span class="kwrd">float</span> real = fftBuffer[bin * 2];
    <span class="kwrd">float</span> imaginary = fftBuffer[bin * 2 &#43; 1];
    <span class="kwrd">float</span> intensity = real * real &#43; imaginary * imaginary;
    <span class="kwrd">if</span> (intensity &gt; maxIntensity)
    {
        maxIntensity = intensity;
        maxBinIndex = bin;
    }
}</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode"><span class="kwrd">Dim</span> maxIntensity = 0.0F
<span class="kwrd">Dim</span> maxBinIndex = 0
<span class="kwrd">For</span> bin = minBin <span class="kwrd">To</span> maxBin
    <span class="kwrd">Dim</span> real = fftBuffer(bin * 2)
    <span class="kwrd">Dim</span> imaginary = fftBuffer(bin * 2 &#43; 1)
    <span class="kwrd">Dim</span> intensity = real * real &#43; imaginary * imaginary
    <span class="kwrd">If</span> intensity &gt; maxIntensity <span class="kwrd">Then</span>
        maxIntensity = intensity
        maxBinIndex = bin
    <span class="kwrd">End</span> <span class="kwrd">If</span>
<span class="kwrd">Next</span> bin</pre>
<p>Since we have identified the bin with the maximum intensity, we can calculate the detected frequency:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode"><span class="kwrd">return</span> binSize * maxBinIndex;</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode"><span class="kwrd">Return</span> binSize * maxBinIndex</pre>
<p>I don't currently specify a minimum threshold for maxIntensity, but perhaps if it were very low, the FFT pitch detector would return zero to indicate no pitch detected instead of returning an answer that is probably not accurate.</p>
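<p>If you wanted to add such a threshold, it might look something like the sketch below. The <b>minIntensity</b> parameter and the <b>FftPeakPicker</b> class are hypothetical, and a suitable threshold value would have to be found by experiment since it depends on the input level and the window length:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode">static class FftPeakPicker
{
    // Returns the frequency of the strongest bin in [minBin, maxBin], or 0
    // if even the strongest bin is weaker than minIntensity (hypothetical).
    public static float PeakFrequency(float[] fftBuffer, float binSize,
                                      int minBin, int maxBin, float minIntensity)
    {
        float maxIntensity = 0f;
        int maxBinIndex = 0;
        for (int bin = minBin; bin &lt;= maxBin; bin++)
        {
            float real = fftBuffer[bin * 2];
            float imaginary = fftBuffer[bin * 2 + 1];
            float intensity = real * real + imaginary * imaginary;
            if (intensity &gt; maxIntensity)
            {
                maxIntensity = intensity;
                maxBinIndex = bin;
            }
        }
        if (maxIntensity &lt; minIntensity) return 0f; // too quiet to trust
        return binSize * maxBinIndex;
    }
}</pre>
<p>Returning 0 matches the convention used by the autocorrelator, so the rest of the code would treat it as “no pitch detected.”</p>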
<p>Let's have a look at how the FFT pitch detector does:
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="top" width="140">
<p><b>Test Frequency</b></p>
</td>
<td valign="top" width="132">
<p><b>Detected Pitch</b></p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>109.99Hz</p>
</td>
<td valign="top" width="132">
<p>107.67Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>116.53Hz</p>
</td>
<td valign="top" width="132">
<p>118.43Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>123.46Hz</p>
</td>
<td valign="top" width="132">
<p>123.82Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>130.80Hz</p>
</td>
<td valign="top" width="132">
<p>129.20Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>138.58Hz</p>
</td>
<td valign="top" width="132">
<p>139.97Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>146.82Hz</p>
</td>
<td valign="top" width="132">
<p>145.35Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>155.55Hz</p>
</td>
<td valign="top" width="132">
<p>156.12Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>164.80Hz</p>
</td>
<td valign="top" width="132">
<p>166.88Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>174.60Hz</p>
</td>
<td valign="top" width="132">
<p>172.27Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>184.98Hz</p>
</td>
<td valign="top" width="132">
<p>183.03Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>195.98Hz</p>
</td>
<td valign="top" width="132">
<p>193.80Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>207.63Hz</p>
</td>
<td valign="top" width="132">
<p>209.95Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>219.98Hz</p>
</td>
<td valign="top" width="132">
<p>220.72Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>233.06Hz</p>
</td>
<td valign="top" width="132">
<p>231.48Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>246.92Hz</p>
</td>
<td valign="top" width="132">
<p>247.63Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>261.60Hz</p>
</td>
<td valign="top" width="132">
<p>263.78Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>277.16Hz</p>
</td>
<td valign="top" width="132">
<p>274.55Hz</p>
</td>
</tr>
<tr>
<td valign="top" width="140">
<p>293.64Hz</p>
</td>
<td valign="top" width="132">
<p>296.08Hz</p>
</td>
</tr>
</tbody>
</table>
</p>
<p>As can be seen, it correctly picks out the primary frequencies of the higher notes, but overall it doesn't get that much closer than the autocorrelator, so I've left the autocorrelator as the default algorithm. You can, however, swap in the FFT detector in the code if
 it works better for the material you are auto-tuning.</p>
<p>There are ways of using the phase information from the FFT output to increase the accuracy of pitch detection even further, but I have left that as an exercise for the reader!</p>
<h2>Pitch Shifting</h2>
<p>The next step is to determine how much we will shift the pitch. The simplest way to do this is to look for the musical pitch that is closest to the detected pitch. The amount to shift by is then simply the ratio of those two frequencies.
</p>
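<p>As a worked example of that ratio, here is a minimal sketch that snaps to the nearest note of the full chromatic scale. It assumes equal temperament with A4 (MIDI note 69) at 440Hz and picks the nearest note on a logarithmic scale, which is a simplification of what the project's own code does; <b>NearestNote</b> is an illustrative name:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode">using System;

static class NearestNote
{
    // Frequency of a MIDI note, assuming A4 (note 69) = 440Hz.
    public static double NoteFrequency(int midiNote)
    {
        return 440.0 * Math.Pow(2.0, (midiNote - 69) / 12.0);
    }

    // Factor by which detectedPitch must be multiplied to land on the nearest note.
    public static double ShiftRatio(double detectedPitch)
    {
        int nearest = (int)Math.Round(69 + 12 * Math.Log(detectedPitch / 440.0, 2.0));
        return NoteFrequency(nearest) / detectedPitch;
    }
}</pre>
<p>For a slightly flat A detected at 430Hz, the nearest note is 440Hz, so the ratio is 440 / 430, or roughly 1.023.</p>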
<p>There are, however, some additional considerations. First, we may want to select a subset of musical notes that are acceptable. For example, only notes in the key of C#, or maybe F# minor pentatonic. This may require a slightly more radical adjustment.</p>
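<p>To illustrate why a restricted set of notes can mean a bigger correction, here is a small sketch that works on MIDI note numbers. It assumes the allowed notes are given as pitch classes (note number modulo 12, with C as 0); <b>ScaleSnap</b> and its members are illustrative names, not part of the project:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode">using System;

static class ScaleSnap
{
    // Pitch classes of the C major scale (C = 0, D = 2, ... B = 11).
    public static readonly int[] CMajor = { 0, 2, 4, 5, 7, 9, 11 };

    static bool IsAllowed(int midiNote, int[] allowed)
    {
        int pitchClass = ((midiNote % 12) + 12) % 12; // stays non-negative
        return Array.IndexOf(allowed, pitchClass) &gt;= 0;
    }

    // Nearest MIDI note whose pitch class is allowed; ties go to the lower note.
    public static int NearestAllowedNote(int midiNote, int[] allowed)
    {
        for (int distance = 0; distance &lt; 12; distance++)
        {
            if (IsAllowed(midiNote - distance, allowed)) return midiNote - distance;
            if (IsAllowed(midiNote + distance, allowed)) return midiNote + distance;
        }
        return midiNote; // nothing allowed: leave the note where it is
    }
}</pre>
<p>With the C major set, an E (note 64) stays where it is, but a perfectly sung C# (note 61) is moved a whole semitone down to C (note 60), which is a larger shift than snapping to the nearest chromatic note would ever need.</p>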
<p>Second, depending on the effect we are after, we may not want to instantaneously jump to the new frequency. The code I am using utilizes a fairly rudimentary “attack” time parameter, allowing you to gradually move to the new frequency.
</p>
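<p>A linear ramp of this kind can be sketched in isolation as follows. The <b>AttackRamp</b> class and its parameter names are illustrative; here the attack is assumed to be measured in calls (i.e. buffers processed) rather than in seconds:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode">static class AttackRamp
{
    // Moves linearly from the detected frequency towards the target over
    // "attack" calls; once elapsed reaches attack the target is returned.
    public static double Ramp(double detected, double target, int attack, int elapsed)
    {
        if (elapsed &gt;= attack) return target;
        return detected + (target - detected) / attack * elapsed;
    }
}</pre>
<p>For a detected pitch of 430Hz, a target of 440Hz and an attack of 4, successive calls with elapsed values of 0 to 4 give 430, 432.5, 435, 437.5 and 440Hz.</p>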
<p>The actual DSP for the pitch-shifting effect is more or less untouched from Stephan Bernsee's code, because it works really well. Bernsee's code makes use of the Fast Fourier Transform, plus a bunch of clever mathematics, which I
<i>almost </i>understand, but not quite well enough to try and explain here! You're better off reading an article in which the man himself explains
<a href="http://www.dspdimension.com/admin/pitch-shifting-using-the-ft/">how it works</a>.</p>
<p>The class that manages the pitch-shifting algorithm is called <b>SmbPitchShifter</b>
and inherits from a <b>PitchShifter</b> base class. It does the bulk of its work in the
<b>ShiftPitch</b> function:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode"><span class="kwrd">public</span> <span class="kwrd">void</span> ShiftPitch(<span class="kwrd">float</span>[] inputBuff, <span class="kwrd">float</span> inputPitch,
                       <span class="kwrd">float</span> targetPitch, <span class="kwrd">float</span>[] outputBuff, <span class="kwrd">int</span> nFrames)
{
     UpdateSettings();
     detectedPitch = inputPitch;</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode"><span class="kwrd">Public</span> <span class="kwrd">Sub</span> ShiftPitch(<span class="kwrd">ByVal</span> inputBuff() <span class="kwrd">As</span> <span class="kwrd">Single</span>, <span class="kwrd">ByVal</span> inputPitch <span class="kwrd">As</span> <span class="kwrd">Single</span>,
                    <span class="kwrd">ByVal</span> targetPitch <span class="kwrd">As</span> <span class="kwrd">Single</span>, <span class="kwrd">ByVal</span> outputBuff() <span class="kwrd">As</span> <span class="kwrd">Single</span>, <span class="kwrd">ByVal</span> nFrames <span class="kwrd">As</span> <span class="kwrd">Integer</span>)
    UpdateSettings()
    detectedPitch = inputPitch</pre>
<p>The <b>inputPitch</b> parameter is set to the frequency detected by the PitchDetector. The
<b>targetPitch</b> parameter is currently unused, but will be used to specify the target pitch in real-time when accepting input from, say, a MIDI keyboard. In any case, we call
<b>UpdateSettings</b> in order to see if any of the autotune algorithm settings have changed since last time.</p>
<p>Next we calculate the amount by which we need to shift the pitch. A shift factor of 1 means no change. We don't allow the shift factor to go above 2 or below 0.5, since these figures represent a whole octave change:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode"><span class="kwrd">float</span> shiftFactor = 1.0f;

<span class="kwrd">if</span> (inputPitch &gt; 0)
{
    shiftFactor = snapFactor(inputPitch);
}

<span class="kwrd">if</span> (shiftFactor &gt; 2.0) shiftFactor = 2.0f;
<span class="kwrd">if</span> (shiftFactor &lt; 0.5) shiftFactor = 0.5f;</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode"><span class="kwrd">Dim</span> shiftFactor = 1.0F

<span class="kwrd">If</span> inputPitch &gt; 0 <span class="kwrd">Then</span>
   shiftFactor = snapFactor(inputPitch)
   shiftFactor &#43;= addVibrato(nFrames)
<span class="kwrd">End</span> <span class="kwrd">If</span>

<span class="kwrd">If</span> shiftFactor &gt; 2.0 <span class="kwrd">Then</span>
   shiftFactor = 2.0F
<span class="kwrd">End</span> <span class="kwrd">If</span>
<span class="kwrd">If</span> shiftFactor &lt; 0.5 <span class="kwrd">Then</span>
   shiftFactor = 0.5F
<span class="kwrd">End</span> If</pre>
<p>The decision of what the target note is takes place in the <b>snapFactor</b> function:</p>
<p><strong>c#:</strong></p>
<pre class="csharpcode"><span class="kwrd">protected</span> <span class="kwrd">float</span> snapFactor(<span class="kwrd">float</span> freq)
{
    <span class="kwrd">float</span> previousFrequency = 0.0f;
    <span class="kwrd">float</span> correctedFrequency = 0.0f;
    <span class="kwrd">int</span> previousNote = 0;
    <span class="kwrd">int</span> correctedNote = 0;
    <span class="kwrd">for</span> (<span class="kwrd">int</span> i = 1; i &lt; 120; i&#43;&#43;)
    {
        <span class="kwrd">bool</span> endLoop = <span class="kwrd">false</span>;
        <span class="kwrd">foreach</span> (<span class="kwrd">int</span> note <span class="kwrd">in</span> <span class="kwrd">this</span>.settings.AutoPitches)
        {
            <span class="kwrd">if</span> (i % 12 == note)
            {
                previousFrequency = correctedFrequency;
                previousNote = correctedNote;
                correctedFrequency = (<span class="kwrd">float</span>)(8.175 * Math.Pow(1.05946309, (<span class="kwrd">float</span>)i));
                correctedNote = i;
                <span class="kwrd">if</span> (correctedFrequency &gt; freq) { endLoop = <span class="kwrd">true</span>; }
                <span class="kwrd">break</span>;
            }
        }
        <span class="kwrd">if</span> (endLoop)
        {
            <span class="kwrd">break</span>;
        }
    }
    <span class="kwrd">if</span> (correctedFrequency == 0.0) { <span class="kwrd">return</span> 1.0f; }
    <span class="kwrd">int</span> destinationNote = 0;
    <span class="kwrd">double</span> destinationFrequency = 0.0;
    <span class="rem">// decide whether we are shifting up or down</span>
    <span class="kwrd">if</span> (correctedFrequency - freq &gt; freq - previousFrequency)
    {
        destinationNote = previousNote;
        destinationFrequency = previousFrequency;
    }
    <span class="kwrd">else</span>
    {
        destinationNote = correctedNote;
        destinationFrequency = correctedFrequency;
    }
    <span class="kwrd">if</span> (destinationNote != currPitch)
    {
        numElapsed = 0;
        currPitch = destinationNote;
    }
    <span class="kwrd">if</span> (attack &gt; numElapsed)
    {
        <span class="kwrd">double</span> n = (destinationFrequency - freq) / attack * numElapsed;
        destinationFrequency = freq &#43; n;
    }
    numElapsed&#43;&#43;;
    <span class="kwrd">return</span> (<span class="kwrd">float</span>)(destinationFrequency / freq);
}</pre>
<p><strong>VB.Net:</strong></p>
<pre class="csharpcode"><span class="kwrd">Protected</span> <span class="kwrd">Function</span> snapFactor(<span class="kwrd">ByVal</span> freq <span class="kwrd">As</span> <span class="kwrd">Single</span>) <span class="kwrd">As</span> <span class="kwrd">Single</span>
    <span class="kwrd">Dim</span> previousFrequency = 0.0F
    <span class="kwrd">Dim</span> correctedFrequency = 0.0F
    <span class="kwrd">Dim</span> previousNote = 0
    <span class="kwrd">Dim</span> correctedNote = 0
    <span class="kwrd">For</span> i = 1 <span class="kwrd">To</span> 119
        <span class="kwrd">Dim</span> endLoop = <span class="kwrd">False</span>
        <span class="kwrd">For</span> <span class="kwrd">Each</span> note <span class="kwrd">As</span> <span class="kwrd">Integer</span> <span class="kwrd">In</span> <span class="kwrd">Me</span>.settings.AutoPitches
            <span class="kwrd">If</span> i <span class="kwrd">Mod</span> 12 = note <span class="kwrd">Then</span>
                previousFrequency = correctedFrequency
                previousNote = correctedNote
                correctedFrequency = <span class="kwrd">CSng</span>(8.175 * Math.Pow(1.05946309, <span class="kwrd">CSng</span>(i)))
                correctedNote = i
                <span class="kwrd">If</span> correctedFrequency &gt; freq <span class="kwrd">Then</span>
                    endLoop = <span class="kwrd">True</span>
                <span class="kwrd">End</span> <span class="kwrd">If</span>
                <span class="kwrd">Exit</span> <span class="kwrd">For</span>
            <span class="kwrd">End</span> <span class="kwrd">If</span>
        <span class="kwrd">Next</span> note
        <span class="kwrd">If</span> endLoop <span class="kwrd">Then</span>
            <span class="kwrd">Exit</span> <span class="kwrd">For</span>
        <span class="kwrd">End</span> <span class="kwrd">If</span>
    <span class="kwrd">Next</span> i
    <span class="kwrd">If</span> correctedFrequency = 0.0 <span class="kwrd">Then</span>
        <span class="kwrd">Return</span> 1.0f
    <span class="kwrd">End</span> <span class="kwrd">If</span>
    <span class="kwrd">Dim</span> destinationNote = 0
    <span class="kwrd">Dim</span> destinationFrequency = 0.0
    <span class="rem">' decide whether we are shifting up or down</span>
    <span class="kwrd">If</span> correctedFrequency - freq &gt; freq - previousFrequency <span class="kwrd">Then</span>
        destinationNote = previousNote
        destinationFrequency = previousFrequency
    <span class="kwrd">Else</span>
        destinationNote = correctedNote
        destinationFrequency = correctedFrequency
    <span class="kwrd">End</span> <span class="kwrd">If</span>
    <span class="kwrd">If</span> destinationNote &lt;&gt; currPitch <span class="kwrd">Then</span>
        numElapsed = 0
        currPitch = destinationNote
    <span class="kwrd">End</span> <span class="kwrd">If</span>
    <span class="kwrd">If</span> attack &gt; numElapsed <span class="kwrd">Then</span>
        <span class="kwrd">Dim</span> n = (destinationFrequency - freq) / attack * numElapsed
        destinationFrequency = freq &#43; n
    <span class="kwrd">End</span> <span class="kwrd">If</span>
    numElapsed &#43;= 1
    <span class="kwrd">Return</span> <span class="kwrd">CSng</span>(destinationFrequency / freq)
<span class="kwrd">End</span> Function</pre>
<p>This function runs through MIDI notes 1 to 119 and, whenever a note is one of the valid pitches we have selected, remembers its “corrected frequency,” which can be calculated from the MIDI note number with the following formula:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode">correctedFrequency = (<span class="kwrd">float</span>)(8.175 * Math.Pow(1.05946309, (<span class="kwrd">float</span>)midiNoteNumber));</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode">correctedFrequency = <span class="kwrd">CSng</span>(8.175 * Math.Pow(1.05946309, <span class="kwrd">CSng</span>(midiNoteNumber)))</pre>
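<p>To get a feel for the formula, it can be wrapped in a small helper and tried on a few familiar notes. The constant 1.05946309 is the twelfth root of two, so every 12 notes the frequency doubles; <b>MidiFrequency</b> is an illustrative name:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode">using System;

static class MidiFrequency
{
    // 8.175Hz is (approximately) MIDI note 0, and each semitone multiplies
    // the frequency by the twelfth root of two.
    public static float FromNoteNumber(int midiNoteNumber)
    {
        return (float)(8.175 * Math.Pow(1.05946309, midiNoteNumber));
    }
}</pre>
<p>Note 48 comes out at about 130.8Hz and note 60 (middle C) at about 261.6Hz, exactly one octave higher. Note 69 comes out at about 439.96Hz, very close to the 440Hz of concert A.</p>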
<p>Obviously, a pitch is likely to fall somewhere in between two valid notes, so we choose which note to correct to by determining which one is closest to the detected frequency.</p>
<p>The <b>snapFactor </b>function is also responsible for implementing the attack time parameter. This allows the destinationFrequency to be slowly moved to the target note over the duration of the attack period. Having calculated our shift factor, we are now
 ready to pass our data on to the actual pitch-shifting algorithm:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode"><span class="kwrd">int</span> fftFrameSize = 2048;
<span class="kwrd">int</span> osamp = 8; <span class="rem">// 32 is best quality</span>
SmbPitchShift.smbPitchShift(shiftFactor, nFrames, fftFrameSize, osamp, <span class="kwrd">this</span>.sampleRate, inputBuff, outputBuff);</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode"><span class="kwrd">Dim</span> fftFrameSize = 2048
<span class="kwrd">Dim</span> osamp = 8 <span class="rem">' 32 is best quality</span>
SmbPitchShift.smbPitchShift(shiftFactor, nFrames, fftFrameSize, osamp, <span class="kwrd">Me</span>.sampleRate, inputBuff, outputBuff)</pre>
<p>The final thing we do in the <b>ShiftPitch </b>function is keep a record of the pitch shifts we have made. These are stored in a queue (maximum of 5000 entries) and are very useful for diagnosing what is going on if you are not getting the results you wanted
 from the algorithm:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode">shiftedPitch = inputPitch * shiftFactor;
updateShifts(detectedPitch, shiftedPitch, <span class="kwrd">this</span>.currPitch);</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode">shiftedPitch = inputPitch * shiftFactor
updateShifts(detectedPitch, shiftedPitch, <span class="kwrd">Me</span>.currPitch)</pre>
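<p>The body of <b>updateShifts</b> isn't shown here, so the following is only a guess at how a capped history like this could be kept. The <b>ShiftHistory</b> and <b>PitchShiftRecord</b> types are hypothetical:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode">using System;
using System.Collections.Generic;

struct PitchShiftRecord
{
    public float DetectedPitch;
    public float ShiftedPitch;
    public int DestinationNote;
}

class ShiftHistory
{
    private readonly int capacity;
    private readonly Queue&lt;PitchShiftRecord&gt; shifts = new Queue&lt;PitchShiftRecord&gt;();

    public ShiftHistory(int capacity)
    {
        if (capacity &lt; 1) throw new ArgumentOutOfRangeException("capacity");
        this.capacity = capacity;
    }

    public int Count { get { return shifts.Count; } }

    // Oldest entry still held; throws if the history is empty.
    public PitchShiftRecord Oldest { get { return shifts.Peek(); } }

    public void Add(float detectedPitch, float shiftedPitch, int destinationNote)
    {
        if (shifts.Count &gt;= capacity) shifts.Dequeue(); // drop the oldest entry
        PitchShiftRecord record = new PitchShiftRecord();
        record.DetectedPitch = detectedPitch;
        record.ShiftedPitch = shiftedPitch;
        record.DestinationNote = destinationNote;
        shifts.Enqueue(record);
    }
}</pre>
<p>Created with a capacity of 5000, this keeps memory use bounded however long the audio is, at the cost of forgetting the earliest shifts.</p>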
<h2>Performance</h2>
<p>Performance, as you might expect in a managed application that has not been extensively optimized, was not good. Using my laptop, I could autotune one minute of audio in about 90 seconds. Obviously, that rules out real-time autotuning. I decided to profile
 the application to see if there were any quick ways I could improve things.</p>
<p>The profiling tools in Visual Studio revealed that 20% of the time was spent on pitch detection and 80% on pitch shifting. Unfortunately, there were not too many options available for optimisation, since further investigation pointed to calls to
<b>Math.Sin</b> taking the bulk of the time. Creating lookup tables could possibly save a bit more time.
</p>
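<p>For anyone who wants to try the lookup-table idea, here is a minimal sketch. The <b>SineTable</b> class is hypothetical and is not part of the download. It rounds to the nearest table entry, so it gives up some accuracy, and whether it is actually faster (or accurate enough for the pitch shifter) would need to be measured with the profiler.</p>
<pre class="csharpcode">using System;

// Hypothetical helper: replaces Math.Sin with a precomputed table lookup.
public static class SineTable
{
    private const int Size = 4096; // power of two, so the index can be wrapped with a mask
    private static readonly double[] table = BuildTable();

    private static double[] BuildTable()
    {
        double[] values = new double[Size];
        for (int i = 0; i &lt; Size; i++)
        {
            values[i] = Math.Sin(2.0 * Math.PI * i / Size);
        }
        return values;
    }

    public static double Sin(double radians)
    {
        // Scale the angle so that one full cycle spans the whole table,
        // round to the nearest entry, then wrap into the range 0..Size-1.
        double position = radians * (Size / (2.0 * Math.PI));
        int index = (int)Math.Floor(position + 0.5) &amp; (Size - 1);
        return table[index];
    }
}</pre>
<p>With 4096 entries the angle is quantised to steps of about 0.0015 radians. A larger table, or interpolating between neighbouring entries, would reduce the error at some cost in memory or speed.</p>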
<p>Fortunately, we have another option for speeding things up. The pitch-shifting algorithm takes an “oversampling” parameter, which by default is set to 32, the highest value. However, we can trade quality for speed. Setting it to 16 meant that I could
 autotune a minute of audio in 55s (on my 2.4GHz Core2Duo laptop), which is real time but only just. Setting it to 8 reduced that to 36s. The results still sounded reasonable, so I have left it set at 8 in the code.
</p>
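<p>To see roughly why the oversampling factor matters so much, the sketch below counts how many FFT frames are processed per second of audio. It assumes the hop between frames is <b>fftFrameSize / osamp</b>, which is how the published smbPitchShift routine derives its step size, and it assumes a 44.1kHz sample rate. The <b>OversamplingCost</b> class is hypothetical.</p>
<pre class="csharpcode">using System;

public static class OversamplingCost
{
    // Number of FFT frames processed for each second of audio,
    // assuming a hop of fftFrameSize / osamp samples between frames.
    public static double FramesPerSecond(int sampleRate, int fftFrameSize, int osamp)
    {
        int hopSize = fftFrameSize / osamp;
        return (double)sampleRate / hopSize;
    }
}</pre>
<p>With a frame size of 2048, an oversampling factor of 32 gives a hop of 64 samples, 16 gives 128, and 8 gives 256, so each halving of the factor halves the number of frames the pitch shifter has to transform. The measured times do not halve exactly, which is consistent with the pitch detection stage costing the same regardless of this setting.</p>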
<p>An alternative way of speeding it up though would be to swap in a different pitch-shifting algorithm. You could start by trying out one I created as part of the
<a href="http://blogs.msdn.com/b/coding4fun/archive/2009/02/02/9391048.aspx">Skype Voice Changer project</a> previously featured on Coding4Fun, which is also able to operate in real-time (although I haven't done any quality comparisons).</p>
<h2>Creating a Test GUI</h2>
<p>Rather than starting from scratch, I decided to build upon <a href="http://voicerecorder.codeplex.com/">
.NET Voice Recorder</a>, a WPF application I created for <a href="http://blogs.msdn.com/b/coding4fun/archive/2009/10/08/9905168.aspx">
a previous Coding4Fun article</a>. This takes advantage of the <a href="http://naudio.codeplex.com">
NAudio</a> .NET audio library for audio recording and playback capabilities. The GUI has three screens. The first lets you select the input device used for recording. The second records a short voice clip. And the third allows you to edit a small portion of saved audio.</p>
<p>Here's a screenshot of the second screen showing a recording in progress:</p>
<p><a href="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/10112293/image.png"><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/10112293/image_thumb.png" width="320" height="380"></a></p>
<p>And here's the screen that allows you to trim the recording, preview it, and save it as WAV:</p>
<p><a href="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/10112293/image_3.png"><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/10112293/image_thumb_3.png" width="320" height="380"></a></p>
<p>As you can see, I have added a new button allowing access to the autotune effect settings. On this screen, you can select which notes are valid, and you can also adjust the “attack time” if you prefer to not go for the robotic effect. I've included a drop-down
 menu that automatically selects the appropriate notes from various keys.</p>
<p><a href="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/10112293/image_4.png"><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/10112293/image_thumb_4.png" width="320" height="360"></a></p>
<p>When you click “Apply,” the autotune effect is applied on a background thread while you wait, and then you are returned to the previous screen, allowing you to play back your recording and hear how it sounds. If you'd like, you can then go back and change the autotune
 settings (or turn it off).</p>
<h3>MVVM Light</h3>
<p>The original VoiceRecorder application used an <a href="http://en.wikipedia.org/wiki/Model_View_ViewModel">
MVVM</a> (model-view-viewmodel) architecture for binding data to each view. I have updated it to make use of Laurent Bugnion's excellent
<a href="http://www.galasoft.ch/mvvm/getstarted/">MVVM Light</a> library. This removes the need for my own RelayCommand and ViewModelBase classes, and also enables me to replace my ViewManager with a more extensible framework using the event aggregator (“<a href="http://blog.galasoft.ch/archive/2009/09/27/mvvm-light-toolkit-messenger-v2-beta.aspx">Messenger</a>”)
 that is included with MVVM Light. This allows me to quickly navigate from one view to another by sending out a message on the event aggregator:</p>
<p><strong>c#:</strong> </p>
<pre class="csharpcode"><span class="kwrd">private</span> <span class="kwrd">void</span> NavigateToSaveView()
{
   Messenger.Default.Send(<span class="kwrd">new</span> NavigateMessage(SaveViewModel.ViewName, <span class="kwrd">this</span>.voiceRecorderState));
}</pre>
<p><strong>VB.Net</strong></p>
<pre class="csharpcode"><span class="kwrd">Private</span> <span class="kwrd">Sub</span> NavigateToSaveView()
    Messenger.<span class="kwrd">Default</span>.Send(<span class="kwrd">New</span> NavigateMessage(SaveViewModel.ViewName, <span class="kwrd">Me</span>.voiceRecorderState))
<span class="kwrd">End</span> Sub</pre>
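<p>If the event aggregator pattern is new to you, the following stripped-down sketch shows the idea behind it. It is not MVVM Light's implementation: <b>SimpleMessenger</b> is a hypothetical class written for illustration, and the <b>NavigateMessage</b> shown here is a guess at the shape of the message, so the real class in the download may differ.</p>
<pre class="csharpcode">using System;
using System.Collections.Generic;

// Hypothetical message type carrying the name of the view to show.
public class NavigateMessage
{
    public string TargetView { get; private set; }
    public object State { get; private set; }

    public NavigateMessage(string targetView, object state)
    {
        TargetView = targetView;
        State = state;
    }
}

// A minimal event aggregator: senders and receivers only share message types.
public class SimpleMessenger
{
    private readonly Dictionary&lt;Type, List&lt;Delegate&gt;&gt; handlers =
        new Dictionary&lt;Type, List&lt;Delegate&gt;&gt;();

    public void Register&lt;TMessage&gt;(Action&lt;TMessage&gt; handler)
    {
        List&lt;Delegate&gt; list;
        if (!handlers.TryGetValue(typeof(TMessage), out list))
        {
            list = new List&lt;Delegate&gt;();
            handlers[typeof(TMessage)] = list;
        }
        list.Add(handler);
    }

    public void Send&lt;TMessage&gt;(TMessage message)
    {
        List&lt;Delegate&gt; list;
        if (handlers.TryGetValue(typeof(TMessage), out list))
        {
            foreach (Delegate handler in list)
            {
                ((Action&lt;TMessage&gt;)handler)(message);
            }
        }
    }
}</pre>
<p>The benefit of this design is that the view model sending the message needs no reference to whatever performs the navigation. Something like the main window's view model would register a handler for <b>NavigateMessage</b> once and swap the visible view whenever one arrives.</p>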
<h3>Getting the Best out of Autotune</h3>
<p>Unfortunately, Autotune is an effect that doesn't always produce the desired result. Obviously, if you want great autotune, you're best off buying a commercial implementation, but here are a few tips for getting the most out of an autotune algorithm:</p>
<ul>
<li><b>Get a good quality recording</b>. Avoid background noise, hum, and too quiet or too loud (distorted) recordings. If you need to sing against a backing track, play it in headphones so the microphone doesn't pick it up.
</li><li><b>Know what key you are singing in</b>. This is where the test application won't help you out, since if you don't know what key you are singing in, you can hardly expect to be able to select the appropriate key from the list. You might even be singing
 in between two keys! If you have an in-tune musical instrument at hand, play a note to give yourself a starting pitch.
</li><li><b>Choose your scale. </b>The easiest option is to just go with a chromatic scale, which means all 12 notes are valid. At the other extreme, you can allow only a single note and make the autotune force you into a monotone. Pentatonic scales are useful for instant gratification. They have
 five notes in them, and so long as you stick to those five, almost anything you sing will fit in with a backing track in your chosen key.
</li><li><b>Adjust the attack time</b>. An attack time of zero is great for the robot effect. A longer attack time will smooth out transitions.
</li><li><b>Why is it “warbling”? </b>With this autotune algorithm, it is quite common to get a “warbling” effect. This is because it is either changing its mind about what note to pitch shift to (because the pitch detector is not providing stable pitch detection),
 or because the pitch detector couldn't determine a pitch at all, so your voice isn't being pitch shifted. You can play around with the release slider (and may need to modify the release algorithm) if you want to eliminate warbling.
</li></ul>
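<p>To make the idea of “valid notes” concrete, here is a minimal sketch of snapping a detected frequency to the nearest note in a chosen scale. It is an illustration rather than the code used by the autotune effect in the download: the <b>ScaleSnapper</b> class is hypothetical, and it assumes equal temperament with A4 tuned to 440Hz.</p>
<pre class="csharpcode">using System;

public static class ScaleSnapper
{
    // allowedPitchClasses[i] is true when pitch class i is valid
    // (0 = C, 1 = C#, 2 = D, ... 11 = B).
    public static double SnapToScale(double frequency, bool[] allowedPitchClasses)
    {
        // Fractional MIDI note number, where note 69 is A4 (440Hz).
        double midi = 69.0 + 12.0 * Math.Log(frequency / 440.0, 2.0);
        int centre = (int)Math.Floor(midi);

        bool found = false;
        int best = 0;
        double bestDistance = double.MaxValue;

        // Check every note within an octave and keep the closest valid one.
        for (int note = centre - 11; note &lt;= centre + 12; note++)
        {
            int pitchClass = ((note % 12) + 12) % 12;
            if (!allowedPitchClasses[pitchClass])
            {
                continue;
            }
            double distance = Math.Abs(midi - note);
            if (distance &lt; bestDistance)
            {
                bestDistance = distance;
                best = note;
                found = true;
            }
        }

        if (!found)
        {
            return frequency; // no notes selected, so leave the pitch alone
        }
        return 440.0 * Math.Pow(2.0, (best - 69) / 12.0);
    }
}</pre>
<p>For a C major pentatonic scale (C, D, E, G, A) the valid pitch classes are 0, 2, 4, 7 and 9. A slightly sharp A at 450Hz would be pulled back to 440Hz, and so would an A# at around 466Hz, because A is the closest note that the scale allows.</p>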
<h2>Taking it further</h2>
<p>.NET Voice Recorder is open source <a href="http://voicerecorder.codeplex.com">
and hosted on CodePlex</a> in a Mercurial repository. So what are you waiting for? Make a fork and have a go at improving it:</p>
<ul>
<li><b>Improve the pitch detector algorithm</b>. I have already given some suggestions for how this can be done. One idea that might be worth experimenting with is making it “hold” the detected frequency for a short period until a strong, new frequency is detected.
</li><li><b>Display the detected pitches.</b> The autotune effect stores data of the detected pitches as well as the pitches it attempts to convert to. You might display this information to the user, perhaps underneath the waveform, so they can see what it is detecting.
 (It currently outputs this information with Debug.WriteLine, which is useful for debugging purposes.)
</li><li><b>Suggest an appropriate scale? </b>Instead of leaving it to the user to select what key to snap the notes to, how about auto-selecting the detected notes?
</li><li><b>Allow direct input of desired pitch</b>. Instead of letting the autotune effect try to work out what note it should be targeting at any given time, it is far more effective to let the user input what note they would like to shift to. This could be done
 by entering the note using a MIDI keyboard in real-time, or by drawing the notes in, perhaps on a “piano-roll” control. Or, you could implement a simple domain-specific language to specify the desired notes. For example:
<ul>
<li>0:00.0 C# </li><li>0:01.5 E </li><li>0:02.7 G# </li></ul>
</li><li><b>Port it to Windows Phone 7</b>. This would be quite a cool effect to have on your phone. Of course, you might need to optimize performance a bit more to save battery life. And you'll want to put your graphic designer hat on to give it a more beautiful
 look than it currently has. </li></ul>
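<p>As a starting point for the domain-specific language idea in the list above, here is a minimal sketch of a parser for lines such as “0:01.5 E”. Everything in it is hypothetical: the <b>NoteEvent</b> and <b>NoteScriptParser</b> names, the decision to store only the pitch class rather than an octave, and the restriction to sharps are all assumptions made for illustration.</p>
<pre class="csharpcode">using System;
using System.Collections.Generic;
using System.Globalization;

public struct NoteEvent
{
    public double TimeSeconds;
    public int PitchClass; // 0 = C, 1 = C#, ... 11 = B
}

public static class NoteScriptParser
{
    private static readonly string[] NoteNames =
        { "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B" };

    // Parses one "minutes:seconds note" entry per line.
    public static List&lt;NoteEvent&gt; Parse(string script)
    {
        List&lt;NoteEvent&gt; events = new List&lt;NoteEvent&gt;();
        string[] lines = script.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            string[] parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length == 0)
            {
                continue; // line contained only whitespace
            }
            string[] time = parts[0].Split(':');
            if (parts.Length != 2 || time.Length != 2)
            {
                throw new FormatException("Expected a time and a note: " + line);
            }

            int minutes = int.Parse(time[0], CultureInfo.InvariantCulture);
            double seconds = double.Parse(time[1], CultureInfo.InvariantCulture);

            int pitchClass = Array.IndexOf(NoteNames, parts[1].ToUpperInvariant());
            if (pitchClass &lt; 0)
            {
                throw new FormatException("Unknown note: " + parts[1]);
            }

            NoteEvent noteEvent = new NoteEvent();
            noteEvent.TimeSeconds = minutes * 60 + seconds;
            noteEvent.PitchClass = pitchClass;
            events.Add(noteEvent);
        }
        return events;
    }
}</pre>
<p>The resulting list could then be consulted during pitch shifting: for the current playback position, find the most recent event and use its note as the target instead of asking the autotune effect to guess.</p>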
<h2>About the Author</h2>
<p>Mark Heath is the author of several open source .NET applications and libraries, including
<a href="http://naudio.codeplex.com">
NAudio</a> and the <a href="http://skypefx.codeplex.com">
Skype Voice Changer</a>. He works for <a href="http://nice.com/">NICE Systems</a>, developing applications that search, display, and play back vast amounts of multimedia data. He has a blog,
<a href="http://mark-dot-net.blogspot.com">Sound Code</a>, and you can follow him on his sporadically updated
<a href="http://twitter.com/mark_heath">Twitter account</a>.</p>
]]></description>
      <comments>http://channel9.msdn.com/coding4fun/articles/AutotuneNET</comments>
      <itunes:summary>We&#39;ve all cringed as a hopelessly out of tune contestant appears on the latest episode of “American Idol.” Occasionally, there&#39;s a contestant who manages to be pitch perfect all the way through—right until they flub the final note. And in the cutthroat world
 of televised auditions, sing one slightly flat note and you&#39;re out.  
So what takes care of a bad-pitch day? Autotune—an effect that corrects the pitch of your voice so you&#39;ll never again sing out of tune. And now, with the power of modern microprocessors, autotune is possible in real-time, allowing singers to benefit from
 its almost magical powers during live concerts. 
The company most famous for its autotune effect is Antares. 
Antares Auto-Tune currently retails for $249, and a stripped down version is available for $100. In addition to simply improving the pitch of a dodgy singer, autotune can be used to create unique robotic sounding vocal effects, a technique massively popular
 in recent years thanks to its use by artists such as T-Pain and the group behind the “Auto-Tune the News” YouTube videos. In 1998, when the effect was first used on
Cher&#39;s “Believe” single, the producer used such extreme settings that instead of subtly adjusting the pitch, autotune “snapped” instantaneously to the nearest “correct” note.
 
Here is a nerdy example of what Autotune can do. 



How does Autotune work?
An autotune effect has two parts. The first is pitch detection, which calculates the dominant frequency of the incoming signal, and is the reason autotune is normally used on monophonic audio sources (i.e. playing one note at a time, not whole chords).
 So, if your guitar is out of tune, you&#39;re out of luck (Celemony&#39;s Melodyne product, however, features some incredible capabilities for pitch-shifting polyphonic audio). 
The second stage is pitch shifting, or “correcting” a given note. However, the bigger the pitch shift required, the more artificial the end result will be, and it is worth noting that absolutely perfect</itunes:summary>
      <link>http://channel9.msdn.com/coding4fun/articles/AutotuneNET</link>
      <pubDate>Wed, 05 Jan 2011 22:19:46 GMT</pubDate>
      <guid isPermaLink="false">http://channel9.msdn.com/coding4fun/articles/AutotuneNET</guid>
      <media:thumbnail url="http://ecn.channel9.msdn.com/o9/c4f/images/10112293_100.jpg" height="75" width="100"></media:thumbnail>
      <media:thumbnail url="http://ecn.channel9.msdn.com/o9/c4f/images/10112293_220.jpg" height="165" width="220"></media:thumbnail>      
      <dc:creator>Mark Heath</dc:creator>
      <itunes:author>Mark Heath</itunes:author>
      <slash:comments>20</slash:comments>
      <wfw:commentRss>http://channel9.msdn.com/coding4fun/articles/AutotuneNET/RSS</wfw:commentRss>
      <category>Audio</category>
      <category>MVVM</category>
      <category>WPF</category>
    </item>
  <item>
      <title>Skype Voice Changer</title>
      <description><![CDATA[
<p>In this article I demonstrate how you can create your own audio effects in .NET to manipulate digital audio at the sample level. These effects are used to process MP3 files while they are being played back, and to process the real-time microphone input allowing
 you to change your voice during a Skype conversation.</p>
<table border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="top" width="157">
<p>Mark Heath, <a href="http://mark-dot-net.blogspot.com">blog</a> <br>
</p>
</td>
<td valign="top" width="481">
<p><b>Source Code:</b> <a href="http://www.codeplex.com/skypefx">Download</a></p>
<p><b>Difficulty:</b> Intermediate <br>
<b>Time Required:</b> 8 hours <br>
<b>Cost:</b> Free <br>
<b>Software Needed:</b> <a href="http://www.microsoft.com/express/download/">Visual Basic or Visual C# Express 2008</a><b>
<br>
</b><b>Libraries:</b> <a href="http://www.codeplex.com/naudio">NAudio</a>, <a href="https://developer.skype.com/Docs/Skype4COM">
Skype4COM</a>, <a href="http://www.codeplex.com/MEF">MEF</a></p>
</td>
</tr>
</tbody>
</table>
<h3><a href="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_15.png"><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_thumb_12.png" width="500" height="204"></a>
</h3>
<h3>Audio and the .NET Framework</h3>
<p>Playing back audio in a .NET application is not quite as easy as you might hope it would be. The .NET 2.0 Framework introduced the
<a href="http://msdn.microsoft.com/en-us/library/system.media.soundplayer.aspx">SoundPlayer</a> component, which allows you to play back an existing WAV file. While this may be fine for many scenarios, as soon as you want to do things even slightly more advanced,
 such as changing the volume, or playing back from a different file-format, or pausing and repositioning, you must resort to writing P/Invoke wrappers for various Windows APIs.</p>
<p>Back in 2002, as I was getting started learning .NET, I created some audio-related classes of my own to compensate for the lack of audio support in the .NET Framework. I focused initially on reading and writing WAV and MIDI files, as well as playing back
 audio in a way that allowed real-time mixing and manipulation of the audio at a sample level. As time went by, I made this growing collection of audio classes available as an open source project, called
<a href="http://www.codeplex.com/naudio">NAudio</a>, now hosted at CodePlex.</p>
<h4>Audio Playback in NAudio</h4>
<p>NAudio works by constructing an audio playback graph. Audio comes in “streams” which can be connected together and modified before eventually they go to a renderer. This might be your soundcard if you are listening to the audio, or it might be to a file
 on the hard disk.</p>
<p>In NAudio, all streams derive from <b>WaveStream</b>. NAudio comes with a collection of useful
<b>WaveStream</b> derived classes such as <b>WaveFileReader</b> to read from WAV files or
<b>WaveStreamMixer</b> to sum together multiple audio streams. </p>
<p>Audio mixing and effects are almost always performed with floating point numbers (32 bit is most common), so one of the first steps after reading audio out of a WAV file is to convert it from 16 bit to 32 bit. NAudio includes the
<b>Wave16To32ConversionStream </b>class to do this. If the audio wasn't in PCM in the first place, for example if it is MP3, then we make use of a combination of the
<b>Mp3FileReader</b>, <b>WaveFormatConversionStream</b> and <b>BlockAlignReductionStream</b> to read the audio out and get it into just the format we want. Here's an example showing how to create a
<b>WaveStream</b> ready for playback.</p>
<pre class="csharpcode">WaveStream outStream;
<span class="kwrd">if</span> (fileName.EndsWith(<span class="str">&quot;.mp3&quot;</span>))
{
   outStream = <span class="kwrd">new</span> Mp3FileReader(fileName);
}
<span class="kwrd">else</span> <span class="kwrd">if</span>(fileName.EndsWith(<span class="str">&quot;.wav&quot;</span>))
{
   outStream = <span class="kwrd">new</span> WaveFileReader(fileName);
}
<span class="kwrd">else</span>
{
   <span class="kwrd">throw</span> <span class="kwrd">new</span> InvalidOperationException(<span class="str">&quot;Can't open this type of file&quot;</span>);
}                
<span class="kwrd">if</span> (outStream.WaveFormat.Encoding != WaveFormatEncoding.Pcm)
{
   outStream = WaveFormatConversionStream.CreatePcmStream(outStream);
   outStream = <span class="kwrd">new</span> BlockAlignReductionStream(outStream); <span class="rem">// reduces choppiness</span>
}</pre>
<style type="text/css">
<!--
.csharpcode, .csharpcode 
	{font-size:small;
	color:black;
	font-family:consolas,"Courier New",courier,monospace;
	background-color:#ffffff}
.csharpcode 
	{margin:0em}
.csharpcode .rem
	{color:#008000}
.csharpcode .kwrd
	{color:#0000ff}
.csharpcode .str
	{color:#006080}
.csharpcode .op
	{color:#0000c0}
.csharpcode .preproc
	{color:#cc6633}
.csharpcode .asp
	{background-color:#ffff00}
.csharpcode .html
	{color:#800000}
.csharpcode .attr
	{color:#ff0000}
.csharpcode .alt
	{background-color:#f4f4f4;
	width:100%;
	margin:0em}
.csharpcode .lnum
	{color:#606060}
-->
</style>
<p>If we simply want to play back the audio without any extra processing, we would use one of the audio output classes provided by NAudio to create an object that implements
<b>IWavePlayer</b>. The options are <b>WaveOut</b>, <b>DirectSoundOut</b>, <b>AsioOut</b> and
<b>WasapiOut</b>, each representing a different technology for audio playback in Windows. We will use
<b>WaveOut</b>, which is the most universally supported. Here we are opening the default output device with a latency of 300ms, and instructing it to use windowed callbacks.</p>
<pre class="csharpcode">IWavePlayer player = <span class="kwrd">new</span> WaveOut(0, 300, <span class="kwrd">true</span>);
player.Init(outStream);
player.Play();</pre>
<p><b>WaveOut</b> will now repeatedly call the <b>Read</b> method of the output stream to get the next batch of audio samples to play. All we need to do now is to insert our audio effects into the playback chain.</p>
<h4>An Audio Effects Framework</h4>
<p>To allow us to process sample level audio more simply, I have created a new <b>
WaveStream</b> derived class called <b>EffectStream</b>. <b>EffectStream</b> will simply pass each audio sample to one or more audio effects before returning the modified audio in its
<b>Read</b> method. The reason I have chosen to host all the effects in a single <b>
EffectStream</b> rather than creating one <b>WaveStream</b> derived class per effect is that I want to avoid the performance penalty of converting between arrays of bytes and arrays of floating point numbers at every step. Some types of effect, particularly those involving
 Fourier transforms, can be processor intensive, so anything we can do to speed up performance will help.</p>
<p>As well as the <b>EffectStream </b>class, we need a base <b>Effect</b> class, from which all of our effects can derive. Here's a simplified version of the base Effect class (minus a whole load of helper mathematical functions):</p>
<pre class="csharpcode"><span class="kwrd">public</span> <span class="kwrd">abstract</span> <span class="kwrd">class</span> Effect
{
    <span class="kwrd">private</span> List&lt;Slider&gt; sliders;
    <span class="kwrd">public</span> <span class="kwrd">float</span> SampleRate { get; set; }
    <span class="kwrd">public</span> <span class="kwrd">float</span> Tempo { get; set; }
    <span class="kwrd">public</span> <span class="kwrd">bool</span> Enabled { get; set; }

    <span class="kwrd">public</span> Effect()
    {
        sliders = <span class="kwrd">new</span> List&lt;Slider&gt;();
        Enabled = <span class="kwrd">true</span>;
        Tempo = 120;
        SampleRate = 44100;
    }

    <span class="kwrd">public</span> IList&lt;Slider&gt; Sliders { get { <span class="kwrd">return</span> sliders; } }

    <span class="kwrd">public</span> Slider AddSlider(<span class="kwrd">float</span> defaultValue, <span class="kwrd">float</span> minimum, 
            <span class="kwrd">float</span> maximum, <span class="kwrd">float</span> increment, <span class="kwrd">string</span> description)
    {
        Slider slider = <span class="kwrd">new</span> Slider(defaultValue, minimum, 
            maximum, increment, description);
        sliders.Add(slider);
        <span class="kwrd">return</span> slider;
    }

    <span class="rem">/// &lt;summary&gt;</span>
    <span class="rem">/// Should be called on effect load, </span>
    <span class="rem">/// sample rate changes, and start of playback</span>
    <span class="rem">/// &lt;/summary&gt;</span>
    <span class="kwrd">public</span> <span class="kwrd">virtual</span> <span class="kwrd">void</span> Init()
    {}

    <span class="rem">/// &lt;summary&gt;</span>
    <span class="rem">/// will be called when a slider value has been changed</span>
    <span class="rem">/// &lt;/summary&gt;</span>
    <span class="kwrd">public</span> <span class="kwrd">abstract</span> <span class="kwrd">void</span> Slider();

    <span class="rem">/// &lt;summary&gt;</span>
    <span class="rem">/// called before each block is processed</span>
    <span class="rem">/// &lt;/summary&gt;</span>
    <span class="kwrd">public</span> <span class="kwrd">virtual</span> <span class="kwrd">void</span> Block()
    { }

    <span class="rem">/// &lt;summary&gt;</span>
    <span class="rem">/// called for each sample</span>
    <span class="rem">/// &lt;/summary&gt;</span>
    <span class="kwrd">public</span> <span class="kwrd">abstract</span> <span class="kwrd">void</span> Sample(<span class="kwrd">ref</span> <span class="kwrd">float</span> spl0, <span class="kwrd">ref</span> <span class="kwrd">float</span> spl1);
}</pre>
<p>The bulk of the work of the effect should be done in the overridden <b>Sample</b> method. The
<b>spl0</b> and <b>spl1</b> parameters contain the current sample values to be modified for the left and right channels respectively. For example, to lower the volume we could halve the amplitude of every sample with the following code:</p>
<pre class="csharpcode"><span class="kwrd">public</span> <span class="kwrd">override</span> <span class="kwrd">void</span> Sample(<span class="kwrd">ref</span> <span class="kwrd">float</span> spl0, <span class="kwrd">ref</span> <span class="kwrd">float</span> spl1)
{
    spl0 *= 0.5f;
    spl1 *= 0.5f;
}</pre>
<p>The <b>Effect</b> class contains <b>Tempo</b> and <b>SampleRate</b> values which are useful for certain types of effect. It also contains a concept of ‘Sliders’ for each effect. These are the effect parameters, which allow real-time modification of the effect.
 So if we wanted to control the volume using a slider, we could write the following code (although bear in mind that normally volume sliders should be logarithmic not linear – see the
<b>Volume</b> effect in the sample code for an example of how to do this):</p>
<pre class="csharpcode"><span class="kwrd">public</span> <span class="kwrd">override</span> <span class="kwrd">void</span> Sample(<span class="kwrd">ref</span> <span class="kwrd">float</span> spl0, <span class="kwrd">ref</span> <span class="kwrd">float</span> spl1)
{
    spl0 *= slider1;
    spl1 *= slider1;
}</pre>
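<p>To illustrate what a logarithmic volume control involves, here is a minimal sketch that treats the slider value as a level in decibels and converts it to a linear gain. It is not the <b>Volume</b> effect from the sample code, which may be written differently, and the <b>VolumeMath</b> class is hypothetical.</p>
<pre class="csharpcode">using System;

public static class VolumeMath
{
    // Converts a level in decibels to a linear gain factor.
    // 0 dB leaves the signal unchanged and -20 dB scales it to a tenth.
    public static float DecibelsToGain(float decibels)
    {
        return (float)Math.Pow(10.0, decibels / 20.0);
    }

    // Applies the same gain to the left and right samples.
    public static void ApplyGain(ref float spl0, ref float spl1, float gain)
    {
        spl0 *= gain;
        spl1 *= gain;
    }
}</pre>
<p>In an effect built on the <b>Effect</b> base class, the conversion would most naturally be done once in <b>Slider</b>, when the value changes, with only the multiplication left in <b>Sample</b>, since <b>Sample</b> runs for every sample.</p>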
<p>To simplify the task of adding, removing and re-ordering effects, I created an
<b>EffectChain</b> class, which is a simple wrapper around a <b>List&lt;Effect&gt;</b>. The
<b>EffectStream</b> class has an <b>EffectChain</b> that contains all the effects it needs to run. Here is the code for the
<b>EffectStream</b>:</p>
<pre class="csharpcode"><span class="kwrd">public</span> <span class="kwrd">class</span> EffectStream : WaveStream
{
    <span class="kwrd">private</span> EffectChain effects;
    <span class="kwrd">public</span> WaveStream source;
    <span class="kwrd">private</span> <span class="kwrd">object</span> effectLock = <span class="kwrd">new</span> <span class="kwrd">object</span>();
    <span class="kwrd">private</span> <span class="kwrd">object</span> sourceLock = <span class="kwrd">new</span> <span class="kwrd">object</span>();

    <span class="kwrd">public</span> EffectStream(EffectChain effects, WaveStream sourceStream)
    {
        <span class="kwrd">this</span>.effects = effects;
        <span class="kwrd">this</span>.source = sourceStream;
        <span class="kwrd">foreach</span> (Effect effect <span class="kwrd">in</span> effects)
        {
            InitialiseEffect(effect);
        }

    }

    <span class="kwrd">public</span> EffectStream(WaveStream sourceStream)
        : <span class="kwrd">this</span>(<span class="kwrd">new</span> EffectChain(), sourceStream)
    {        
    }

    <span class="kwrd">public</span> EffectStream(Effect effect, WaveStream sourceStream)
        : <span class="kwrd">this</span>(sourceStream)
    {
        AddEffect(effect);
    }

    <span class="kwrd">public</span> <span class="kwrd">override</span> WaveFormat WaveFormat
    {
        get { <span class="kwrd">return</span> source.WaveFormat; }
    }

    <span class="kwrd">public</span> <span class="kwrd">override</span> <span class="kwrd">long</span> Length
    {
        get { <span class="kwrd">return</span> source.Length; }
    }

    <span class="kwrd">public</span> <span class="kwrd">override</span> <span class="kwrd">long</span> Position
    {
        get { <span class="kwrd">return</span> source.Position; }
        set { <span class="kwrd">lock</span> (sourceLock) { source.Position = <span class="kwrd">value</span>; } }
    }        

    <span class="kwrd">public</span> <span class="kwrd">override</span> <span class="kwrd">int</span> Read(<span class="kwrd">byte</span>[] buffer, <span class="kwrd">int</span> offset, <span class="kwrd">int</span> count)
    {
        <span class="kwrd">int</span> read;
        <span class="kwrd">lock</span>(sourceLock)
        {
            read = source.Read(buffer, offset, count);
        }
        <span class="kwrd">if</span> (WaveFormat.BitsPerSample == 16)
        {
            <span class="kwrd">lock</span> (effectLock)
            {
                Process16Bit(buffer, offset, read);
            }
        }
        <span class="kwrd">return</span> read;
    }

    <span class="kwrd">private</span> <span class="kwrd">void</span> Process16Bit(<span class="kwrd">byte</span>[] buffer, <span class="kwrd">int</span> offset, <span class="kwrd">int</span> count)
    {
        <span class="kwrd">foreach</span> (Effect effect <span class="kwrd">in</span> effects)
        {
            <span class="kwrd">if</span> (effect.Enabled)
            {
                effect.Block();
            }
        }

        <span class="kwrd">for</span>(<span class="kwrd">int</span> sample = 0; sample &lt; count/2; sample&#43;&#43;)
        {
            <span class="rem">// get the sample(s)</span>
            <span class="kwrd">int</span> x = offset &#43; sample * 2;
            <span class="kwrd">short</span> sample16Left = BitConverter.ToInt16(buffer, x);
            <span class="kwrd">short</span> sample16Right = sample16Left;
            <span class="kwrd">if</span>(WaveFormat.Channels == 2)
            {                    
                sample16Right = BitConverter.ToInt16(buffer, x &#43; 2);
                sample&#43;&#43;;
            }
           
            <span class="rem">// run these samples through the effects</span>
            <span class="kwrd">float</span> sample64Left = sample16Left / 32768.0f;
            <span class="kwrd">float</span> sample64Right = sample16Right / 32768.0f;
            <span class="kwrd">foreach</span> (Effect effect <span class="kwrd">in</span> effects)
            {
                <span class="kwrd">if</span> (effect.Enabled)
                {
                    effect.Sample(<span class="kwrd">ref</span> sample64Left, <span class="kwrd">ref</span> sample64Right);
                }
            }

            sample16Left = (<span class="kwrd">short</span>)(sample64Left * 32768.0f);
            sample16Right = (<span class="kwrd">short</span>)(sample64Right * 32768.0f);

            <span class="rem">// put them back</span>
            buffer[x] = (<span class="kwrd">byte</span>)(sample16Left &amp; 0xFF);
            buffer[x &#43; 1] = (<span class="kwrd">byte</span>)((sample16Left &gt;&gt; 8) &amp; 0xFF); 

            <span class="kwrd">if</span>(WaveFormat.Channels == 2)    
            {
                buffer[x &#43; 2] = (<span class="kwrd">byte</span>)(sample16Right &amp; 0xFF);
                buffer[x &#43; 3] = (<span class="kwrd">byte</span>)((sample16Right &gt;&gt; 8) &amp; 0xFF);
            }
        }
    }


    <span class="kwrd">public</span> <span class="kwrd">bool</span> MoveUp(Effect effect)
    {
        <span class="kwrd">lock</span> (effectLock)
        {
            <span class="kwrd">return</span> effects.MoveUp(effect);
        }
    }

    <span class="kwrd">public</span> <span class="kwrd">bool</span> MoveDown(Effect effect)
    {
        <span class="kwrd">lock</span> (effectLock)
        {
            <span class="kwrd">return</span> effects.MoveDown(effect);
        }
    }

    <span class="kwrd">public</span> <span class="kwrd">void</span> AddEffect(Effect effect)
    {
        InitialiseEffect(effect);
        <span class="kwrd">lock</span> (effectLock)
        {
            <span class="kwrd">this</span>.effects.Add(effect);
        }
    }

    <span class="kwrd">private</span> <span class="kwrd">void</span> InitialiseEffect(Effect effect)
    {
        effect.SampleRate = WaveFormat.SampleRate;
        effect.Init();
        effect.Slider();
    }

    <span class="kwrd">public</span> <span class="kwrd">bool</span> RemoveEffect(Effect effect)
    {
        <span class="kwrd">lock</span> (effectLock)
        {
            <span class="kwrd">return</span> <span class="kwrd">this</span>.effects.Remove(effect);
        }
    }
}</pre>
<p>When the <b>Read</b> method on <b>EffectStream</b> is called, we first read the requested number of bytes from our source
<b>WaveStream</b>. This might be from a WAV or MP3 file, or from a microphone. Then, we convert it from 16 bit to 32 bit floating point audio. 16 bit audio is stored as integers going from -32,768 to 32,767, and 32 bit floating point audio uses the range -1.0 to 1.0 to represent
 this range. This means we have plenty of headroom to mix together multiple signals without distorting. It is important though to remember that no sample should lie outside the range -1.0 to 1.0 before converting back to 16 bit.</p>
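<p>One simple way to guard against out-of-range samples is to clip them during the conversion back to 16 bit. The sketch below is an illustration only, and the <b>SampleConversion</b> class is hypothetical. Hard clipping like this still distorts the audio, but it avoids the much harsher artefact of the value wrapping around when the cast overflows.</p>
<pre class="csharpcode">using System;

public static class SampleConversion
{
    // Converts a floating point sample to 16 bit, clipping values
    // that fall outside the representable range.
    public static short ToInt16(float sample)
    {
        float scaled = sample * 32768.0f;
        if (scaled &gt; short.MaxValue)
        {
            return short.MaxValue;
        }
        if (scaled &lt; short.MinValue)
        {
            return short.MinValue;
        }
        return (short)scaled;
    }
}</pre>
<p>Note that a sample of exactly 1.0 scales to 32,768, which is one more than a 16 bit integer can hold, so it is clipped to 32,767.</p>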
<h3>Porting Effects to .NET</h3>
<p>Now that we have a basic effect framework, it is time to create some real effects to use. There are many sources of algorithms for digital signal processing (DSP) (try
<a href="http://www.musicdsp.org/">musicdsp.org</a> for a good starting point), but I have chosen to base my effects model on that provided by the
<a href="http://www.reaper.fm/">REAPER digital audio workstation</a> (DAW). This impressive application, masterminded by legendary software developer
<a href="http://en.wikipedia.org/wiki/Justin_Frankel">Justin Frankel</a>, includes a text-based effects framework. These effects, known as
<a href="http://www.reaper.fm/sdk/js/">JS effects</a>, allow the use of a C-like syntax to quickly write your own effects. I have modelled my
<b>Effect</b> class on the JS syntax, allowing me to quickly port effects across.</p>
<pre class="csharpcode"><span class="kwrd">public</span> <span class="kwrd">class</span> Tremolo : Effect
{
    <span class="kwrd">public</span> Tremolo()
    {
        AddSlider(4,0,100,1,<span class="str">&quot;frequency (Hz)&quot;</span>);
        AddSlider(-6,-60,0,1,<span class="str">&quot;amount (dB)&quot;</span>);
        AddSlider(0, 0, 1, 0.1f, <span class="str">&quot;stereo separation (0..1)&quot;</span>);
    }

    <span class="kwrd">float</span> adv, sep, amount, sc, pos;

    <span class="kwrd">public</span> <span class="kwrd">override</span> <span class="kwrd">void</span> Slider()
    {
        adv=PI*2*slider1/SampleRate;
        sep=slider3*PI;
        amount=pow(2,slider2/6);
        sc=0.5f*amount; amount=1-amount;
    }

    <span class="kwrd">public</span> <span class="kwrd">override</span> <span class="kwrd">void</span> Sample(<span class="kwrd">ref</span> <span class="kwrd">float</span> spl0, <span class="kwrd">ref</span> <span class="kwrd">float</span> spl1)
    {
        spl0 = spl0 * ((cos(pos) &#43; 1) * sc &#43; amount);
        spl1 = spl1 * ((cos(pos &#43; sep) &#43; 1) * sc &#43; amount);
        pos &#43;= adv;
    }
}</pre>
<p>Some members in the <b>Effect</b> base class such as <b>cos</b> and <b>slider1</b> allow me to keep the ported syntax as similar to the original JS script as possible.</p>
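<p>To see what the arithmetic in the <b>Tremolo</b> effect's <b>Slider</b> and <b>Sample</b> methods works out to, here is a small stand-alone sketch that computes the gain applied to a sample for a given oscillator phase. The <b>TremoloMath</b> class is hypothetical and calls <b>System.Math</b> directly instead of the <b>Effect</b> base class helpers. With the default amount of -6 dB, pow(2, -6/6) is 0.5, so the gain swings between 1.0 and 0.5.</p>
<pre class="csharpcode">using System;

public static class TremoloMath
{
    // Gain applied to a sample at oscillator phase 'pos' (in radians)
    // for a tremolo amount of 'amountDb' decibels (zero or negative).
    public static double Gain(double pos, double amountDb)
    {
        double amount = Math.Pow(2, amountDb / 6); // -6 dB gives 0.5
        double sc = 0.5 * amount;
        amount = 1 - amount;
        return (Math.Cos(pos) + 1) * sc + amount;
    }
}</pre>
<p>For example, <b>Gain(0, -6)</b> is the peak of the cycle and <b>Gain(Math.PI, -6)</b> is the trough, so the signal dips to half its amplitude once per cycle.</p>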
<p>REAPER ships with well over 100 of these JS Effects, so I chose about 15 of them and ported them to .NET. They are available in the download that accompanies this article. With some of them, for example pitch shifting effects, you will immediately notice
 the effect on the sound, while others, such as compressors, require some knowledge of how to adjust the parameters to get good results.</p>
<h4>The Test Harness</h4>
<p>Obviously we need a way to pass audio through our effects, so our next task is to create a test harness that will allow us to load in audio files and listen to them with the effects applied. For this purpose I created a simple Windows Forms application that
 allows you to select a WAV or MP3 file to play back. After being converted into PCM using various classes from the NAudio library, the resulting
<b>WaveStream</b> is passed through an <b>EffectStream</b> before being passed to the soundcard for playback. The use of an
<b>EffectChain</b> allows us to modify the effects loaded, and their order, during playback.</p>
<p>To make the loading of effects simpler, I used the <a href="http://www.codeplex.com/MEF">Managed Extensibility Framework</a> (MEF) to
make each effect a “plugin” to the test harness. Each effect is decorated with an
<b>Export</b> attribute to indicate that it is a plugin:</p>
<pre class="csharpcode">[Export(<span class="kwrd">typeof</span>(Effect))]
<span class="kwrd">public</span> <span class="kwrd">class</span> SuperPitch : Effect</pre>
<p>Then I can request that MEF auto-populates a property with all the exported effects it can find:</p>
<pre class="csharpcode">[Import]
<span class="kwrd">public</span> ICollection&lt;Effect&gt; Effects { get; set; }</pre>
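<p>The snippet above does not show how the import is satisfied. As a sketch only, assuming the version of MEF that later shipped in .NET 4 (the <b>System.ComponentModel.Composition</b> assembly) rather than the preview build used for the download, composition could look something like this. In that released version a collection import is marked with <b>ImportMany</b>. The <b>Effect</b> stand-in, the <b>Echo</b> effect and the <b>EffectHost</b> class are hypothetical names used purely for illustration.</p>
<pre class="csharpcode">using System.Collections.Generic;
using System.ComponentModel.Composition;
using System.ComponentModel.Composition.Hosting;
using System.Reflection;

// stand-in for the article's Effect base class
public abstract class Effect { }

[Export(typeof(Effect))]
public class Echo : Effect { }

public class EffectHost
{
    CompositionContainer container;

    [ImportMany]
    public IEnumerable&lt;Effect&gt; Effects { get; set; }

    public void Compose()
    {
        // look for [Export] attributes in the currently executing assembly
        var catalog = new AssemblyCatalog(Assembly.GetExecutingAssembly());
        container = new CompositionContainer(catalog);
        // fills in every [Import] / [ImportMany] member on this object
        container.ComposeParts(this);
    }
}</pre>
<p>After <b>Compose</b> has run, <b>Effects</b> holds one instance of each exported effect found in the assembly. Those instances serve only as a catalogue of what is available.</p>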
<p>When the user selects an effect from the list of available effects, we create a new instance of it. This is because you might want to put the same effect into the effect chain more than once:</p>
<pre class="csharpcode">EffectSelectorForm effectSelectorForm = <span class="kwrd">new</span> EffectSelectorForm(Effects);
<span class="kwrd">if</span> (effectSelectorForm.ShowDialog(<span class="kwrd">this</span>) == DialogResult.OK)
{
    <span class="rem">// create a new instance of the selected effect </span>
    <span class="rem">// as we may want multiple copies of one effect</span>
    Effect effect = (Effect)Activator.CreateInstance(
       effectSelectorForm.SelectedEffect.GetType());
    audioGraph.AddEffect(effect);
    checkedListBox1.Items.Add(effect, <span class="kwrd">true</span>);
}</pre>
<p>To allow real-time modification of the effect parameters, I created two user controls. The first,
<b>EffectSliderPanel</b>, allows you to hook up a Windows Forms <b>TrackBar</b> to one of our effect's sliders and manages the minimum, maximum and granularity settings. The second user control,
<b>EffectPanel</b>, takes an <b>Effect</b> and creates one <b>EffectSliderPanel</b> for each slider in that effect. It is also responsible for calling the
<b>Slider</b> method on the <b>Effect</b> whenever the user moves one of the sliders. Here's an example of what it looks like:</p>
<p><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image.png" width="488" height="164">
</p>
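<p>Because a <b>TrackBar</b> only deals in integers, a control like <b>EffectSliderPanel</b> has to translate between tick positions and the slider's floating point range. That code is not shown in this article, so the following is only a sketch of one way it could be done. The <b>SliderMapping</b> class and its members are hypothetical.</p>
<pre class="csharpcode">using System;

public class SliderMapping
{
    readonly float minimum, maximum, increment;

    public SliderMapping(float minimum, float maximum, float increment)
    {
        this.minimum = minimum;
        this.maximum = maximum;
        this.increment = increment;
    }

    // index of the last tick; intended for TrackBar.Maximum
    // when TrackBar.Minimum is left at 0
    public int MaxTick
    {
        get { return (int)Math.Round((maximum - minimum) / increment); }
    }

    // TrackBar position to slider value
    public float TickToValue(int tick)
    {
        return minimum + tick * increment;
    }

    // slider value to the nearest TrackBar position
    public int ValueToTick(float value)
    {
        return (int)Math.Round((value - minimum) / increment);
    }
}</pre>
<p>For the tremolo's “amount (dB)” slider, which runs from -60 to 0 in steps of 1, this gives 60 ticks, with the default of -6 dB sitting at tick 54.</p>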
<p>Now we are able to test our effects by listening to WAV files and playing them with real-time control over their parameters.</p>
<h3>Intercepting Skype Audio</h3>
<p>There are many interesting uses for audio effects, but it was suggested to me that I create a “voice changer” for Skype as the example program for this article. At first I didn't think that this would be possible, as you would need access to the audio samples
 from the microphone <i>before</i> Skype transmitted them over the network.</p>
<p>However, it turns out that Skype has a full-featured SDK to allow all kinds of third-party add-ons and enhancements. The Skype API can be used in .NET via a COM object called
<a href="https://developer.skype.com/">Skype4Com</a>. Skype plugins are not loaded directly by the Skype application but attach to it via network sockets. The Skype4Com COM object hides much of this complexity from the user.</p>
<p>Having added Skype4Com as a reference to our application, we then need to connect to Skype. This is achieved by using the following code:</p>
<pre class="csharpcode"><span class="kwrd">const</span> <span class="kwrd">int</span> Protocol = 8;
skype = <span class="kwrd">new</span> Skype();
_ISkypeEvents_Event events = (_ISkypeEvents_Event)skype;
events.AttachmentStatus &#43;= OnSkypeAttachmentStatus;            
skype.CallStatus &#43;= OnSkypeCallStatus;
skype.Attach(Protocol, <span class="kwrd">false</span>);</pre>
<p>In the <b>CallStatus</b> event handler, we tell Skype that we wish to “capture” the microphone. This will cause it to send us the raw audio data from the microphone via a TCP socket. Then we tell it that we will supply the audio to be transmitted over a second
 TCP socket.</p>
<pre class="csharpcode"><span class="kwrd">void</span> OnSkypeCallStatus(Call call, TCallStatus status)
{
    log.Info(<span class="str">&quot;SkypeCallStatus: {0}&quot;</span>, status);
    <span class="kwrd">if</span> (status == TCallStatus.clsInProgress)
    {
        <span class="kwrd">this</span>.call = call;                  
        call.set_CaptureMicDevice(
        TCallIoDeviceType.callIoDeviceTypePort, MicPort.ToString());
        call.set_InputDevice(
        TCallIoDeviceType.callIoDeviceTypeSoundcard, <span class="str">&quot;&quot;</span>);
        call.set_InputDevice(
        TCallIoDeviceType.callIoDeviceTypePort, OutPort.ToString());
    }
    <span class="kwrd">else</span> <span class="kwrd">if</span> (status == TCallStatus.clsFinished)
    {
        call = <span class="kwrd">null</span>;
        packetSize = 0;
    }
}</pre>
<p>I found an example Delphi application on the Skype developer website which shows how to intercept the microphone signal to
<a href="https://developer.skype.com/Docs/Skype4COM/Example/MicBooster_pas">boost the signal level</a>. I used this sample as the starting point for creating my own application to intercept audio samples in Skype. The Delphi application made use of an object
 called <a href="http://www.indyproject.org/docsite/html/frames.html?frmname=topic&amp;frmfile=TIdTCPServer.html">
TIdTCPServer</a>, which is a multi-threaded socket server. I created a very simple .NET implementation of this class (without the multi-threading as we will only have one connection at a time):</p>
<pre class="csharpcode"><span class="kwrd">class</span> TcpServer : IDisposable
{
    TcpListener listener;
    <span class="kwrd">public</span> <span class="kwrd">event</span> EventHandler&lt;ConnectedEventArgs&gt; Connect;
    <span class="kwrd">public</span> <span class="kwrd">event</span> EventHandler Disconnect;
    <span class="kwrd">public</span> <span class="kwrd">event</span> EventHandler&lt;DataReceivedEventArgs&gt; DataReceived;
    
    <span class="kwrd">public</span> TcpServer(<span class="kwrd">int</span> port)
    {
        listener = <span class="kwrd">new</span> TcpListener(IPAddress.Loopback, port);
        listener.Start();
        ThreadPool.QueueUserWorkItem(Listen);
    }

    <span class="kwrd">private</span> <span class="kwrd">void</span> Listen(<span class="kwrd">object</span> state)
    {
        <span class="kwrd">while</span> (<span class="kwrd">true</span>)
        {
            <span class="kwrd">using</span> (TcpClient client = listener.AcceptTcpClient())
            {
                AcceptClient(client);
            }
        }
    }

    <span class="kwrd">private</span> <span class="kwrd">void</span> AcceptClient(TcpClient client)
    {
        <span class="kwrd">using</span> (NetworkStream inStream = client.GetStream())
        {
            OnConnect(inStream);
            <span class="kwrd">while</span> (client.Connected)
            {
                <span class="kwrd">int</span> available = client.Available;
                <span class="kwrd">if</span> (available &gt; 0)
                {
                    <span class="kwrd">byte</span>[] buffer = <span class="kwrd">new</span> <span class="kwrd">byte</span>[available];
                    <span class="kwrd">int</span> read = inStream.Read(buffer, 0, available);
                    Debug.Assert(read == available);
                    OnDataReceived(buffer);
                }
                <span class="kwrd">else</span>
                {
                    Thread.Sleep(50);
                }
            }
        }
        OnDisconnect();
    }

    <span class="kwrd">private</span> <span class="kwrd">void</span> OnConnect(NetworkStream stream)
    {
        var connect = Connect;
        <span class="kwrd">if</span> (connect != <span class="kwrd">null</span>)
        {
            connect(<span class="kwrd">this</span>, <span class="kwrd">new</span> ConnectedEventArgs() { Stream = stream });
        }
    }

    <span class="kwrd">private</span> <span class="kwrd">void</span> OnDisconnect()
    {
        var disconnect = Disconnect;
        <span class="kwrd">if</span> (disconnect != <span class="kwrd">null</span>)
        {
            disconnect(<span class="kwrd">this</span>, EventArgs.Empty);
        }
    }

    <span class="kwrd">private</span> <span class="kwrd">void</span> OnDataReceived(<span class="kwrd">byte</span>[] buffer)
    {
        var execute = DataReceived;
        <span class="kwrd">if</span> (execute != <span class="kwrd">null</span>)
        {
            execute(<span class="kwrd">this</span>, <span class="kwrd">new</span> DataReceivedEventArgs() { Buffer = buffer });
        }
    }

    <span class="preproc">#region</span> IDisposable Members

    <span class="kwrd">public</span> <span class="kwrd">void</span> Dispose()
    {
        listener.Stop();
    }

    <span class="preproc">#endregion</span>
}

<span class="kwrd">public</span> <span class="kwrd">class</span> DataReceivedEventArgs : EventArgs
{
    <span class="kwrd">public</span> <span class="kwrd">byte</span>[] Buffer { get; set; }
}

<span class="kwrd">public</span> <span class="kwrd">class</span> ConnectedEventArgs : EventArgs
{
    <span class="kwrd">public</span> NetworkStream Stream { get; set; }
}</pre>
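<p>The same listening pattern can be tried out without Skype. In the sketch below a <b>TcpClient</b> stands in for Skype and sends a few bytes to a local <b>TcpListener</b>. The <b>LoopbackDemo</b> class is hypothetical and not part of the download. It also reads in a loop, since a single call to <b>Read</b> on a <b>NetworkStream</b> may return fewer bytes than were requested.</p>
<pre class="csharpcode">using System;
using System.Net;
using System.Net.Sockets;

public static class LoopbackDemo
{
    // Sends 'payload' from a client socket to a local listener and
    // returns the bytes the listener side received.
    public static byte[] RoundTrip(byte[] payload)
    {
        // port 0 asks the operating system for any free port
        TcpListener listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;
            using (TcpClient sender = new TcpClient())
            {
                sender.Connect(IPAddress.Loopback, port);
                using (TcpClient receiver = listener.AcceptTcpClient())
                {
                    sender.GetStream().Write(payload, 0, payload.Length);

                    byte[] received = new byte[payload.Length];
                    int total = 0;
                    NetworkStream inStream = receiver.GetStream();
                    while (total &lt; received.Length)
                    {
                        int read = inStream.Read(received, total, received.Length - total);
                        if (read == 0) break; // remote end closed the connection
                        total += read;
                    }
                    Array.Resize(ref received, total);
                    return received;
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }
}</pre>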
<p>Once Skype has been told the port numbers on which to connect, it will attempt to open sockets to our two
<b>TcpServer</b> instances (one for audio in, and one for audio out). We now simply need to pass the audio through our effect chain. But
<b>EffectStream</b> needs a <b>WaveStream</b> derived class for its input, so I created
<b>SkypeBufferStream</b>, to which we pass the raw data received on the microphone socket and which returns that data from its
<b>Read</b> method. One difficulty I encountered was that Skype offers no way of querying the sample rate of the incoming data. On my PC it seems to be 44.1 kHz, but I do not know if this is guaranteed on all computers.</p>
<pre class="csharpcode"><span class="kwrd">class</span> SkypeBufferStream : WaveStream
{
    <span class="kwrd">byte</span>[] latestInBuffer;
    WaveFormat waveFormat;

    <span class="kwrd">public</span> SkypeBufferStream(<span class="kwrd">int</span> sampleRate)
    {
        waveFormat = <span class="kwrd">new</span> WaveFormat(sampleRate, 16, 1);
    }

    <span class="kwrd">public</span> <span class="kwrd">override</span> WaveFormat WaveFormat
    {
        get { <span class="kwrd">return</span> waveFormat; }
    }

    <span class="kwrd">public</span> <span class="kwrd">override</span> <span class="kwrd">long</span> Length
    {
        get { <span class="kwrd">return</span> 0; }
    }

    <span class="kwrd">public</span> <span class="kwrd">override</span> <span class="kwrd">long</span> Position
    {
        get
        {
            <span class="kwrd">return</span> 0;
        }
        set
        {
            <span class="kwrd">throw</span> <span class="kwrd">new</span> NotImplementedException();
        }
    }

    <span class="kwrd">public</span> <span class="kwrd">void</span> SetLatestInBuffer(<span class="kwrd">byte</span>[] buffer)
    {
        latestInBuffer = buffer;
    }

    <span class="kwrd">public</span> <span class="kwrd">override</span> <span class="kwrd">int</span> Read(<span class="kwrd">byte</span>[] buffer, <span class="kwrd">int</span> offset, <span class="kwrd">int</span> count)
    {
        <span class="kwrd">if</span> (offset != 0)
            <span class="kwrd">throw</span> <span class="kwrd">new</span> ArgumentOutOfRangeException(<span class="str">&quot;offset&quot;</span>);
        <span class="kwrd">if</span> (buffer != latestInBuffer)
            Array.Copy(latestInBuffer, buffer, count);
        <span class="kwrd">return</span> count;
    }
}</pre>
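<p>The <b>Read</b> method above trusts the caller never to ask for more bytes than the most recent packet contains. If that cannot be guaranteed, a more defensive copy is possible. The following is a sketch only: the <b>BufferCopy</b> helper is hypothetical, and padding with zeros assumes 16-bit PCM, where zero bytes are silence.</p>
<pre class="csharpcode">using System;

public static class BufferCopy
{
    // Copies up to 'count' bytes from 'latest' into 'buffer' at 'offset',
    // zero-filling whatever 'latest' cannot supply.
    // Returns the number of bytes written, which is always 'count'.
    public static int CopyWithSilence(byte[] latest, byte[] buffer, int offset, int count)
    {
        int available = latest == null ? 0 : Math.Min(count, latest.Length);
        if (available &gt; 0)
        {
            Array.Copy(latest, 0, buffer, offset, available);
        }
        Array.Clear(buffer, offset + available, count - available);
        return count;
    }
}</pre>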
<p>Now when we receive any data from the microphone socket, we pass it through the
<b>SkypeBufferStream</b> which in turn passes it through the <b>EffectStream</b> and finally out on the output socket's data stream. Here's the relevant code (found in the
<b>MicInterceptor</b> class):</p>
<pre class="csharpcode">NetworkStream outStream;
SkypeBufferStream bufferStream;
WaveStream outputStream;

<span class="kwrd">void</span> OnOutServerConnect(<span class="kwrd">object</span> sender, ConnectedEventArgs e)
{
    log.Info(<span class="str">&quot;OutServer Connected&quot;</span>);
    outStream = e.Stream;
}

<span class="kwrd">void</span> OnMicServerExecute(<span class="kwrd">object</span> sender, DataReceivedEventArgs args)
{
    <span class="rem">// log.Info(&quot;Got {0} bytes&quot;, args.Buffer.Length);</span>
    <span class="kwrd">if</span> (outStream != <span class="kwrd">null</span>)
    {
        <span class="rem">// give the input audio to the beginning of our audio graph</span>
        bufferStream.SetLatestInBuffer(args.Buffer);
        <span class="rem">// process it out through the effects</span>
        outputStream.Read(args.Buffer, 0, args.Buffer.Length);
        <span class="rem">// send the processed audio back to Skype for transmission</span>
        outStream.Write(args.Buffer, 0, args.Buffer.Length);
    }
}</pre>
<p>When you run your application for the first time, you will need to grant it permission from within Skype:</p>
<p><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_3.png" width="203" height="133">
</p>
<p><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_4.png" width="500" height="371">
</p>
<p>Testing that the effects are working in Skype is a little tricky, as you will not hear the processed sound on your end of the conversation. One good way of checking that the effects are working as expected is to use the Skype test call service. This is a number
 you can dial, and it will record what you say and play it back to you.</p>
<p><a href="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_5.png"><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_thumb.png" width="240" height="20"></a>
</p>
<p>It is a good idea to test your effect first using a local audio file, because during a call you will not easily be able to tell whether glitches and other audio artifacts are caused by your effect or simply by poor network conditions.</p>
<p>There are a few things you should be aware of when selecting effects for use with Skype. First, the audio is mono, so there is no point using effects such as stereo delay. Second, the audio is almost certainly down-sampled to a much lower sample rate
 before being transmitted, to save on network bandwidth. This means any high frequency components of your sound will be lost. Third, internet telephony applications often have built-in echo suppression, so delay-based effects might not work quite as well
 as you were hoping.</p>
<p>For silly voice effects, the most effective is pitch shifting (try the <b>SuperPitch</b> effect and shift either up or down about five semitones).
<b>FlangeBaby</b> or <b>Chorus</b> can be used for more subtle voice changing effects. Or if you just want to be annoying, load up a
<b>Delay</b>. Feel free to experiment with the other included effects, but bear in mind that many of them are designed with more musical uses in mind, so may not be relevant for internet telephony.</p>
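<p>To put “five semitones” into numbers: in equal temperament each semitone multiplies the frequency by the twelfth root of two. The sketch below shows only that arithmetic. It says nothing about how <b>SuperPitch</b> is implemented internally, and the <b>PitchMath</b> class is hypothetical.</p>
<pre class="csharpcode">using System;

public static class PitchMath
{
    // frequency ratio for a shift of 'semitones' in equal temperament
    public static double SemitonesToRatio(double semitones)
    {
        return Math.Pow(2.0, semitones / 12.0);
    }
}</pre>
<p>Shifting up five semitones therefore raises every frequency by a factor of roughly 1.33, and shifting down five semitones lowers it by a factor of roughly 0.75. Twelve semitones is a full octave, a factor of exactly 2.</p>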
<h3>The Sample Code</h3>
<p>The source code for all the effects, the <b>EffectStream</b> and the Skype connection code described in this article is available in the provided download. It uses a recent unreleased build of NAudio, so you will also need to visit the NAudio CodePlex site
 if you want to get access to the full source code. The <b>EffectStream</b> and <b>Effect</b>
classes will eventually be made part of the NAudio framework, once I have refined their design a bit.</p>
<p><a href="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_6.png"><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_thumb_3.png" width="500" height="227"></a>
</p>
<p>How to use the Effect Tester sample app:
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="top" width="57"><a href="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_7.png"><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_thumb_4.png" width="45" height="41"></a>
</td>
<td valign="top" width="559">
<p>Click this icon to attach to Skype and await a call. Click it again to disconnect, allowing you to test your effects using audio files instead. The textbox in the top right corner keeps you updated with the status of the connection to Skype.</p>
</td>
</tr>
<tr>
<td valign="top" width="57"><a href="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_8.png"><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_thumb_5.png" width="39" height="43"></a>
</td>
<td valign="top" width="559">
<p>Click this icon to load a WAV or MP3 file for playback. Files at 44.1 kHz work best.</p>
</td>
</tr>
<tr>
<td valign="top" width="57"><a href="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_9.png"><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_thumb_6.png" width="147" height="40"></a>
</td>
<td valign="top" width="559">
<p>Rewind, Play, Pause or Stop the current WAV or MP3 file.</p>
</td>
</tr>
<tr>
<td valign="top" width="57"><a href="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_10.png"><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_thumb_7.png" width="38" height="42"></a>
</td>
<td valign="top" width="559">
<p>Brings up the effect selector dialog to add a new instance of an effect to the current effect chain. Loaded effects appear in the CheckedListBox on the left.</p>
</td>
</tr>
<tr>
<td valign="top" width="57"><a href="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_11.png"><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_thumb_8.png" width="42" height="41"></a>
</td>
<td valign="top" width="559">
<p>Removes the currently selected effect from the effect chain.</p>
</td>
</tr>
<tr>
<td valign="top" width="57"><a href="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_12.png"><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_thumb_9.png" width="74" height="40"></a>
</td>
<td valign="top" width="559">
<p>Move the currently selected effect up or down in the signal chain.</p>
</td>
</tr>
<tr>
<td valign="top" width="57"><a href="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_13.png"><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_thumb_10.png" width="96" height="42"></a>
</td>
<td valign="top" width="559">
<p>Use the checkboxes to enable or disable effects in the effect chain on the fly.</p>
</td>
</tr>
<tr>
<td valign="top" width="57"><a href="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_14.png"><img title="image" border="0" alt="image" src="http://ecn.channel9.msdn.com/o9/c4fcontent/migration/9391048/image_thumb_11.png" width="228" height="44"></a>
</td>
<td valign="top" width="559">
<p>Use the sliders to adjust the effect parameters in real time.</p>
</td>
</tr>
</tbody>
</table>
</p>
<h3>About the Author</h3>
<p>Mark Heath is a .NET developer based in Southampton, UK. When he's not writing .NET audio applications for fun, he enjoys home studio recording, playing football, reading theology books and sword fighting with his four small children. His development
 blog can be found at <a href="http://mark-dot-net.blogspot.com">http://mark-dot-net.blogspot.com</a>.</p>
 ]]></description>
      <comments>http://channel9.msdn.com/coding4fun/articles/Skype-Voice-Changer</comments>
      <itunes:summary>
In this article I demonstrate how you can create your own audio effects in .NET to manipulate digital audio at the sample level. These effects are used to process MP3 files while they are being played back, and to process the real-time microphone input allowing
 you to change your voice during a Skype conversation. 




Mark Heath, blog
Source Code: Download
Difficulty: Intermediate
Time Required: 8 hours
Cost: Free
Software Needed: Visual Basic or Visual C# Express 2008
Libraries: NAudio, Skype4COM, MEF

Audio and the .NET Framework
Playing back audio in a .NET application is not quite as easy as you might hope it would be. The .NET 2.0 Framework introduced the
SoundPlayer component, which allows you to play back an existing WAV file. While this may be fine for many scenarios, as soon as you want to do things even slightly more advanced,
 such as changing the volume, or playing back from a different file-format, or pausing and repositioning, you must resort to writing P/Invoke wrappers for various Windows APIs. 
Back in 2002, as I was getting started learning .NET, I created some audio-related classes of my own to compensate for the lack of audio support in the .NET Framework. I focused initially on reading and writing WAV and MIDI files, as well as playing back
 audio in a way that allowed real-time mixing and manipulation of the audio at a sample level. As time went by, I made this growing collection of audio classes available as an open source project, called
NAudio, now hosted at CodePlex. 
Audio Playback in NAudio
NAudio works by constructing an audio playback graph. Audio comes in “streams” which can be connected together and modified before eventually they go to a renderer. This might be your soundcard if you are listening to the audio, or it might be to a file
 on the hard disk. 
In NAudio, all streams derive from WaveStream. NAudio comes with a collection of useful
WaveStream derived classes such as WaveFileReader to read from WAV files or</itunes:summary>
      <link>http://channel9.msdn.com/coding4fun/articles/Skype-Voice-Changer</link>
      <pubDate>Mon, 02 Feb 2009 15:19:20 GMT</pubDate>
      <guid isPermaLink="false">http://channel9.msdn.com/coding4fun/articles/Skype-Voice-Changer</guid>
      <media:thumbnail url="http://ecn.channel9.msdn.com/o9/c4f/images/9391048_100.jpg" height="75" width="100"></media:thumbnail>
      <media:thumbnail url="http://ecn.channel9.msdn.com/o9/c4f/images/9391048_220.jpg" height="165" width="220"></media:thumbnail>      
      <dc:creator>Mark Heath</dc:creator>
      <itunes:author>Mark Heath</itunes:author>
      <slash:comments>51</slash:comments>
      <wfw:commentRss>http://channel9.msdn.com/coding4fun/articles/Skype-Voice-Changer/RSS</wfw:commentRss>
      <category>Audio</category>
      <category>Mash Up</category>
    </item>    
</channel>
</rss>