Content Obsolete

This content is no longer current. Our recommendation for up to date content:

Audio Fundamentals (Beta 2 SDK)


Update: Kinect for Windows SDK v1 Quickstart Series now available (Feb 1st)

Please use the newly updated Kinect for Windows SDK Quickstart series. The content below will only work with the Beta 2 version of the Kinect for Windows SDK.


This video covers the basics of reading audio data from the Kinect microphone array, using a demo adapted from the built-in audio recorder sample. It also covers speech recognition with Kinect. You may find it easier to follow along by downloading the Kinect for Windows SDK Quickstarts samples and slides, which have been updated for Beta 2 (Nov, 2011).

  • [00:35] Kinect microphone information
  • [01:10] Audio data
  • [02:15] Speech recognition information
  • [05:08] Recording audio
  • [08:17] Speech recognition demo


Updates for Kinect for Windows SDK Beta 2 (Nov, 2011)

The video has not been updated for Beta 2, but the following changes have been made:

  • Beta 2 now enables you to record audio on a Single-Threaded Apartment (STA) thread, the default thread that is used for WPF applications. Previously, you had to create a new thread marked as a Multi-Threaded Apartment (MTA) for audio processing to work.
  • Beta 2 includes a new WPF audio example, KinectAudioDemo, that demonstrates speech recognition and calculating the angle of the current sound source.


The steps below assume you have set up your development environment as explained in the "Setting Up Your Development Environment" video.

Task: Designing Your UI

We’ll add in a Slider and two Button controls, and we'll also use some stack panels to be sure everything lines up nicely:


<Window x:Class="AudioRecorder.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Audio Recorder Sample" Height="159" Width="525">
    <Grid>
        <StackPanel>
            <StackPanel Orientation="Horizontal">
                <Label Content="Seconds to Record: " />
                <Label Content="{Binding ElementName=RecordForTimeSpan, Path=Value}" />
            </StackPanel>
            <Slider Name="RecordForTimeSpan" Minimum="1" Maximum="25" IsSnapToTickEnabled="True" />
            <StackPanel Orientation="Horizontal" HorizontalAlignment="Center">
                <Button Content="Record" Height="50" Width="100" Name="RecordButton" />
                <Button Content="Play" Height="50" Width="100" Name="PlayButton" />
            </StackPanel>
            <MediaElement Name="audioPlayer" />
        </StackPanel>
    </Grid>
</Window>


Creating Click events

For each button, we'll want to create a click event. Go to the Properties window (F4), select the RecordButton, select the Events tab, and double-click the Click event to create the RecordButton_Click event. Do the same for the PlayButton so that the PlayButton_Click event is wired up as well.


Task: Working with the KinectAudioSource

The first task is to import the Kinect audio namespace:


using Microsoft.Research.Kinect.Audio;

Visual Basic

Imports Microsoft.Research.Kinect.Audio

Synchronous and asynchronous recording

There are two ways to record audio. You can record synchronously on the UI thread, which in effect “freezes” the UI until the recording finishes. Alternatively, you can record on a separate thread so that the UI thread stays responsive to events while the recording happens in parallel. Our sample includes both approaches, so you can choose whichever your application requires.


We’ll declare fields to hold the amount of time to record and the file name of the recording. To enable asynchronous recording, we’ll also declare a FinishedRecording event that notifies the UI thread when recording is done:


double _amountOfTimeToRecord;
string _lastRecordedFileName;
private event RoutedEventHandler FinishedRecording;

Visual Basic

Private _amountOfTimeToRecord As Double
Private _lastRecordedFileName As String
Private Event FinishedRecording As RoutedEventHandler


Next we’ll create the RecordAudio method that will do the actual audio recording.


private void RecordAudio()
{
}

Visual Basic

Private Sub RecordAudio()
End Sub

To create threads, we'll add in the System.Threading namespace:


using System.Threading;

Visual Basic

Imports System.Threading

Now we'll create the thread and do some simple end-user management in the RecordButton_Click event. First we'll disable the two buttons, read the amount of time to record from the slider, and create a unique file name.


Then we have the option of calling the RecordAudio method either synchronously or asynchronously as shown below:


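private void RecordButton_Click(object sender, RoutedEventArgs e)
{
    RecordButton.IsEnabled = false;
    PlayButton.IsEnabled = false;
    _amountOfTimeToRecord = RecordForTimeSpan.Value;
    _lastRecordedFileName = DateTime.Now.ToString("yyyyMMddHHmmss") + "_wav.wav";
    // Asynchronous: record on a worker thread so the UI stays responsive.
    // For the synchronous option, call RecordAudio() directly instead.
    var t = new Thread(new ThreadStart(RecordAudio));
    t.Start();
}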

Visual Basic

Private Sub RecordButton_Click(ByVal sender As Object, ByVal e As RoutedEventArgs)

    RecordButton.IsEnabled = False
    PlayButton.IsEnabled = False
    _amountOfTimeToRecord = RecordForTimeSpan.Value
    _lastRecordedFileName = Date.Now.ToString("yyyyMMddHHmmss") & "_wav.wav"

    Dim t = New Thread(New ThreadStart(AddressOf RecordAudio))
    t.Start()

End Sub

Task: Capturing Audio Data

From here, this sample and the built-in sample are pretty much the same. There are only three differences: the FinishedRecording event, a variable recording time, and the dynamic file name. Note that the WriteWavHeader function is exactly the same as the one in the built-in demo. Since we work with several types of streams, we'll add the System.IO namespace:


using System.IO;

Visual Basic

Imports System.IO
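WriteWavHeader itself is not listed in this walkthrough. As a rough idea of what such a helper does, here is a minimal sketch that writes a canonical 44-byte PCM header for 16 kHz, 16-bit, mono audio. It is an illustration only and is not the code from the built-in demo; the class name WavHeader and the exact header layout chosen here are assumptions.

```csharp
using System.IO;
using System.Text;

public static class WavHeader
{
    // Sketch only: canonical 44-byte PCM WAV header for 16 kHz, 16-bit, mono audio.
    // 'dataLength' is the number of audio bytes that will follow the header.
    public static void WriteWavHeader(Stream stream, int dataLength)
    {
        const short channels = 1;
        const int sampleRate = 16000;
        const short bitsPerSample = 16;
        const short blockAlign = channels * bitsPerSample / 8; // 2 bytes per sample frame
        const int avgBytesPerSec = sampleRate * blockAlign;    // 32000 bytes per second

        // BinaryWriter writes integers in little-endian order, as RIFF requires.
        var writer = new BinaryWriter(stream, Encoding.ASCII);
        writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        writer.Write(36 + dataLength);   // size of everything after this field
        writer.Write(Encoding.ASCII.GetBytes("WAVE"));
        writer.Write(Encoding.ASCII.GetBytes("fmt "));
        writer.Write(16);                // size of the PCM format chunk
        writer.Write((short)1);          // format tag 1 = PCM
        writer.Write(channels);
        writer.Write(sampleRate);
        writer.Write(avgBytesPerSec);
        writer.Write(blockAlign);
        writer.Write(bitsPerSample);
        writer.Write(Encoding.ASCII.GetBytes("data"));
        writer.Write(dataLength);
        writer.Flush();                  // not disposed: the caller still owns the stream
    }
}
```

The data length passed in corresponds to the recordingLength value used by RecordAudio: seconds × 2 bytes per sample × 16000 samples per second.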

The entire RecordAudio method:


private void RecordAudio()
{
    using (var source = new KinectAudioSource())
    {
        // 16 kHz, 16-bit (2 bytes per sample) mono audio
        var recordingLength = (int) _amountOfTimeToRecord * 2 * 16000;
        var buffer = new byte[1024];
        source.SystemMode = SystemMode.OptibeamArrayOnly;
        using (var fileStream = new FileStream(_lastRecordedFileName, FileMode.Create))
        {
            WriteWavHeader(fileStream, recordingLength);

            //Start capturing audio
            using (var audioStream = source.Start())
            {
                //Simply copy the data from the stream down to the file
                int count, totalCount = 0;
                while ((count = audioStream.Read(buffer, 0, buffer.Length)) > 0 && totalCount < recordingLength)
                {
                    fileStream.Write(buffer, 0, count);
                    totalCount += count;
                }
            }
        }

        if (FinishedRecording != null)
            FinishedRecording(null, null);
    }
}

Visual Basic

Private Sub RecordAudio()
    Using source = New KinectAudioSource

        Dim recordingLength = CInt(Fix(_amountOfTimeToRecord)) * 2 * 16000
        Dim buffer = New Byte(1023) {}

        source.SystemMode = SystemMode.OptibeamArrayOnly

        Using fileStream = New FileStream(_lastRecordedFileName, FileMode.Create)

            WriteWavHeader(fileStream, recordingLength)

            'Start capturing audio                               
            Using audioStream = source.Start()

                'Simply copy the data from the stream down to the file
                Dim count As Integer, totalCount As Integer = 0
                count = audioStream.Read(buffer, 0, buffer.Length)
                Do While count > 0 AndAlso totalCount < recordingLength

                    fileStream.Write(buffer, 0, count)
                    totalCount += count

                    count = audioStream.Read(buffer, 0, buffer.Length)

                Loop

            End Using

        End Using

        RaiseEvent FinishedRecording(Nothing, Nothing)

    End Using

End Sub
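One detail of the copy loop in RecordAudio: it compares totalCount against recordingLength only after a whole buffer has been written, so the file can end up with up to one buffer more data than the length declared in the WAV header. That is usually harmless. If you want the two to match exactly, a capped copy could be sketched as below. The helper name CopyAtMost is hypothetical, and it works on any Stream rather than on the Kinect audio stream specifically.

```csharp
using System;
using System.IO;

public static class StreamCopy
{
    // Hypothetical helper: copies at most 'maxBytes' from source to destination
    // and returns the number of bytes actually copied.
    public static int CopyAtMost(Stream source, Stream destination, int maxBytes)
    {
        var buffer = new byte[1024];
        int total = 0;
        while (total < maxBytes)
        {
            // Never request more than the remaining byte budget.
            int toRead = Math.Min(buffer.Length, maxBytes - total);
            int count = source.Read(buffer, 0, toRead);
            if (count <= 0)
                break; // the source ended before the budget was used up
            destination.Write(buffer, 0, count);
            total += count;
        }
        return total;
    }
}
```

In RecordAudio, a call such as StreamCopy.CopyAtMost(audioStream, fileStream, recordingLength) would take the place of the while loop.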

Task: Playing Back the Audio We Just Captured

So we've recorded the audio, saved it, and raised an event to say we're done. Now let's hook into it. We'll wire up that event in the MainWindow constructor:


public MainWindow()
{
    InitializeComponent();

    FinishedRecording += new RoutedEventHandler(MainWindow_FinishedRecording);
}

Visual Basic

Public Sub New()
    InitializeComponent()

    AddHandler FinishedRecording, AddressOf MainWindow_FinishedRecording
End Sub

Since that event is raised on a non-UI thread, we'll need to use the Dispatcher to get back onto the UI thread so we can re-enable those buttons:


void MainWindow_FinishedRecording(object sender, RoutedEventArgs e)
{
    // Marshal the call back onto the UI thread
    Dispatcher.BeginInvoke(new ThreadStart(ReenableButtons));
}

private void ReenableButtons()
{
    RecordButton.IsEnabled = true;
    PlayButton.IsEnabled = true;
}

Visual Basic

Private Sub MainWindow_FinishedRecording(sender As Object, e As RoutedEventArgs)
    Dispatcher.BeginInvoke(New ThreadStart(AddressOf ReenableButtons))
End Sub

Private Sub ReenableButtons()
    RecordButton.IsEnabled = True
    PlayButton.IsEnabled = True
End Sub

And finally, we'll make the MediaElement play back the audio we just saved. We'll also verify both that the user recorded some audio and that the file exists:


private void PlayButton_Click(object sender, RoutedEventArgs e)
{
    if (!string.IsNullOrEmpty(_lastRecordedFileName) && File.Exists(_lastRecordedFileName))
    {
        audioPlayer.Source = new Uri(_lastRecordedFileName, UriKind.RelativeOrAbsolute);
        audioPlayer.LoadedBehavior = MediaState.Play;
        audioPlayer.UnloadedBehavior = MediaState.Close;
    }
}

Visual Basic

Private Sub PlayButton_Click(sender As Object, e As RoutedEventArgs)

    If (Not String.IsNullOrEmpty(_lastRecordedFileName)) AndAlso File.Exists(_lastRecordedFileName) Then

        audioPlayer.Source = New Uri(_lastRecordedFileName, UriKind.RelativeOrAbsolute)
        audioPlayer.LoadedBehavior = MediaState.Play
        audioPlayer.UnloadedBehavior = MediaState.Close

    End If

End Sub

Task: Speech Recognition

To do speech recognition, we need to bring in the speech recognition namespaces from the speech SDK:


using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;

Visual Basic

Imports Microsoft.Speech.AudioFormat
Imports Microsoft.Speech.Recognition

In VB, we'll also need to mark Sub Main with the MTAThread attribute. C# does not need this.

Visual Basic

<MTAThread()> _
Shared Sub Main(ByVal args() As String)

Next, we need to set up the KinectAudioSource in a way that's compatible with speech recognition:


using (var source = new KinectAudioSource())
{
    source.FeatureMode = true;
    source.AutomaticGainControl = false; //Important to turn this off for speech recognition
    source.SystemMode = SystemMode.OptibeamArrayOnly; //No AEC for this sample

    // The rest of the speech recognition code in this task goes inside this using block
}

Visual Basic

Using source = New KinectAudioSource

source.FeatureMode = True
source.AutomaticGainControl = False 'Important to turn this off for speech recognition
source.SystemMode = SystemMode.OptibeamArrayOnly 'No AEC for this sample

End Using

With that in place, we can initialize the SpeechRecognitionEngine to use the Kinect recognizer, which was downloaded earlier:


private const string RecognizerId = "SR_MS_en-US_Kinect_10.0";
RecognizerInfo ri = SpeechRecognitionEngine.InstalledRecognizers().Where(r => r.Id == RecognizerId).FirstOrDefault();

Visual Basic

Private Const RecognizerId As String = "SR_MS_en-US_Kinect_10.0"
Dim ri As RecognizerInfo = SpeechRecognitionEngine.InstalledRecognizers().Where(Function(r) r.Id = RecognizerId).FirstOrDefault()

Next, a "grammar" needs to be set up, which specifies the words the speech recognition engine should listen for. The following code creates a grammar for the words "red", "green" and "blue".


using (var sre = new SpeechRecognitionEngine(ri.Id))
{
    var colors = new Choices();
    colors.Add("red", "green", "blue");
    var gb = new GrammarBuilder();
    //Specify the culture to match the recognizer in case we are running in a different culture.
    gb.Culture = ri.Culture;
    gb.Append(colors);
    // Create the actual Grammar instance, and then load it into the speech recognizer.
    var g = new Grammar(gb);
    sre.LoadGrammar(g);
}

Visual Basic

Using sre = New SpeechRecognitionEngine(ri.Id)

Dim colors = New Choices
colors.Add("red", "green", "blue")

Dim gb = New GrammarBuilder
'Specify the culture to match the recognizer in case we are running in a different culture
gb.Culture = ri.Culture
gb.Append(colors)

' Create the actual Grammar instance, and then load it into the speech recognizer.
Dim g = New Grammar(gb)
sre.LoadGrammar(g)

End Using

Next, several events are hooked up so you can be notified when a word is recognized, hypothesized, or rejected:


sre.SpeechRecognized += SreSpeechRecognized;
sre.SpeechHypothesized += SreSpeechHypothesized;
sre.SpeechRecognitionRejected += SreSpeechRecognitionRejected;

Visual Basic

AddHandler sre.SpeechRecognized, AddressOf SreSpeechRecognized
AddHandler sre.SpeechHypothesized, AddressOf SreSpeechHypothesized
AddHandler sre.SpeechRecognitionRejected, AddressOf SreSpeechRecognitionRejected

Finally, the audio stream source from the Kinect is applied to the speech recognition engine:


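using (Stream s = source.Start())
{
    sre.SetInputToAudioStream(s,
                              new SpeechAudioFormatInfo(
                                  EncodingFormat.Pcm, 16000, 16, 1,
                                  32000, 2, null));
    Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop");
    sre.RecognizeAsync(RecognizeMode.Multiple);
    Console.ReadLine();
    Console.WriteLine("Stopping recognizer ...");
    sre.RecognizeAsyncStop();
}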

Visual Basic

Using s As Stream = source.Start()

sre.SetInputToAudioStream(s, New SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, Nothing))

Console.WriteLine("Recognizing. Say: 'red', 'green' or 'blue'. Press ENTER to stop")
sre.RecognizeAsync(RecognizeMode.Multiple)
Console.ReadLine()
Console.WriteLine("Stopping recognizer ...")
sre.RecognizeAsyncStop()

End Using
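The numbers passed to SpeechAudioFormatInfo are related to one another: after the encoding format come the samples per second (16000), bits per sample (16), channel count (1), average bytes per second (32000), and block align (2). The last two follow from the first three. A small sketch of that arithmetic is shown here; the PcmFormat class and its method names are hypothetical and exist only for illustration.

```csharp
public static class PcmFormat
{
    // Bytes per sample frame across all channels: 16-bit mono gives 2.
    public static int BlockAlign(int bitsPerSample, int channels)
    {
        return bitsPerSample / 8 * channels;
    }

    // Bytes of audio per second: 16 kHz, 16-bit mono gives 32000.
    public static int AverageBytesPerSecond(int samplesPerSecond, int bitsPerSample, int channels)
    {
        return samplesPerSecond * BlockAlign(bitsPerSample, channels);
    }
}
```

This is the same arithmetic behind the recordingLength calculation in the recording task, where the number of seconds is multiplied by 2 bytes per sample and by 16000 samples per second.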

The event handlers specified earlier display information based on the result of the user's speech being recognized:


static void SreSpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
    Console.WriteLine("\nSpeech Rejected");
    if (e.Result != null)
        DumpRecordedAudio(e.Result.Audio);
}

static void SreSpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
    Console.Write("\rSpeech Hypothesized: \t{0}\tConf:\t{1}", e.Result.Text, e.Result.Confidence);
}

static void SreSpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    Console.WriteLine("\nSpeech Recognized: \t{0}", e.Result.Text);
}

private static void DumpRecordedAudio(RecognizedAudio audio)
{
    if (audio == null) return;

    // Find the first RetainedAudio_N.wav name that is not already taken
    int fileId = 0;
    string filename;
    while (File.Exists((filename = "RetainedAudio_" + fileId + ".wav")))
        fileId++;

    Console.WriteLine("\nWriting file: {0}", filename);
    using (var file = new FileStream(filename, System.IO.FileMode.CreateNew))
        audio.WriteToWaveStream(file);
}

Visual Basic

Private Shared Sub SreSpeechRecognitionRejected(ByVal sender As Object, ByVal e As SpeechRecognitionRejectedEventArgs)

     Console.WriteLine(vbLf & "Speech Rejected")
     If e.Result IsNot Nothing Then
          DumpRecordedAudio(e.Result.Audio)
     End If

End Sub
Private Shared Sub SreSpeechHypothesized(ByVal sender As Object, ByVal e As SpeechHypothesizedEventArgs)

     Console.Write(vbCr & "Speech Hypothesized: " & vbTab & "{0}" & vbTab & "Conf:" & vbTab & "{1}", e.Result.Text, e.Result.Confidence)

End Sub
Private Shared Sub SreSpeechRecognized(ByVal sender As Object, ByVal e As SpeechRecognizedEventArgs)

     Console.WriteLine(vbLf & "Speech Recognized: " & vbTab & "{0}", e.Result.Text)

End Sub

Private Shared Sub DumpRecordedAudio(ByVal audio As RecognizedAudio)
     If audio Is Nothing Then
          Return
     End If

     Dim fileId As Integer = 0
     Dim filename As String
     filename = "RetainedAudio_" & fileId & ".wav"
     Do While File.Exists(filename)
          fileId += 1
          filename = "RetainedAudio_" & fileId & ".wav"
     Loop

     Console.WriteLine(vbLf & "Writing file: {0}", filename)
     Using file = New FileStream(filename, System.IO.FileMode.CreateNew)
          audio.WriteToWaveStream(file)
     End Using

End Sub

In the case of a word being rejected, the audio is written out to a WAV file so it can be listened to later.


We've created an application that can record audio for a variable amount of time with Kinect!




    The Discussion

    • This is awesome!!

    • But how about other languages? Like German, French or Spanish?

      Are these supported?

    • How about a brief code sample of how general dictation might be used? When I try to modify the sample code to add


      It crashes on


      I've searched high and low for a solution but it appears this is a general issue (that the dictation stuff doesn't work) with the speech API so why is it there? Any help is much appreciated.



    • George Birbilis

      typo: visaul -> visual

    • @George Birbilis: fixed the typo

    • @TheZar: are you using the x86 or x64 speech APIs?

    • Very good tutorial, but do you have the code for Speech Recognition? Thanks

    • Is there any good resource out there for learning the SRGS XML format? The W3C specification is too.. specificationy, and all the tutorials I've found so far deal with the BNF format rather than the XML format.

    • Hi, thanks for sharing such a good tutorial with us. But I personally find it is not so difficult to record streaming audio from a microphone with standalone audio recorders, not built-in ones.

    • Hiva Javaher

      I'm trying to get both speech recognition and Text to speech to work on a WPF app (C#)
      I have the Recognition down but the synthesizer part keeps giving an error of "No voice installed on the system or none available with the current security setting."
      I have both "Microsoft Speech Platform - Software Development Kit (SDK) (Version 10.2)" and "Microsoft Speech Platform - Server Runtime (Version 10.2)" in X86 and X64 installed on my system.

      Can anyone tell me whats wrong? I would really really appreciate it.


    • I am trying to add speech recognition to a WPF C# app. I am receiving video, skeletal, and depth data correctly, but whenever I start capturing the audio I receive the exception error below. I can run the demo above correctly. Is there a reference or an extra step needed when using WPF?


      System.InvalidCastException was unhandled
        Message=Unable to cast COM object of type 'System.__ComObject' to interface type 'Microsoft.Research.Kinect.Audio.IMediaObject'. This operation failed because the QueryInterface call on the COM component for the interface with IID '{D8AD0F58-5494-4102-97C5-EC798E59BCF4}' failed due to the following error: No such interface supported (Exception from HRESULT: 0x80004002 (E_NOINTERFACE)).
             at System.StubHelpers.StubHelpers.GetCOMIPFromRCW(Object objSrc, IntPtr pCPCMD, Boolean& pfNeedsRelease)
             at Microsoft.Research.Kinect.Audio.IMediaObject.ProcessOutput(Int32 dwFlags, Int32 cOutputBufferCount, DMO_OUTPUT_DATA_BUFFER[] pOutputBuffers, Int32& pdwStatus)
             at Microsoft.Research.Kinect.Audio.KinectAudioStream.RunCapture(Object notused)
             at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
             at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
             at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
             at System.Threading.ThreadHelper.ThreadStart(Object obj)

    • I wanted to create an application where the user speaks a predetermined word (e.g. KinectVoice), and after this word the user would have 5 seconds to say the command. In other words, I need to create a thread that always runs. The problem is that we will need to delete what has been said every X amount of time (minutes, seconds); if not, the application gets a lot of load! Can someone tell me the best way to accomplish this?
    • For some reason I only have the Microsoft Lightweight Speech Recognizer v11.0 (SR_MS_ZXX_Lightweight_v11.0) showing up as an available speech recognizer. I've double-checked that I have everything installed correctly, and I'm referencing the C:\Program Files\Microsoft SDKs\Speech\v11.0\Assembly\Microsoft.Speech.dll. Any ideas why I don't see the Kinect Recognizer?
