Quick introduction to Kinect's Speech Recognition (in Polish)


You've got to love the universal nature of code. While this sample's text is Polish, the code is easily readable and understandable. Since speech recognition hasn't been covered here much, since this is one of those simple, easy-to-follow examples, and given the power of machine translation (and the universality of code), I thought this would make a good quick Friday entry.

Kinect SDK - Audio API

[Machine Translated]

The application developed for this article implements speech recognition. Once a container has been declared to serve as a dictionary of three words (red, green, blue) and the application is running, every sound recorded by the Kinect sensor is processed to identify the received signal, and the application then responds to the specific word. The program also shows how accurately the sound was recognized and which dictionary entry it was matched to.

Below is a description of the parts of the program that have a decisive influence on how it works.

This code lets you use the functions responsible for processing audio. Then just create an object and assign its properties appropriate values, which enable sound processing and put the device in the appropriate mode:


In the next step, you must define which language the application will use for speech recognition (at the time of writing, only English was available):
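Choosing the language amounts to a lookup by language tag among the recognizers installed on the machine. Here is a minimal Python sketch of that idea, not the SDK's .NET code: `RecognizerInfo`, `pick_recognizer`, and the inventory below are hypothetical names and made-up data used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RecognizerInfo:
    """Hypothetical record describing one installed recognizer."""
    recognizer_id: str
    culture: str  # language tag such as "en-US"


def pick_recognizer(installed: Sequence[RecognizerInfo],
                    culture: str) -> Optional[RecognizerInfo]:
    """Return the first recognizer whose culture matches (case-insensitive)."""
    wanted = culture.lower()
    for info in installed:
        if info.culture.lower() == wanted:
            return info
    return None  # nothing installed for that language


# Made-up inventory, for illustration only.
INSTALLED = [
    RecognizerInfo("recognizer-fr", "fr-FR"),
    RecognizerInfo("recognizer-en", "en-US"),
]

if __name__ == "__main__":
    chosen = pick_recognizer(INSTALLED, "en-US")
    print(chosen.recognizer_id if chosen else "no recognizer for that language")
```

Returning `None` instead of raising lets the caller decide how to report a missing language pack.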


Then we start the speech recognition engine, based on the language defined earlier:


Then you must define which words the program will consider when identifying patterns defined for the language selected above. To do this, list in an array the individual words that the speech recognition system can choose from when comparing patterns:


The next step is to build a grammar for the dictionary defined this way, whose patterns can be found in the library of predefined patterns. Among the attribute values, you must also specify which language the system should use:
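To make the word list and grammar steps concrete, here is a rough Python sketch of a fixed three-word vocabulary tied to a language, plus a matcher that reports which entry was chosen and with what confidence. It is not the Kinect SDK API. It assumes a text hypothesis is already available and uses `difflib.SequenceMatcher` string similarity as a stand-in for the engine's acoustic confidence. `Grammar`, `build_grammar`, `match`, and the 0.6 threshold are hypothetical choices.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Grammar:
    """Hypothetical grammar: a language tag plus the words allowed in it."""
    culture: str
    choices: Tuple[str, ...]


def build_grammar(words: Iterable[str], culture: str) -> Grammar:
    """Normalize the word list, drop duplicates, and bind it to a language."""
    cleaned = [w.strip().lower() for w in words if w.strip()]
    unique = tuple(dict.fromkeys(cleaned))  # keeps first-seen order
    if not unique:
        raise ValueError("a grammar needs at least one word")
    return Grammar(culture=culture, choices=unique)


def match(grammar: Grammar, heard: str,
          threshold: float = 0.6) -> Tuple[Optional[str], float]:
    """Return (best word, confidence), or (None, confidence) below threshold.

    Confidence here is plain string similarity in [0, 1], a stand-in for
    the score a real recognizer would compute from audio.
    """
    heard = heard.strip().lower()
    best_word, best_score = None, 0.0
    for word in grammar.choices:
        score = SequenceMatcher(None, heard, word).ratio()
        if score > best_score:
            best_word, best_score = word, score
    if best_score < threshold:
        return None, best_score
    return best_word, best_score


if __name__ == "__main__":
    colors = build_grammar(["red", "green", "blue"], culture="en-US")
    for heard in ["green", "gren", "xyz"]:
        word, confidence = match(colors, heard)
        print(f"{heard!r} -> {word} (confidence {confidence:.2f})")
```

Keeping the vocabulary this small is what makes the approach robust: the matcher only ever has to decide among three candidates, or reject the input.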


All that remains is to implement the processing that should run the moment a sound signal is captured by the microphones. At this point, the following declaration obtains the data stream and specifies its characteristics:
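A stream's characteristics boil down to a handful of numbers, and two of them are derived from the others. The Python sketch below shows that arithmetic for uncompressed PCM audio. The 16 kHz, 16-bit, mono values are an assumption about the Kinect audio stream, not something stated in the article, and `PcmFormat` is a hypothetical helper rather than an SDK type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcmFormat:
    """Hypothetical description of an uncompressed PCM audio stream."""
    samples_per_second: int
    bits_per_sample: int
    channels: int

    @property
    def block_align(self) -> int:
        """Bytes per sample frame (one sample for every channel)."""
        return self.channels * self.bits_per_sample // 8

    @property
    def average_bytes_per_second(self) -> int:
        """Data rate of the stream in bytes per second."""
        return self.samples_per_second * self.block_align

    def duration_seconds(self, byte_count: int) -> float:
        """How much audio a buffer of byte_count bytes holds."""
        return byte_count / self.average_bytes_per_second


# Assumed values: 16 kHz, 16-bit, mono.
ASSUMED_FORMAT = PcmFormat(samples_per_second=16000,
                           bits_per_sample=16,
                           channels=1)

if __name__ == "__main__":
    fmt = ASSUMED_FORMAT
    print("block align:", fmt.block_align)
    print("bytes per second:", fmt.average_bytes_per_second)
    print("seconds in a 64000-byte buffer:", fmt.duration_seconds(64000))
```

Deriving the block alignment and byte rate from the three base values avoids passing inconsistent numbers to whatever consumes the stream.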


Project Information URL: https://code.msdn.microsoft.com/Kinect-SDK-Audio-API-c70256ff

Project Download URL: https://code.msdn.microsoft.com/Kinect-SDK-Audio-API-c70256ff



