Changing Kinect Audio Level gain to improve speech recognition
The primary theme for this week, like the last few and probably the next few, is "v1": projects, information, details, updates, etc. related to the Kinect for Windows Sensor and SDK v1 release.
Today's post covers the default audio levels, why they might not be the best for voice recognition, and how to fix that in your code.
Workaround for Kinect for Windows sub-optimal audio gain setting
The first version of the Kinect for Windows SDK shipped with a default microphone gain setting that is known to be sub-optimal for speech recognition scenarios. This is addressed in that version's release notes, under the "Known Issues" section, in the issue titled "Microphone Array default gain setting is sub-optimal".
The release notes give instructions for working around the issue as a user, through manual steps, but since developers probably want to do this programmatically, I'm attaching sample code to this post that shows how to configure this setting automatically.
Compiling the sample builds a console application that can be called as follows:
- KinectAudioLevel
- KinectAudioLevel 0
- KinectAudioLevel 10
- KinectAudioLevel 30
Command lines 1 and 2 set the gain on the Kinect microphone to 0 dB (i.e., no gain), which is the optimal setting for speech recognition. Command line 3 sets the gain to 10 dB, and command line 4 sets it to 30 dB, which is the maximum and corresponds to 100% in the sound control panel.
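To make the command-line behavior concrete, here is a minimal sketch of the argument handling only, written in Python for brevity. It is not the attached sample: the names `parse_gain_db` and `amplitude_factor` are hypothetical, treating a missing argument as 0 dB and clamping out-of-range values to the 0 to 30 dB range are assumptions, and the call that actually applies the gain to the Kinect microphone (on Windows, presumably through the system audio endpoint volume API) is deliberately left out. The second helper shows why 0 dB means "no gain": a gain in dB maps to an amplitude multiplier of 10^(dB/20), which is exactly 1 at 0 dB.

```python
MIN_GAIN_DB = 0.0   # no gain; the setting recommended above for speech recognition
MAX_GAIN_DB = 30.0  # maximum; described above as 100% in the sound control panel


def parse_gain_db(args):
    """Return the requested gain in dB from a KinectAudioLevel-style argument list.

    `args` excludes the program name. An empty list is treated as 0 dB
    (an assumption of this sketch).
    """
    if not args:
        return MIN_GAIN_DB
    gain_db = float(args[0])  # raises ValueError for non-numeric input
    # Clamping instead of rejecting out-of-range values is also an assumption.
    return max(MIN_GAIN_DB, min(MAX_GAIN_DB, gain_db))


def amplitude_factor(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier, 10 ** (dB / 20)."""
    return 10.0 ** (gain_db / 20.0)
```

For example, `parse_gain_db([])` and `parse_gain_db(["0"])` both yield 0 dB, while `parse_gain_db(["45"])` is clamped to 30 dB. A real tool would pass the resulting value to whatever device API it uses to set the microphone level.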
We're working on improving the default settings in future releases, but for now speech recognition applications will get better results by explicitly setting the gain to 0 dB, and applications that record audio should experiment to find the gain setting that works best for them. [GD: Emphasis added]