Getting going with the Kinect for Windows


Today's inspirational post and project shows that developing for the Kinect with Kinect for Windows SDK doesn't have to be all about the NUI. But more importantly there's nothing to be afraid of, that it's really pretty easy to get started and writing code...

Tapping Into the Power of Kinect for Windows

Hello everyone, my name is Gavin Gear, and I am going to be blogging regularly here on the Extreme Windows Blog. My background includes working on the Windows sensor team (see recent talks from BUILD HERE), and before Microsoft I graduated with a degree in Mechanical Engineering. Some of the things I enjoy include studio photography, video production, building PCs, writing apps, and inventing/fabricating/fixing. I look forward to bringing you stories where Windows powers extreme experiences, and to start I thought I’d do a quick Kinect for Windows project. Here we go!

What is Kinect for Windows?

We’ve all seen Kinect: the amazing game input device for Xbox 360 that enables controller-less console game play and control of other Xbox experiences. It’s fascinating to discover how this device uses array microphones, a projected IR dot pattern, an IR camera, and a regular RGB camera to sense the surrounding environment. With these inputs, the Kinect sensor can isolate and record sounds, generate a room depth-map, and build 3D models of human faces and skeletons. It’s safe to say that Kinect is a game-changer.

Kinect for Windows is all about enabling Windows PCs to take advantage of Kinect. The Kinect for Windows 1.5 SDK and Toolkit was released in May 2012, and includes Windows drivers for the Kinect sensor, a full SDK (Software Development Kit) a toolkit (sample code, tools), and supporting documentation.

Here’s a list of what you need to start writing Kinect apps:

  • Kinect sensor*
  • Windows PC with Windows 7 or later Windows OS
  • Visual Studio (Express or full)
  • Kinect for Windows downloads (SDK and Toolkit)

*There are actually two Kinect sensors that work with the Kinect for Windows SDK and Toolkit:

Kinect for Xbox 360 sensor (says “XBOX 360” on the front) – these devices are licensed for development purposes only, and do not support “Near Mode”. This is what I used for this blog post.

Kinect for Windows sensor (says “KINECT” on the front) – these devices are optimized for Windows experiences, are licensed for use with Windows PCs, and also support “Near Mode”.

Developers can use either sensor to get started, but deployments need to be on a Kinect for Windows sensor which can be purchased online here. If you get serious about experimenting with and integrating Kinect experiences into your projects (and I hope you do), I recommend you pick up the Kinect for Windows sensor so you have access to all of the development possibilities enabled with this pc-optimized device. However, if all you’ve got is a Kinect for Xbox kicking around your living room, break it out and give it a go!


App Challenge: Kinect as Human Presence Sensor

I decided to give myself a challenge: In less than a day, I would attempt to write a functioning Kinect human presence detection app using some C# code that I had previously written. This existing code did not incorporate Kinect, and I had no experience with the Kinect SDK. This would be a good challenge because in addition to writing the app, I would also need to document the entire process on video using multiple cameras. Sounds like fun to me!

I started the day with the Kinect 1.5 SDK and Toolkit installed on my Windows 7 box, and a Kinect sensor that was still in its box. I had briefly talked to the Kinect for Windows team to get ideas for how to sense human presence with Kinect. To paraphrase, they told me to “take a look at face tracking, skeletal tracking, and depth”. I had no Kinect code at this point (other than SDK samples), and no links to docs or references.


The logic I wrote to control Windows experiences is pretty simple, and is based on the number of tracked skeletons at any given point in time (humans in view of the Kinect sensor). When the number of detected humans changes, the code I wrote does the following:

Zero people: Pause media (if playing), lock workstation
One person: Play media (if not playing)
More than one person: Pause media (if playing)

The simple app UI shows a numeric display of the number of tracked skeletons (present humans), and has checkboxes to allow supported features to be toggled on/off.

Project Information URL:



Contact Information: