So you want to access the Kinect's cameras and ImageStreams...


We all know that the Kinect has two cameras, but do we all know how to access them? How to get at the image streams they provide? After reading today's post and looking at the project, you'll be one step closer, that's for sure...

Getting the most out of the Kinect SDK in C# - part 2 of ?: ImageStreams

This is part two of a series documenting my experiments with the Kinect for Windows SDK.
After the first few articles, this series should make a good walkthrough for beginners and a nice reference for more advanced developers.

Part one described the initialization of the Kinect SDK, including the parameters for the image-capturing engine. Please refer to part one for details about initialization and general background.

Series table of contents:

  1. Initialization
  2. ImageStreams
  3. Coming soon...


The Kinect device has two cameras:

  • Video camera - an RGB camera for capturing color images
  • Depth camera - an infrared camera used to capture depth data

This article will focus on getting and processing the data acquired by these cameras.


What are ImageStreams?

ImageStream is a class provided by the Kinect SDK for accessing data captured by the Kinect cameras. Each Kinect Runtime has two streams:

  • VideoStream - has to be opened with ImageStreamType.Video and ImageType.Color, ImageType.ColorYuv or ImageType.ColorYuvRaw
  • DepthStream - has to be opened with ImageStreamType.Depth and ImageType.Depth or ImageType.DepthAndPlayerIndex


Accessing image data

Frames can be obtained either by polling a stream with ImageStream.GetNextFrame or by subscribing to the runtime's VideoFrameReady and DepthFrameReady events. Either way, you end up with an ImageFrame, which holds the image data itself in its Image field plus some metadata such as:

  • Type - contains type of image (ImageType) - useful in case you use same handler for both types of ImageStream
  • FrameNumber
  • Timestamp
  • Resolution
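Both access styles can be sketched roughly as follows. This is a minimal sketch against the beta SDK's Microsoft.Research.Kinect.Nui namespace; the resolutions and pool sizes are illustrative choices, not requirements:

```csharp
using Microsoft.Research.Kinect.Nui;

Runtime nui = new Runtime();
nui.Initialize(RuntimeOptions.UseColor | RuntimeOptions.UseDepthAndPlayerIndex);

// Open both streams (see part one for the Open parameters).
nui.VideoStream.Open(ImageStreamType.Video, 2,
                     ImageResolution.Resolution640x480, ImageType.Color);
nui.DepthStream.Open(ImageStreamType.Depth, 2,
                     ImageResolution.Resolution320x240, ImageType.DepthAndPlayerIndex);

// Either subscribe to events...
nui.VideoFrameReady += (sender, e) =>
{
    ImageFrame frame = e.ImageFrame;   // metadata: Type, FrameNumber, Timestamp, Resolution
    PlanarImage image = frame.Image;   // Width, Height, BytesPerPixel, Bits
    // ...process image.Bits here...
};

// ...or poll a stream on demand:
ImageFrame depthFrame = nui.DepthStream.GetNextFrame(100); // wait up to 100 ms
```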



The Kinect SDK provides its own class, PlanarImage, for holding captured images. It is as simple as it can be - it holds Width, Height, BytesPerPixel, and the raw data as byte[] Bits.

Video frames hold information in 32-bit XRGB or 16-bit UYVY format.

Depth frames come in two different formats, depending on whether the Depth or DepthAndPlayerIndex image type was chosen:

  • 12-bit depth data (stored in two bytes with upper 4 bits unused)
  • 3-bit player index (bits 0-2) and 12-bit depth data (starting at bit 3)

A depth value of 0 means that the object at that position is either too close or too far for the sensor to measure.
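The two layouts above can be unpacked with a few bit operations. A minimal sketch (the helper names are my own; depth frames store two little-endian bytes per pixel):

```csharp
// Combine the two bytes of a depth pixel and extract the depth value.
int GetDepth(byte[] bits, int pixelIndex, bool hasPlayerIndex)
{
    int i = pixelIndex * 2;
    int raw = bits[i] | (bits[i + 1] << 8);  // low byte first
    return hasPlayerIndex
        ? raw >> 3                           // depth starts at bit 3
        : raw & 0x0FFF;                      // 12-bit depth, upper 4 bits unused
}

// Only meaningful for DepthAndPlayerIndex frames.
int GetPlayerIndex(byte[] bits, int pixelIndex)
{
    return bits[pixelIndex * 2] & 0x07;      // player index in bits 0-2
}
```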

The PlanarImageHelper class included in the sources simplifies access to individual pixels.
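The actual helper ships with the article's sources; the sketch below only illustrates the idea - wrap a PlanarImage and index pixels by (x, y) instead of computing byte offsets by hand. The member names here are assumptions, not the real class:

```csharp
class PlanarImageHelper
{
    private readonly PlanarImage image;

    public PlanarImageHelper(PlanarImage image) { this.image = image; }

    // Offset of the first byte of pixel (x, y) in the Bits array.
    private int Offset(int x, int y)
        => (y * image.Width + x) * image.BytesPerPixel;

    // For 32-bit XRGB video frames the bytes are laid out as B, G, R, X.
    public byte Blue(int x, int y)  => image.Bits[Offset(x, y)];
    public byte Green(int x, int y) => image.Bits[Offset(x, y) + 1];
    public byte Red(int x, int y)   => image.Bits[Offset(x, y) + 2];
}
```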


How to process the images?

It depends on your needs. As you can see in my example, I chose the "iterative" method, because it is very simple to write and very clear to read. On the other hand, it performs very poorly.

As a depth frame can be treated as a grayscale image, you can achieve the same effects as in my example using filters easily found in any good image-processing library - threshold and mask.

First you have to decide what you really need. If you are building an augmented-reality application, you will need high-quality video and fast image blending. If you will analyze only part of the image from time to time (face recognition, for example), then you still need high-resolution images but not a high frame rate, which means you can skip processing every frame in an event handler and instead get frames on demand.

As you can see from the previous sections, the Kinect SDK provides images in a very raw format, so they can easily be converted to whatever you need. Most graphics libraries can take this raw array of bytes and build their internal image representation in the most efficient way.
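For example, WPF can wrap a 32-bit XRGB video frame directly, without copying pixel by pixel. A minimal sketch, assuming an ImageFrame obtained as in the earlier sections:

```csharp
using System.Windows.Media;
using System.Windows.Media.Imaging;

PlanarImage image = frame.Image;
BitmapSource bitmap = BitmapSource.Create(
    image.Width, image.Height,
    96, 96,                              // DPI
    PixelFormats.Bgr32,                  // matches the XRGB byte layout
    null,                                // no palette
    image.Bits,
    image.Width * image.BytesPerPixel);  // stride in bytes
```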

Points of Interest

If your needs are mostly image processing aided by a depth map, you can stop here and pick an image-processing library.

But if you really want to get the most out of the Kinect NUI, move on to the next big thing - the skeleton-tracking engine.

Project Information URL:

Project Download URL: