We all know that the Kinect has two cameras, but do we all know how to access them? How to get at the image streams they provide? After reading today's post and looking at the project, you'll be one step closer, that's for sure...
Getting most of Kinect SDK in C# - part 2 of ?: ImageStreams
This is part two of a series documenting my experiments with the Kinect for Windows SDK.
After the first two or three articles, this series should make quite a good walkthrough for beginners and a nice reference for more advanced developers.
Part one described the initialization of the Kinect SDK, including the parameters for the image capturing engine. Please refer to part one for details about initialization and general background.
The Kinect device has two cameras:
- Video camera - RGB camera for capturing images
- Depth camera - infrared camera used to capture depth data
This article will focus on getting and processing the data acquired by these cameras.
What are ImageStreams?
ImageStream is a class provided by the Kinect SDK for accessing data captured by the Kinect cameras. Each Kinect Runtime has two streams:
- VideoStream - has to be opened with
- DepthStream - has to be opened with
Accessing image data
Using any of the methods mentioned above you will get an ImageFrame, which holds the image data itself in the Image field and some metadata such as:
- Type - contains the type of image (ImageType) - useful in case you use the same handler for both types of
The Kinect SDK provides its own class for keeping captured images. It is as simple as it can be - it holds BytesPerPixel, and raw data is
Video frames hold information in 32-bit XRGB or 16-bit UYVY format.
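To make the 32-bit case concrete, here is a minimal sketch of reading one pixel out of a raw video buffer. It is not code from the project: VideoPixels and ReadXrgb are hypothetical names, and the sketch assumes that each XRGB pixel is stored little-endian, so the bytes appear in memory in the order blue, green, red, unused, with rows packed one after another without padding.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Kinect SDK.
public static class VideoPixels
{
    // Reads the pixel at (x, y) from a raw 32-bit XRGB buffer.
    // Assumption: bytes in memory are ordered B, G, R, X and rows are not padded.
    public static (byte R, byte G, byte B) ReadXrgb(byte[] bits, int width, int x, int y)
    {
        const int bytesPerPixel = 4;
        int offset = (y * width + x) * bytesPerPixel;

        byte blue = bits[offset];
        byte green = bits[offset + 1];
        byte red = bits[offset + 2];
        // bits[offset + 3] is the unused X byte.

        return (red, green, blue);
    }
}
```

The same offset arithmetic applies to any packed format; only the bytes-per-pixel constant and the meaning of the individual bytes change.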
Depth frames have two different formats depending on choosing
- 12-bit depth data (stored in two bytes with upper 4 bits unused)
- 3-bit player index (bits 0-2) and 12-bit depth data (starting at bit 3)
A depth value of 0 means that the object at this position is either too close or too far away.
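The two bit layouts described above can be unpacked with a few shifts and masks. The sketch below is only an illustration: DepthDecoding and its methods are hypothetical names, and it assumes that the two bytes of each depth pixel are stored low byte first.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Kinect SDK.
public static class DepthDecoding
{
    // Depth-only format: 12-bit depth, upper 4 bits unused.
    // Assumption: the low byte comes first in the buffer.
    public static int DepthOnly(byte low, byte high)
    {
        return (low | (high << 8)) & 0x0FFF;
    }

    // Depth with player index: the depth starts at bit 3.
    public static int DepthWithPlayer(byte low, byte high)
    {
        return ((low | (high << 8)) >> 3) & 0x0FFF;
    }

    // Depth with player index: the player index occupies bits 0-2.
    public static int PlayerIndex(byte low)
    {
        return low & 0x07;
    }

    // A depth of 0 means the object is too close or too far to measure.
    public static bool IsUnknown(int depth)
    {
        return depth == 0;
    }
}
```

For example, the byte pair 0x42, 0x1F in the second format decodes to a depth of 1000 and a player index of 2.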
The PlanarImageHelper class included in the sources simplifies access to individual pixels.
How to process the images?
It depends on your needs. As you can see in my example, I chose the "iterative" method because it is very simple to write and very clear to read. On the other hand, it has very poor performance.
As the depth frame can be treated as a grayscale image, you can achieve the same effects as in my example using filters that are easily found in all good image processing libraries - threshold and mask.
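As a rough illustration of the iterative approach combined with a threshold and a mask, the sketch below blacks out every video pixel whose depth is unknown or beyond a chosen distance. It is not the code from the project. DepthMask and KeepNear are hypothetical names, and the sketch assumes the depth-only format with the low byte first, a 32-bit video buffer, and video and depth frames of the same resolution in which pixel i of one corresponds to pixel i of the other, which real Kinect frames do not guarantee.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Kinect SDK.
public static class DepthMask
{
    // Returns a copy of the video buffer in which every pixel whose depth is
    // unknown (0) or greater than maxDepth has its color bytes set to zero.
    // Assumptions: 4 bytes per video pixel, 2 bytes per depth pixel (low byte
    // first, depth-only format), and identical pixel ordering in both buffers.
    public static byte[] KeepNear(byte[] videoBits, byte[] depthBits, int pixelCount, int maxDepth)
    {
        var result = (byte[])videoBits.Clone();

        for (int i = 0; i < pixelCount; i++)
        {
            int depth = (depthBits[2 * i] | (depthBits[2 * i + 1] << 8)) & 0x0FFF;

            // Threshold: keep only pixels with a known depth within range.
            bool keep = depth != 0 && depth <= maxDepth;

            if (!keep)
            {
                // Mask: clear the three color bytes of the rejected pixel.
                int offset = i * 4;
                result[offset] = 0;
                result[offset + 1] = 0;
                result[offset + 2] = 0;
            }
        }

        return result;
    }
}
```

A loop like this touches every pixel on every frame, which is exactly why it is simple to read but slow; a library filter performs the same threshold and mask steps with optimized code.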
First you have to decide what you really need. If you are building an augmented reality application, then you will need high-quality video and fast image blending. If you will analyze only part of the image from time to time (face recognition, for example), then you still need high-resolution images but not a high frame rate, which means you can skip processing every frame in the event handler and get frames on demand.
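One way to avoid processing every frame is to let the event handler do nothing more than remember the newest buffer, and to run the expensive analysis only when the application asks for a frame. The sketch below shows that idea in isolation. LatestFrame is a hypothetical class that does not come from the project or from the Kinect SDK, and it assumes that the stored buffer is not reused or overwritten by whoever produced it.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Kinect SDK.
public sealed class LatestFrame
{
    private readonly object gate = new object();
    private byte[] bits;

    // Called from the frame event handler: cheap, just remembers the newest buffer.
    public void Store(byte[] newBits)
    {
        lock (gate)
        {
            bits = newBits;
        }
    }

    // Called on demand by the processing code: returns the newest unprocessed
    // buffer, or null if nothing has arrived since the last call.
    public byte[] Take()
    {
        lock (gate)
        {
            byte[] current = bits;
            bits = null;
            return current;
        }
    }
}
```

Older frames are simply dropped, so the processing code always works on the most recent image no matter how slowly it runs.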
As you can see from the previous sections, the Kinect SDK provides images in a very raw format. This means they can easily be converted to anything you need. Most graphics libraries are able to take this raw array of bytes and create their internal image representation in the most efficient way.
Points of Interest
If your needs are mostly image processing with the aid of a depth map, you should stop here and look for an image processing library.
But if you really want to get the most out of Kinect NUI, go on to the next big thing - the skeleton tracking engine.
Project Information URL: http://www.codeproject.com/KB/miscctrl/MostOfKinectSDK2.aspx
Project Download URL: http://www.codeproject.com/KB/miscctrl/MostOfKinectSDK2/JK.KinectExperiments.zip