The Kinect Depth and Video Space


Today's project is another chapter in the free Kinect web book, Practical Windows Kinect in C#, hosted by I Programmer. It takes us into the video and depth features of the Kinect and the Kinect for Windows SDK.

Kinect SDK 1 - Depth and Video Space

In previous chapters we learned how to read in video and depth data. Now we take a break from looking at how to extract data from the cameras and concentrate on how to relate data from the depth camera to the video camera.

The Kinect has two cameras, one video and one depth. In both cases they return a bitmap corresponding to what they see. In the case of the video camera the bitmap is usually 640x480 and for the depth map it is often 320x240, i.e. exactly half the resolution in each direction.

A very standard requirement is to pick out video pixels that correspond to particular depth pixels. This sounds fairly easy, as you might think that the pixel at x,y in the depth image corresponds to the block of four pixels starting at 2x,2y in the video image.

Unfortunately this simple mapping doesn't work, and this can be the cause of much wasted time trying to figure out why your perfectly reasonable program doesn't quite do what you expect. Fortunately, the solution is fairly easy - once you know how.

As a simple example of using the data from the depth camera with the video camera, we construct a demo of how to use the player index (see chapter 4) to create masks that can be used to remove the background from a user's image.
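As a reminder of how the player index is packed into a raw depth pixel, here is a minimal sketch. It assumes the SDK 1.0 layout, where the low three bits of each 16-bit depth value hold the player index and the upper thirteen bits hold the depth in millimetres; the constants mirror DepthImageFrame.PlayerIndexBitmask (0x0007) and DepthImageFrame.PlayerIndexBitmaskWidth (3), and the sample value is made up for illustration.

```csharp
using System;

class DepthPixelDemo
{
    // These mirror the SDK's DepthImageFrame.PlayerIndexBitmask
    // and DepthImageFrame.PlayerIndexBitmaskWidth constants.
    const short PlayerIndexBitmask = 0x0007;
    const int PlayerIndexBitmaskWidth = 3;

    static void Main()
    {
        // Fabricate a raw depth pixel: 1200 mm depth, player index 2.
        short raw = (short)((1200 << PlayerIndexBitmaskWidth) | 2);

        // Low three bits: which tracked player (0 = no player).
        int player = raw & PlayerIndexBitmask;

        // Remaining bits: depth in millimetres.
        int depthMm = raw >> PlayerIndexBitmaskWidth;

        Console.WriteLine($"player={player}, depth={depthMm}mm");
    }
}
```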


Converting from depth to video

Converting from depth to video coordinates is simply a matter of projective geometry. What we have are two perspective views of the same scene, and so it is perfectly possible to implement a function which converts between them - possible, but not easy to get right.

For this reason the DepthImageFrame object has a set of coordinate conversion methods. In this case we need the MapToColorImagePoint method. This takes the depth image coordinates, takes into account the depth at that point, and converts them to the corresponding location in the video image. All you have to tell the method, in addition to the depth co-ordinates, is the format of the video image.

(Note: you don't have to supply the depth of the pixel to the method, as you did in the early beta SDK.)

Project Information URL:

Project Source URL: (Registration required)

Next we need to compute the video co-ordinates vx,vy from the depth co-ordinates x,y:

void FramesReady(object sender, AllFramesReadyEventArgs e)
{
    DepthImageFrame DFrame = e.OpenDepthImageFrame();
    if (DFrame == null) return;
    short[] depthimage = new short[DFrame.PixelDataLength];
    DFrame.CopyPixelDataTo(depthimage);

    ColorImageFrame VFrame = e.OpenColorImageFrame();
    if (VFrame == null) return;
    byte[] pixeldata = new byte[VFrame.PixelDataLength];
    VFrame.CopyPixelDataTo(pixeldata);

    byte player;
    int vx, vy;
    for (int y = 0; y < DFrame.Height; y++)
    {
        for (int x = 0; x < DFrame.Width; x++)
        {
            // Low bits of the depth pixel hold the player index;
            // turn it into an all-on/all-off mask.
            player = (byte)(depthimage[x + y * DFrame.Width] &
                              DepthImageFrame.PlayerIndexBitmask);
            if (player != 0) player = 0xFF;

            // Let the SDK map depth co-ordinates to video co-ordinates -
            // the naive vx = 2x, vy = 2y doesn't work.
            ColorImagePoint p = DFrame.MapToColorImagePoint(
                                       x, y, VFrame.Format);
            vx = p.X;
            vy = p.Y;

            // Clamp so that the 2x2 block below stays inside the image.
            vx = Math.Max(0, Math.Min(vx, VFrame.Width - 2));
            vy = Math.Max(0, Math.Min(vy, VFrame.Height - 2));

            // Mask a 2x2 block of video pixels: 8 bytes covers two
            // adjacent 32-bit pixels on each of two rows.
            for (int k = 0; k < 8; k++)
            {
                pixeldata[(vx + vy * VFrame.Width) *
                            VFrame.BytesPerPixel + k] &= player;
                pixeldata[(vx + (vy + 1) * VFrame.Width) *
                            VFrame.BytesPerPixel + k] &= player;
            }
        }
    }
    pictureBox1.Image = ByteToBitmap(pixeldata, VFrame.Width, VFrame.Height);
}

