Occlusion detection and virtual object manipulation with the Kinect


Today's inspirational project shows how one team is working with the Kinect and Kinect for Windows SDK to create an augmented reality real...

Occlusion detection and virtual object manipulation in augmented reality with the Microsoft Kinect

Augmented reality (AR) is limited by how you can interact with the virtual object displayed. Currently, the object can only be moved by moving the AR marker. In this project, we used the Microsoft Kinect sensor to let the user virtually touch and manipulate the displayed object. To do this, we use the sensor's skeletal tracking so that the user can rotate and resize the object with their hands. While doing so, the user's hands will inevitably get between the virtual object and the camera. To prevent the virtual object from being wrongly displayed over the hand, the Kinect depth map is used to find any obstruction and block the rendering of occluded pixels. This is done at the GPU level, using modified shaders. The AR system uses the Unity3D game engine to display the 3D models, with a custom plugin created at the CIMMI research center to enable the AR. The plugin computes the pose of the Kinect in real time from the color image, while the depth map allows occlusion detection and hand tracking. The plugin was developed with the OpenCV library, which allows easy analysis of the images. This work was carried out at the CIMMI research center (cimmi.qc.ca) by two interns, Alexis Legare-Julien and Renaud Simard, under the supervision of Jean-Nicolas Ouellet, Ph.D.Eng.


Here are some additional details on the project:

It uses the Kinect color image and depth map to achieve augmented reality with occlusion and object manipulation. Without wasting time, here is the link to the demo (April 2012): http://youtu.be/OTmHtAaQD_c 

Now the description:

We work on augmented reality (AR), where a virtual object is added to a live video stream as if it were really part of the scene. Last year, we developed a Unity3D plugin that creates the AR illusion. Basically, it grabs the webcam image and computes the camera position with respect to a marker. Unity then simply shows the webcam image superimposed with a render of the virtual object from the correct viewpoint. This video shows what it looks like: http://youtu.be/npuubLGpU1A 

Then we wanted to allow the user to manipulate the state of the object (rotate and resize it) with their hands. The Kinect is perfect for that: its skeletal tracking measures where the user's joints (hand, head, etc.) are relative to the camera. While doing so, we were confronted with the occlusion problem. That is, if the hand is closer to the camera than the virtual object, the hand must be seen (we must not render the virtual object over the hand). Here again the Kinect helps: its depth map measures the z-distance of everything it sees. Similarly, when a video card renders a virtual scene, it computes occlusion from the depth buffer, i.e. the z-distance of every object in the virtual scene. The occlusion algorithm is then obvious: in the shader, clip the virtual pixel whenever z.real < z.virtual. 
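
As an illustration of that per-pixel test, here is a minimal CPU-side sketch in Python. It is not the project's shader: the function name, the list-of-rows image layout, and the use of None for "no virtual object at this pixel" are all assumptions made for the example.

```python
def occlusion_composite(camera_rgb, virtual_rgb, real_depth, virtual_depth):
    """Composite a virtual render over a camera image with depth-based occlusion.

    All four inputs are equally sized 2D lists (rows of pixels).
    real_depth holds the sensor's z-distance for each pixel; virtual_depth
    holds the rendered object's z-distance, or None where the object does
    not cover the pixel (a convention assumed for this sketch).
    """
    out = []
    for cam_row, virt_row, zr_row, zv_row in zip(
            camera_rgb, virtual_rgb, real_depth, virtual_depth):
        out_row = []
        for cam, virt, z_real, z_virtual in zip(cam_row, virt_row, zr_row, zv_row):
            if z_virtual is None or z_real < z_virtual:
                # The real scene is in front (or nothing virtual here):
                # keep the camera pixel, i.e. "clip" the virtual one.
                out_row.append(cam)
            else:
                out_row.append(virt)
        out.append(out_row)
    return out


# One row, three pixels: background at 2.0 m, a hand at 0.8 m in the middle,
# and a virtual object at 1.0 m covering the first two pixels only.
camera = [["cam", "cam", "cam"]]
virtual = [["obj", "obj", "obj"]]
real_z = [[2.0, 0.8, 2.0]]
virtual_z = [[1.0, 1.0, None]]
result = occlusion_composite(camera, virtual, real_z, virtual_z)
```

In this example the first pixel shows the object, the second shows the hand because it is closer than the object, and the third shows the camera image because no virtual object covers it.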

Thus, we used the Kinect (with the Microsoft SDK, but any other SDK with skeletal tracking would do) to manage occlusion and manipulate the virtual objects interactively. Rendering is done in the free edition of Unity3D, occlusion is handled in a Unity shader (this is very fast since the computation is done on the graphics card), and manipulation is done in Unity3D from the Kinect's skeletal information. 
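
The description does not say how hand positions are mapped to rotation and resizing. One plausible mapping, sketched here purely as an assumption, is to scale the object by the change in distance between the two hands and to rotate it by the change in direction of the line joining them. The function names are hypothetical, and hand positions are taken to be (x, y, z) tuples in meters with y pointing up.

```python
import math


def scale_factor(start_left, start_right, left, right):
    """Ratio of the current hand separation to the separation at grab time."""
    start = math.dist(start_left, start_right)
    if start == 0:
        return 1.0  # degenerate grab: leave the object size unchanged
    return math.dist(left, right) / start


def yaw_change_degrees(start_left, start_right, left, right):
    """Change in heading of the left-to-right hand vector about the vertical axis."""
    def yaw(l, r):
        # Angle of the hand-to-hand vector projected on the horizontal x-z plane.
        return math.atan2(r[2] - l[2], r[0] - l[0])

    delta = yaw(left, right) - yaw(start_left, start_right)
    # Wrap into [-pi, pi) so a small turn is never reported as a near-full turn.
    delta = (delta + math.pi) % (2 * math.pi) - math.pi
    return math.degrees(delta)


# Hands start 0.4 m apart side by side, then spread to 0.8 m: the object doubles.
grow = scale_factor((-0.2, 0, 1), (0.2, 0, 1), (-0.4, 0, 1), (0.4, 0, 1))

# Hands start side by side, then line up along the depth axis: a quarter turn.
turn = yaw_change_degrees((-0.2, 0, 1), (0.2, 0, 1), (0, 0, 0.8), (0, 0, 1.2))
```

Working from the change since the moment the object was grabbed, rather than from absolute hand positions, keeps the object from jumping when the user first raises their hands.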

The biggest problem we faced was speed: transferring the depth map to the shader was time-consuming. We had to resize the depth map to 512x512 to allow faster manipulation. While doing so, we ran into a new problem: the occlusion clip was highlighting the object contours. Occlusion was correct, but when the hand was behind the virtual object we could see through the virtual object at its boundary. The effect was kind of cool but unwanted. 

After some investigation, we found out that OpenCV was using bilinear interpolation to resize the depth map image. This created intermediate values between the background and the object boundaries. Forcing nearest-neighbor interpolation fixed the resize, but the contours were still there: the shader was also interpolating when we read the depth value from the texture. Texture coordinates are indexed from 0 to 1 instead of 0 to width (or 0 to height). We had to truncate the fractional part of the coordinate, expressed in pixels, to make sure it fell at the center of the texture pixel and so avoid interpolation. While doing so, we found out that Unity3D indexes pixel coordinates from the pixel corner, not the center, so an offset of 0.5 must be added to the coordinate in the native image range. 
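
Both effects can be reproduced on a single row of depth values. The sketch below is a simplified 1D model in Python, not the project's OpenCV or shader code. The function names are hypothetical, and the linear sampler assumes the convention described above, with texel i centered at (i + 0.5) / width.

```python
import math


def sample_nearest(row, u):
    """Nearest-neighbor lookup for a normalized coordinate u in [0, 1]."""
    i = min(int(u * len(row)), len(row) - 1)
    return row[i]


def sample_linear(row, u):
    """Linear filtering between the two texels around u, clamped at the edges."""
    x = u * len(row) - 0.5          # position measured from texel centers
    i0 = math.floor(x)
    t = x - i0                      # blend weight toward the next texel
    last = len(row) - 1
    a = row[max(0, min(i0, last))]
    b = row[max(0, min(i0 + 1, last))]
    return a * (1 - t) + b * t


def snap_to_texel_center(u, width):
    """Truncate u to a texel index, then add 0.5 to land on that texel's center."""
    i = min(int(u * width), width - 1)
    return (i + 0.5) / width


# Background at 2.0 m on the left, a hand at 0.8 m on the right.
depth_row = [2.0, 2.0, 0.8, 0.8]

blended = sample_linear(depth_row, 0.5)            # exactly on the boundary
snapped_u = snap_to_texel_center(0.5, len(depth_row))
clean = sample_linear(depth_row, snapped_u)        # filtering has no effect here
nearest = sample_nearest(depth_row, 0.5)
```

Sampled on the boundary, the linear filter returns about 1.4 m, a depth that exists nowhere in the scene and that would wrongly pass or fail the occlusion test along the contour. After snapping, the coordinate becomes 0.625, the center of the third texel, and the same filter returns the true 0.8 m.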

Project Information URL: http://www.youtube.com/watch?v=OTmHtAaQD_c&feature=youtu.be