Today's project comes to us from Marc Drossaers, who teaches us a good bit about jitter: what it is and, best of all, how we can add a jitter filter to our next Kinect project. This is a smaller piece of a bigger project he's working on, one that I'm sure we'll be highlighting soon.
This blog post introduces a filter for the jitter caused by the Kinect depth sensor. The filter works essentially by applying a dynamic threshold. Experience shows that a threshold works much better than averaging, which hampers motion detection and yields only moderate results. The presented DiscreteMedianFilter removes the jitter. A problem that remains to be solved is the manifestation of depth shadows. Performance of the filter is fine, and it is great in the absence of depth shadow countermeasures.
Kinect depth images show considerable jitter; see, e.g., the depth samples from the SDK. Jitter degrades image quality, but it also makes compression (Run Length Encoding) harder; compression for the Kinect Server System will be discussed in a separate blog post. For these reasons we want to reduce the jitter, if not eliminate it.
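To illustrate why jitter hurts Run Length Encoding, here is a minimal sketch of an encoder that stores (value, run length) pairs. The function name `runLengthEncode` is hypothetical, and this is not the compression code of the Kinect Server System; it only shows that a steady signal collapses into one run, while a signal flipping between neighboring values does not compress at all.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical minimal run-length encoder, for illustration only.
// Each output pair holds a value and the number of times it repeats in a row.
std::vector<std::pair<std::uint16_t, std::size_t>>
runLengthEncode(const std::vector<std::uint16_t>& data) {
    std::vector<std::pair<std::uint16_t, std::size_t>> runs;
    for (std::uint16_t value : data) {
        if (!runs.empty() && runs.back().first == value) {
            ++runs.back().second;          // extend the current run
        } else {
            runs.emplace_back(value, 1);   // start a new run
        }
    }
    return runs;
}
```

Four identical samples encode to a single pair, whereas four samples alternating between two adjacent values encode to four pairs.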
Kinect Depth Data
What are the characteristics of Kinect depth data?
Literature on Statistical Analysis of the Depth Sensor
An Internet search turns up a number of papers reporting thorough analyses of the depth sensor. In particular:
We are interested in the depth properties of the 640×480 spatial image that the Kinect produces at 30 FPS in the Default range. From the SDK documentation we know that the Kinect provides depth measurements in millimeters. A depth value measures the distance between a coordinate in the spatial image and the corresponding coordinate in the parallel plane at the depth sensor, see image below from the Kinect SDK Documentation.
The Kinect depth measurements are characterized by some uncertainty that is expressible as a random error. One can distinguish between errors in the x,y-plane on the one hand, and on the z-axis (depth values) on the other hand. It is the latter that is referred to as the depth jitter. The random error in the x,y-plane is much smaller than the depth jitter. I suspect it manifests itself as the color jitter in the KinectColorDepthServer through the mapping of color onto depth, but that still has to be sorted out. Nevertheless, the filter described here is also applied to the color data, after mapping onto depth.
The depth jitter has the following characteristics:
A Kinect Produces a Limited Set of Discrete Depth Values
It is not the goal of the current project to correct the Kinect depth data; we just want to send it over an Ethernet network. What helps a lot, and you could see this one coming, is:
The Kinect produces a limited set of depth values.
The Kinect for Windows produces 345 different depth values in the Default range, not counting the special values for unknown and out-of-range measurements. The depth values for my Kinect for Windows are (divide by 8 to get the depth distance in mm):
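The "divide by 8" conversion can be sketched as below. The helper name `rawDepthToMillimeters` is hypothetical, and the sketch assumes the raw value is a 16-bit integer whose three low-order bits do not carry depth information, so that a right shift by 3 is the same as the division.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the Kinect SDK.
// Assumption: the raw depth value is 16 bits wide and dividing it by 8
// (a right shift by 3) yields the distance in millimeters.
constexpr std::uint16_t rawDepthToMillimeters(std::uint16_t raw) {
    return static_cast<std::uint16_t>(raw >> 3);
}
```

Under that assumption a raw value of 6400 corresponds to 800 mm.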
I’ve experimented with several approaches: a sliding window of temporal averages and a Bilateral Filter. But these were unsatisfactory:
- Reduction of jitter is much worse than with a threshold.
- Movement detection is reduced as much as the jitter, which is an undesirable effect.
A simple threshold, about as wide as the error function, proved the best solution. As noted above, the jitter is typically limited to a few values above and below the ‘real’ value. We could ...
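A minimal sketch of a per-pixel threshold of this kind follows. It is not the DiscreteMedianFilter's actual code: the function name `applyThreshold`, the fixed (rather than dynamic) `threshold` parameter, and the comparison in raw value units instead of steps in the discrete value set are all simplifying assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical sketch of a per-pixel threshold filter.
// `previous` holds the last accepted value per pixel and is updated in place.
// A pixel is replaced only when the new value differs by more than `threshold`.
// Returns the number of pixels that were replaced.
std::size_t applyThreshold(std::vector<std::uint16_t>& previous,
                           const std::vector<std::uint16_t>& current,
                           int threshold) {
    assert(previous.size() == current.size());
    std::size_t updated = 0;
    for (std::size_t i = 0; i < current.size(); ++i) {
        const int diff = static_cast<int>(current[i]) - static_cast<int>(previous[i]);
        if (std::abs(diff) > threshold) {
            previous[i] = current[i];  // real change: accept the new value
            ++updated;
        }                              // otherwise treat it as jitter and keep the old value
    }
    return updated;
}
```

Unlike averaging, a pixel that really changes is taken over at once and at full value, so motion is not smeared over several frames.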
The DiscreteMedianFilter Removes Jitter
In practice we see no jitter anymore when the filter is applied: The DiscreteMedianFilter ends the jitter (period). However, the filter is not applicable to (edges of) depth shadows.
Actually, it turned out that this filter is in fact too good. If the Kinect registers a moving object, we get a moving depth shadow. The filter cannot deal with illegal depth values, so we are stuck with a depth shadow smudge.
A modest level of noise solves this problem. In each frame, 10% of the pixels the filter skips are selected at random and updated. This works fine, but it should be regarded as a temporary solution: the real problem is, of course, the depth shadow, and that should be taken up.
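The noise step can be sketched as follows, assuming a plain fixed threshold as the skip criterion. The function name `applyThresholdWithNoise` and its parameters are hypothetical; the sketch draws an independent random decision per skipped pixel, which refreshes about 10% of them on average when `refreshProbability` is 0.1, and may differ from how the actual filter selects its pixels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <random>
#include <vector>

// Hypothetical sketch: threshold filter plus a random refresh of skipped pixels.
// `refreshProbability` must lie in [0, 1]; 0.1 matches the 10% mentioned above.
// Returns the number of pixels written in this frame.
std::size_t applyThresholdWithNoise(std::vector<std::uint16_t>& previous,
                                    const std::vector<std::uint16_t>& current,
                                    int threshold,
                                    double refreshProbability,
                                    std::mt19937& rng) {
    assert(previous.size() == current.size());
    assert(refreshProbability >= 0.0 && refreshProbability <= 1.0);
    std::bernoulli_distribution refresh(refreshProbability);
    std::size_t updated = 0;
    for (std::size_t i = 0; i < current.size(); ++i) {
        const int diff = static_cast<int>(current[i]) - static_cast<int>(previous[i]);
        const bool overThreshold = std::abs(diff) > threshold;
        // The random draw is only made for pixels the threshold would skip.
        if (overThreshold || refresh(rng)) {
            previous[i] = current[i];
            ++updated;
        }
    }
    return updated;
}
```

Because every skipped pixel is eventually refreshed, a stale value left behind by a moving depth shadow fades out over a number of frames instead of staying put.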
The Discrete Median Filter was implemented in C++ as a template class. A traits template class (a struct, actually) has one specialization for the depth value type and one for the color value type, to set the parameters that are typical for each data type. A policy template holds the variant of the algorithm that is typical for the depth and color data respectively. For evaluation purposes, I also implemented traits and policy classes for unsigned int.
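The general shape of such a traits-plus-policy design might look like the sketch below. Every name (`FilterTraits`, `ThresholdPolicy`, `SketchFilter`) and every parameter value is hypothetical and chosen for illustration; the real class layout is in the downloadable source and is likely to differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Traits: per-type parameters. The primary template is left undefined on purpose,
// so that unsupported value types fail to compile.
template <typename T> struct FilterTraits;

template <> struct FilterTraits<std::uint16_t> {   // depth values
    static constexpr int threshold = 3;            // assumed value
};
template <> struct FilterTraits<std::uint8_t> {    // color channel values
    static constexpr int threshold = 2;            // assumed value
};

// Policy: the variant of the algorithm; here a plain threshold test.
template <typename T>
struct ThresholdPolicy {
    static bool shouldUpdate(T stored, T incoming) {
        const int diff = static_cast<int>(incoming) - static_cast<int>(stored);
        return std::abs(diff) > FilterTraits<T>::threshold;
    }
};

// The filter itself only knows about the policy interface.
template <typename T, typename Policy = ThresholdPolicy<T>>
class SketchFilter {
public:
    explicit SketchFilter(std::size_t size) : state_(size, T{}) {}

    // Merges a new frame into the stored state and returns the filtered frame.
    const std::vector<T>& apply(const std::vector<T>& frame) {
        assert(frame.size() == state_.size());
        for (std::size_t i = 0; i < frame.size(); ++i) {
            if (Policy::shouldUpdate(state_[i], frame[i])) {
                state_[i] = frame[i];
            }
        }
        return state_;
    }

private:
    std::vector<T> state_;
};
```

With this layout, supporting another value type such as unsigned int only takes an extra traits specialization and, if the algorithm differs, an extra policy class.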
Channels and Offset
Color data is made up of RGBA data channels: e.g. R is a channel. Working with channels is inspired by data compression. More on this subject in the blog post on data compression.
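As an illustration of what a channel and its offset mean for interleaved color data, consider the sketch below. The helper `extractChannel` is hypothetical, and the R, G, B, A byte order is an assumption taken from the description above; the actual filter may well address channels in place rather than copy them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies one channel out of interleaved pixel data.
// `channelCount` is the number of channels per pixel (4 for RGBA) and
// `offset` selects the channel (0 = R, 1 = G, 2 = B, 3 = A, assuming RGBA order).
std::vector<std::uint8_t> extractChannel(const std::vector<std::uint8_t>& pixels,
                                         std::size_t channelCount,
                                         std::size_t offset) {
    assert(channelCount > 0 && offset < channelCount);
    assert(pixels.size() % channelCount == 0);
    std::vector<std::uint8_t> channel;
    channel.reserve(pixels.size() / channelCount);
    // Start at the channel's offset and step one whole pixel at a time.
    for (std::size_t i = offset; i < pixels.size(); i += channelCount) {
        channel.push_back(pixels[i]);
    }
    return channel;
}
```

Depth data fits the same scheme as a single channel with offset 0, which is what lets one filter implementation serve both kinds of data.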
The advantages of working with channels for the DiscreteMedianFilter are:
The code is complex at points, so it seems to me that printing the code here would raise more questions than it would answer. Interested people may download the code from The Byte Kitchen Open Sources at Codeplex. If you have a question about the code, please post a comment.
How much space and time do we need for filtering?
A small test program was built to run the filter on a number of generated arrays simulating subsequent depth and color frames. The program size never gets over 25.4 megabytes. The processing speed (without noise) is:
Project Information URL: http://thebytekitchen.com/2014/03/17/a-jitter-filter-for-the-kinect/
Project Source URL: The Byte Kitchen Open Sources at Codeplex.