rhm said:
W3bbo said:
*snip*

I notice the thing has two lenses. It could be that one is for the infra-red camera and the other for the visible-light camera. It could also be that it has an infra-red camera behind both, and either uses a sensor that can sense infra-red and visible light simultaneously or has some way of splitting the light on one side to drive both kinds of sensor.

If it can sense infra-red from both lenses, then detecting depth is a standard computer vision problem: find correspondences between parts of the two images and use the horizontal offset between matching parts (the disparity) to determine distance. The reason for using infra-red and projecting a grid would be to make the machine-vision task a heck of a lot easier than it is when you just have the subject's natural texture to go on.
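
For what it's worth, the "difference in horizontal position gives distance" step boils down to one line of triangulation. This is only a rough sketch, assuming two identical, parallel, rectified pinhole cameras; the function name and the numbers are made up for illustration and are not the device's real specs.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Distance in metres to a point seen by two parallel, rectified cameras.

    focal_px     -- focal length expressed in pixels (assumed known from calibration)
    baseline_m   -- distance between the two camera centres, in metres
    disparity_px -- horizontal shift of the same point between the two images, in pixels
    """
    if disparity_px <= 0:
        # Zero disparity would mean the point is infinitely far away.
        raise ValueError("disparity must be positive")
    # Similar triangles: depth / baseline == focal length / disparity
    return focal_px * baseline_m / disparity_px


# Illustrative values only: 500 px focal length, 10 cm baseline, 25 px shift.
print(depth_from_disparity(500.0, 0.1, 25.0))
```

With those made-up numbers the point works out to be about 2 m away. Note that nearer objects give bigger shifts, so depth resolution gets worse with distance.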

It seems that there are three lenses on the front. One is an RGB camera, which has nothing to do with motion tracking; it's for other scenarios. Of the other two, one floods the scene with "coded" infrared light, and the other is a camera that reads back where the "coded" light hit objects.

My guess is that there is some processing that reads how the IR projection is perturbed by objects in the scene and gets a 3D point cloud out of that. The 3D point cloud then gets fed into some kind of neural net trained on a huge dataset of human poses (kind of like handwriting recognition), then "magic", and out comes a bunch of skeletal joints.
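
To make the "3D point cloud" part of that guess a bit more concrete: once you have a depth value per pixel, turning it into 3D points is just the pinhole camera model run backwards. A minimal sketch, assuming a plain pinhole camera with no lens distortion; the function name and the intrinsics (fx, fy, cx, cy) are hypothetical placeholders, and I have no idea whether the actual device does it this way.

```python
def depth_map_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map into a list of (x, y, z) points.

    depth  -- list of rows, each a list of depths in metres (0 means "no reading")
    fx, fy -- focal lengths in pixels (hypothetical calibration values)
    cx, cy -- principal point in pixels (hypothetical calibration values)
    """
    points = []
    for v, row in enumerate(depth):        # v = pixel row
        for u, z in enumerate(row):        # u = pixel column
            if z <= 0:
                continue                   # skip pixels where the sensor saw nothing
            x = (u - cx) * z / fx          # pinhole model, inverted
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 "depth image" with one missing reading, purely for illustration.
cloud = depth_map_to_points([[0.0, 2.0], [2.0, 2.0]], fx=2.0, fy=2.0, cx=0.5, cy=0.5)
print(len(cloud), cloud[0])
```

The pose-recognition step after this is the hard part and is not sketched here.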

Anyway, regardless of how exactly it works, it's one of the most high-tech consumer products out there.

Edit: Found a site that talks about using "coded light" for 3D object recognition, here: http://academic.research.microsoft.com/Paper/6148176.aspx