Have you guys considered using high-frequency echolocation for the same tracking? I'm no sound expert, but I've read that you can get extremely good accuracy for location and depth. Instead of tracking only the surface of an object, you could track the contents as well to get a true 3-D representation of the object. You could also get a much larger area of coverage without all of those cameras, or you could use it in concert with the cameras for an even higher level of accuracy when tracking a small area.
According to the UAH, humans can learn echolocation in two weeks, so how hard could it be? :^P