He's not waving. His hand reached the edge of the content (and he definitely knows what the interface is doing, since his actions are deliberate). Also, there isn't a cursor; you're moving the entire screen.
I'm not sure how projecting something behind the UI would help, but we did consider showing a small depth view of the user in the corner for reference (like in the Xbox UI). In user testing, though, we never ended up needing it: there was enough feedback and enough indicators for users to play around with the system and get acquainted with it. On average, users reached mastery within 15-20 seconds. Maybe a tiny depth view would cut that down to 5 seconds? Definitely a good suggestion.