There's so much potential here, but you're absolutely right: things will have to be redesigned at the OS level to realize it.  It took years for touch to make its way into the OS, and it's obvious the traditional desktop experience isn't great for it.  With this kind of precise, touchless 3D interface, it will take a considerable amount of time for things to evolve.

At the base of the OS, the input itself has to respect the degrees of freedom this kind of UI allows.  Even things like fonts and writing will have to change.  I think natural language will evolve in huge leaps as we abandon the keyboard as the primary input and a new generation of calligraphy emerges in 3D space.  People will be able to rapidly create 3D models, like manipulating clay, to express thoughts and ideas, then animate them and make them interactive.  Text isn't going away, but I think a new world of media-as-language will emerge.  That has to be baked into the OS: a new kind of shell that makes 3D models first class.
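To make the "degrees of freedom" point concrete, here's a minimal sketch (all names hypothetical, not from any real SDK) of what a six-degree-of-freedom input event might look like if the OS treated hand tracking as a first-class input, the way it treats mouse moves and touch points today:

```python
from dataclasses import dataclass

@dataclass
class Pose3D:
    x: float        # position in metres, relative to the sensor
    y: float
    z: float
    roll: float     # orientation in radians
    pitch: float
    yaw: float

@dataclass
class HandEvent:
    hand_id: int      # which tracked hand produced the event
    pose: Pose3D      # full 6-DOF pose, not just a 2D cursor position
    pinch: float      # 0.0 (open) .. 1.0 (fully pinched), analogous to pressure
    timestamp_us: int # microsecond timestamp for smoothing / gesture recognition

def on_hand_event(event: HandEvent) -> None:
    """Example handler: treat a strong pinch as the start of a 'grab'."""
    if event.pinch > 0.8:
        print(f"hand {event.hand_id} grabbing at "
              f"({event.pose.x:.2f}, {event.pose.y:.2f}, {event.pose.z:.2f})")
```

The point is just that the event carries position, orientation, and an analog "grip" value together, so applications can build clay-like manipulation on top of it instead of flattening everything down to a cursor.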

EDIT: I also think technology like Photosynth will play a huge role, since it allows hyperlinking between images and 3D models.  Bing and Photosynth have already demonstrated this, i.e. an object in a picture links to a 3D model built from photos of the same subject.  So imagine if I could model something with a basic 2D outline, or even 3D virtual clay, like the shape of a house I want to search for on Bing Maps, and once I find it, create a hyperlink to related shapes.
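As a rough illustration of that last idea (everything below is hypothetical, not Photosynth's or Bing's actual API): reduce each rough 3D sketch to a crude shape descriptor, then look up the nearest stored shapes, each of which resolves to a hyperlink:

```python
import math

def shape_descriptor(points, bins=16):
    """Very crude descriptor: a histogram of point distances from the centroid,
    normalised so it is roughly scale-invariant."""
    cx, cy, cz = (sum(c) / len(points) for c in zip(*points))
    dists = [math.dist(p, (cx, cy, cz)) for p in points]
    top = max(dists) or 1.0
    hist = [0] * bins
    for d in dists:
        hist[min(int(d / top * bins), bins - 1)] += 1
    return [h / len(points) for h in hist]

def similarity(a, b):
    """Squared distance between two descriptors: smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical index mapping known shapes to hyperlink targets.
shape_index = {
    "box-like house": (
        shape_descriptor([(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
                          (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]),
        "https://example.com/maps/box-house",
    ),
}

def find_links(sketch_points, index=shape_index, k=3):
    """Return the k closest stored shapes (and their links) for a rough sketch."""
    query = shape_descriptor(sketch_points)
    scored = sorted((similarity(query, desc), name, url)
                    for name, (desc, url) in index.items())
    return scored[:k]
```

A real system would obviously use far better shape matching than a distance histogram, but the shape-query-to-hyperlink lookup is the part I think the OS (or the browser, or Bing Maps) would need to expose.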