@piersh: When we release, there will be no need for versioning. We have various design options for future releases where versioning may be required. Remember, versioning would only help in relaxing restrictions and allowing you to "do more" in your kernel code, so recompiling would be necessary regardless and you would just need a way to declare which restrictions you want to adhere to.
@erik: I have not observed this with my tests. Please try DirectCompute/HLSL and see if you observe the same results. If you do, then you will with C++ AMP too, since this is a driver thing, not a programming model thing.
@drbaltazar: The comment you are referring to was on a slide that included the word "today". My following forward-looking slide (which included the word "tomorrow") left the door open for whatever comes next, and our design is definitely future-proof.
@Fredrikkarlsson: We have nothing to announce with regards to C++ AMP technology coming directly to .NET. However, you can write a C++ AMP dll and use that from your .NET code... I'll put a sample of that on my blog at some point...
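Until then, here is a minimal sketch of the general shape such a DLL could take, not the promised sample. It assumes a flat C-style export that .NET can reach through P/Invoke (`DllImport` with `CallingConvention.Cdecl`). The names `AMP_SAMPLE_API` and `square_array` are hypothetical, and a plain CPU loop stands in for the C++ AMP kernel so that the sketch compiles without the bits.

```cpp
// Export macro (hypothetical name): dllexport on Windows, plain C linkage elsewhere.
#if defined(_WIN32)
#define AMP_SAMPLE_API extern "C" __declspec(dllexport)
#else
#define AMP_SAMPLE_API extern "C"
#endif

// Hypothetical exported entry point: squares each element of `data` in place.
// Assumption: in a real C++ AMP DLL, the loop below is where the GPU kernel
// would go. The CPU loop is only a stand-in for that kernel.
AMP_SAMPLE_API void square_array(float* data, int count) {
    if (data == nullptr || count <= 0) {
        return;  // nothing to do for an empty or missing buffer
    }
    for (int i = 0; i < count; ++i) {
        data[i] *= data[i];
    }
}
```

Keeping the exported signature to raw pointers and ints means the .NET side only has to marshal a `float[]` and a length, with no C++ types crossing the boundary.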
Not sure I totally get your scenario and without you having the bits to try, not sure we can explore it much further.
Just remember that you cannot copy a host array over to the GPU and expect a modification on the host side to affect it (without recopying back and forth). You can leave an array on the GPU and run a different kernel on it, but that different kernel would need its own algorithm to determine what data to operate on and what not – here remember that if you introduce a lot of branching in your kernel you may see a perf hit larger than the perf gain you are attempting to achieve...
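To make the copy semantics concrete, here is a rough sketch in plain C++. `DeviceBuffer` is a hypothetical stand-in for a GPU-resident array, not the C++ AMP API: it simply owns its own storage, which is enough to show that a host-side change does not reach the "device" copy, and that results only come back when you copy them back.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an array that lives on the GPU. It owns separate
// storage, so it is a snapshot of the host data taken at upload time.
class DeviceBuffer {
public:
    // "Upload": copies the host data into the buffer's own storage.
    explicit DeviceBuffer(const std::vector<int>& host) : data_(host) {}

    // "Kernel": a branch-free update applied to every element in place.
    void add_to_all(int delta) {
        for (int& x : data_) x += delta;
    }

    // "Download": copies the current buffer contents back to the host.
    void copy_to(std::vector<int>& host) const {
        assert(host.size() == data_.size());
        host = data_;
    }

    // Read a single element of the device-side copy (bounds-checked).
    int at(std::size_t i) const { return data_.at(i); }

private:
    std::vector<int> data_;
};

// Returns the device-side value at index 0 after the host copy is modified.
int device_value_after_host_edit() {
    std::vector<int> host = {1, 2, 3};
    DeviceBuffer dev(host);  // upload a snapshot
    host[0] = 100;           // host-side change only
    return dev.at(0);        // the device copy still holds the old value
}
```

The buffer can stay "on the device" across several `add_to_all` calls; only `copy_to` brings the results back, which mirrors the recopying described above.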
If you want to explore this further, feel free to share your CPU algorithm and we can do so; otherwise, I suggest waiting for the bits so you can explore it hands-on.