Shared memory is where a lot of the hardware is heading, and C++ AMP is designed well for shared-memory architectures:
- The array_view type has no explicit copy requirements; instead it performs implicit, on-demand copying for you.
- Kernel invocation (parallel_for_each) does not explicitly describe any copying of data at all; it is all done through the subtle capture of data types in the lambda, so in future releases it will be easy to allow capturing additional data types without changing the API.
- For repeated copies we also offer the staging arrays feature (see our blog).
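To make the first two points concrete, here is a minimal sketch of the implicit-copy model: the only data-movement "API" is wrapping a container in an array_view and capturing it in the lambda. The function name `square` is illustrative, and the restriction is spelled restrict(direct3d) to match the VS 11 Developer Preview discussed here (later releases renamed it restrict(amp)).

```cpp
#include <amp.h>
#include <vector>
using namespace concurrency;

void square(std::vector<float>& data) {
    // No explicit copy calls: the array_view wraps the host vector...
    array_view<float, 1> av((int)data.size(), data);

    // ...and capturing it by value in the lambda is all the runtime needs
    // to copy the data to the accelerator on demand.
    parallel_for_each(av.extent, [=](index<1> idx) restrict(direct3d) {
        av[idx] *= av[idx];
    });

    // Results flow back implicitly when the CPU next touches the view
    // (or explicitly, via synchronize()).
    av.synchronize();
}
```

On a shared-memory implementation the same code could elide the copies entirely, which is the design point being made above.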
Having said all that, while the *design* caters for it, the Microsoft *implementation* in v1 does not offer shared memory support – we simply ran out of time to implement that under the covers. For shared memory hardware, this means that we still perform a copy through DirectX, but since the memory is not on discrete hardware, the performance penalty of copying is not as large. Other implementers of the C++ AMP open specification can offer this capability as they see fit.
It is important to note that, even with shared memory, some scenarios will still benefit from explicit allocation and copying when primary access is from the CPU or the GPU (shared memory may still have non-uniform characteristics). So we believe that the basic ability to associate arrays with an accelerator will retain value into the foreseeable future.
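For the cases where explicit placement does pay off, the sketch below uses concurrency::array to pin data to a chosen accelerator so that repeated kernels reuse the accelerator-resident copy, with one explicit copy back at the end. The function name `run_many_kernels` and the kernel body are illustrative only.

```cpp
#include <amp.h>
#include <vector>
using namespace concurrency;

void run_many_kernels(const std::vector<float>& host, accelerator_view target) {
    // Explicitly associate the data with an accelerator: one copy up front.
    array<float, 1> gpu_data((int)host.size(), host.begin(), host.end(), target);

    for (int i = 0; i < 100; ++i) {
        // concurrency::array must be captured by reference in the lambda.
        parallel_for_each(gpu_data.extent,
                          [&gpu_data](index<1> idx) restrict(direct3d) {
            gpu_data[idx] += 1.0f;
        });
    }

    // One explicit copy back to the host when the loop is done.
    std::vector<float> result(host.size());
    copy(gpu_data, result.begin());
}
```

The contrast with array_view is the point: here the programmer, not the runtime, decides when the two copies happen, which is exactly the control that stays valuable even on shared-memory hardware.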
@Ivan: C++ AMP enables massive data parallelism. Typically that has been used in game development on one extreme, and Technical or Scientific computing on the other. With the capable hardware becoming more ubiquitous, and the programming model more approachable, you can expect those domains to become more mainstream but also new scenarios to start benefiting, e.g. augmented reality, image/video manipulation, voice recognition and other such consumer facing opportunities. For existing apps, look at each loop in your application and ask yourself: am I processing a lot of data and/or performing expensive operations in this loop? If the answer is yes, it is a good candidate.
@Matt_PD: It's runtime overhead (but we have optimized this as much as we can). Although we have enabled this feature, we have not come across any real world cases that have taken advantage of it yet, so if you use C++ AMP for >3 dimensions, please let us know.
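For readers wondering what ">3 dimensions" looks like in code, here is a hypothetical rank-4 kernel; the extent values and function name are made up for illustration. It compiles and runs like any other parallel_for_each, but the index mapping for ranks above 3 is where the runtime overhead mentioned above lives.

```cpp
#include <amp.h>
using namespace concurrency;

// A hypothetical 4-D iteration space: higher ranks are supported,
// but index computation for rank > 3 carries extra runtime cost.
void zero_rank4(array_view<float, 4> data) {
    parallel_for_each(data.extent, [=](index<4> idx) restrict(direct3d) {
        data[idx] = 0.0f;
    });
}
```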
@Freeman: For the VS 11 timeframe, our recommendation is interoperating from .NET as per the blog post you found. For future releases, we may consider adding this capability directly to the .NET Framework based on customer feedback, but it is not in any plans right now.
@piersh: I don't think I missed your point. Yes, like I said, we have various design options for *future* releases where versioning will be required. Herb's reply that Charles pointed you to is one of those design options - it is not final, but shows an example (another would be compiler options, for example). I pointed out that you do not need to worry about that in our first release. HTH.
@DeadMG: First let me say "wow!". I can't believe you wrote all that code without a compiler after seeing just one slide talk. I haven't run it through the compiler, but it looks like it would compile. The only thing you need to add is a call to refresh on the input_view array_view so it can reflect the changes you made to the input vector. The other way to have done it is to use input_view directly on the CPU side to update the data (and the changes would immediately propagate to input). There are more considerations (particularly around performance) depending on whether the data you access in the second kernel invocation is large/small, sparse/dense, but that will have to do for now...
So the answer to your original question is that to use arrays as indices, you would have to do exactly what you did in your code... there are no other provisions... Feel free to contact me offline to talk about those considerations.
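For other readers, a minimal sketch of the refresh pattern described above (names like `two_pass` are illustrative, not from DeadMG's code): after the CPU mutates the underlying container directly, refresh() tells the runtime its cached copy of the view is stale before the next kernel runs.

```cpp
#include <amp.h>
#include <vector>
using namespace concurrency;

void two_pass(std::vector<int>& input) {
    array_view<int, 1> input_view((int)input.size(), input);

    // First kernel: reads and writes through the view as usual.
    parallel_for_each(input_view.extent, [=](index<1> idx) restrict(direct3d) {
        input_view[idx] *= 2;
    });

    // CPU-side change made to the vector directly, bypassing the view...
    input[0] = 42;

    // ...so the view must be told its cached data is stale.
    input_view.refresh();

    // Second kernel now sees the updated contents.
    parallel_for_each(input_view.extent, [=](index<1> idx) restrict(direct3d) {
        input_view[idx] += 1;
    });
}
```

Updating through input_view on the CPU side instead (the alternative mentioned above) makes the refresh() call unnecessary.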
@Londey: Please see my response to piersh on versioning. Yes, you can have a function be callable from both CPU and direct3d code by combining the restrictions, e.g. restrict(cpu, direct3d). This is covered in the talk. For an implicit fallback to SSE, we have nothing to announce today, but stay tuned.
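A short sketch of the dual-restriction idea: the function name `clamp01` is made up, but the pattern is the one from the talk - the body must stay within the intersection of what both restrictions allow.

```cpp
#include <amp.h>
using namespace concurrency;

// Callable from both CPU code and kernels, because the body uses only
// constructs legal under both the cpu and direct3d restrictions.
float clamp01(float v) restrict(cpu, direct3d) {
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

void use_both(array_view<float, 1> data) {
    float on_cpu = clamp01(1.5f);  // ordinary CPU call

    parallel_for_each(data.extent, [=](index<1> idx) restrict(direct3d) {
        data[idx] = clamp01(data[idx]);  // same function, inside a kernel
    });
    (void)on_cpu;
}
```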