@Chris: done, thanks for the feedback.
Comments
My blog: The Moth.
@DanielMoth-
-
@AliKouhzadi: Glad you like it!
An even simpler example showing how productive you can be with C++ AMP is our "Hello World" example. We also have learning guides for those familiar with other programming models; please get them here: CUDA, OpenCL, DirectCompute.
The performance is comparable between all these approaches, and in our tests is not a factor for choosing one over the other, even now that the product is in Beta and we are still tuning the bits. Once we RTM, we invite anyone to measure the performance difference between C++ AMP and any other approach and share their workloads and results on a variety of hardware for comparison.
-
@Granville Barnett: Yes, when I said "yes we think so" it was in response to Charles' comment/question which included the words "modern" and "true" C++ API, so any response to that will be subjective given the vagueness of those terms in this context. So while I may believe that C++ AMP is the first truly modern C++ API for heterogeneous computing and that it is "better" than similar approaches (I am slightly biased), please evaluate alternative approaches and judge for yourself. Thanks for bringing this up.
-
@Hakime: I was going to filter out your tone and address some of the misconceptions in your comment, but then I scanned your comments on other Channel 9 videos, and noticed a very consistent and exclusive pattern in your approach: you dismiss anything that comes from Microsoft and try to promote something coming from Apple. So I hope I'll be forgiven for not engaging you in response, beyond this.
-
Shared memory is where a lot of the hardware is heading. C++ AMP is designed well for shared memory architectures:
- the array_view type does not have explicit copy requirements, and instead performs implicit on demand copying for you.
- kernel invocation (parallel_for_each) does not explicitly describe any copying of data at all - it is all done through the subtle capture of data types in the lambda, so in future releases it will be very easy to allow capturing additional data types without changing the API.
- for repeated copies we also offer the staging arrays feature (see our blog).
- finally, the restriction model has a versioning story that is described in an appendix of the C++ AMP open spec http://blogs.msdn.com/b/nativeconcurrency/archive/2012/02/03/c-amp-open-spec-published.aspx
Having said all that, while the *design* caters for it, the Microsoft *implementation* in v1 does not offer shared memory support – we simply ran out of time to implement that under the covers. For shared memory hardware, this means that we still perform a copy through DirectX, but since the memory is not on discrete hardware, the performance penalty of copying is not as large. Other implementers of the C++ AMP open specification can offer this capability as they see fit.
It is important to note that, even with shared memory, some scenarios will still benefit from explicit allocation and copying when primary access is from the CPU or the GPU (shared memory may still have non-uniform characteristics). So we believe that the basic ability to associate arrays with an accelerator will retain value into the foreseeable future.
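To make the array_view and parallel_for_each points above concrete, here is a minimal sketch. The helper name add_vectors and the SKETCH_USE_AMP macro are made up for illustration, and the plain CPU loop is only a fallback I added so the snippet also compiles where amp.h is not available. Note that no copy is written anywhere: the array_view objects are simply captured by the lambda, and the runtime decides what to move and when.

```cpp
#include <cassert>
#include <vector>

// Use C++ AMP only when building with Visual C++; elsewhere fall back to a CPU loop.
// SKETCH_USE_AMP is a hypothetical macro for this sketch, not part of C++ AMP.
#if defined(_MSC_VER)
// Newer Visual Studio versions deprecate amp.h; this silences the deprecation diagnostics.
#define _SILENCE_AMP_DEPRECATION_WARNINGS
#define SKETCH_USE_AMP 1
#include <amp.h>
#endif

// Hypothetical helper: element-wise sum of two equally sized vectors.
std::vector<int> add_vectors(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> sum(a.size());
    if (a.empty()) return sum;  // avoid creating a zero-sized array_view
    const int n = static_cast<int>(a.size());
#ifdef SKETCH_USE_AMP
    using namespace concurrency;
    // array_view wraps existing CPU data; no explicit copy calls are needed.
    array_view<const int, 1> av(n, a);
    array_view<const int, 1> bv(n, b);
    array_view<int, 1> sv(n, sum);
    sv.discard_data();  // the old contents of sum need not be copied to the accelerator
    // The data to use is described only by what the lambda captures.
    parallel_for_each(sv.extent, [=](index<1> i) restrict(amp) {
        sv[i] = av[i] + bv[i];
    });
    sv.synchronize();   // make the results visible in sum on the CPU
#else
    for (int i = 0; i < n; ++i) sum[i] = a[i] + b[i];
#endif
    return sum;
}
```

On shared memory hardware the same source would not need to change; whether a copy actually happens is up to the implementation underneath.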
-
@ryanb: Thanks, hope you enjoy all of them.
@n0x30n: While there will be no built-in .NET way of achieving this, we have documented how easy it is to interop from .NET to C++ and C++ AMP to utilize the GPU. The samples are not updated to Beta yet, but the techniques are the same: http://www.danielmoth.com/Blog/NET-Access-To-The-GPU-For-Compute-Purposes.aspx
@Sonicflare: You probably already know this, but Beta is out:
http://blogs.msdn.com/b/nativeconcurrency/archive/2012/02/29/visual-studio-11-beta-get-it-now.aspx
-
@Ivan: C++ AMP enables massive data parallelism. Typically that has been used in game development on one extreme, and Technical or Scientific computing on the other. With the capable hardware becoming more ubiquitous, and the programming model more approachable, you can expect those domains to become more mainstream but also new scenarios to start benefiting, e.g. augmented reality, image/video manipulation, voice recognition and other such consumer facing opportunities. For existing apps, look at each loop in your application and ask yourself: am I processing a lot of data and/or performing expensive operations in this loop? If the answer is yes, it is a good candidate.
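As a rough illustration of the loop test described above, here is a hypothetical loop of the kind that tends to pass it: many elements and a non-trivial computation per element. The function name gamma_correct and the formula are made up for this example, and the code is plain C++ rather than C++ AMP. One extra property, which I am adding here as my own observation, is that each iteration is independent of the others, which is what lets the work be spread across many threads.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical example: raise every pixel intensity to the power gamma.
// Each output element depends only on the matching input element.
std::vector<float> gamma_correct(const std::vector<float>& pixels, float gamma) {
    std::vector<float> out(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        out[i] = std::pow(pixels[i], gamma);  // relatively expensive per-element work
    }
    return out;
}
```

A loop where iteration i reads a value written by iteration i-1 would need restructuring before it fits this model.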
@Matt_PD: It's runtime overhead (but we have optimized this as much as we can). Although we have enabled this feature, we have not come across any real world cases that have taken advantage of it yet, so if you use C++ AMP for >3 dimensions, please let us know.
-
@g227: C++ AMP runs on servers, and we have early adopters doing exactly that. If you are the same GT227 that posted on the C++ AMP MSDN forum, may I suggest keeping the discussion there?
-
@All: Glad you enjoyed the presentation.
@Freeman: For VS 11 timeframe, our recommendation is interoping from .NET as per the blog post you found. For future releases, we may consider adding this capability directly to the .NET Framework based on customer feedback, but it is not in any plans right now.
-
@David: For SSE support, we have nothing to announce today, but stay tuned.
