parallel_for_each - C++ AMP - msdn mag companion part 3


Hi, I am Daniel Moth.

This screencast is one part of a 4-part series accompanying my MSDN Magazine article that you can read online: A Code-Based Introduction to C++ AMP.

Please first watch the screencasts that precede this part, and then follow the links below to watch the ones that come after it.

  1. Setup code - C++ AMP - msdn mag companion part 1
  2. array_view, extent, index - C++ AMP - msdn mag companion part 2
  3. parallel_for_each - C++ AMP - msdn mag companion part 3
  4. accelerator - C++ AMP - msdn mag companion part 4

To learn more, please visit the C++ AMP blog. We encourage you to post C++ AMP questions in the Parallel Computing in C++ and Native Code MSDN forum.



    The Discussion

    • AliKouhzadi

      Nice video. Thank you for uploading these.


      You know, the OpenCL 1.1 HelloWorld example from "OpenCL Programming Guide" by Aftab Munshi (editor of the OpenCL specification) is, after removing all the comments, 260+ lines plus a 10-line kernel, and it does exactly the same thing: it adds the corresponding elements of two arrays and puts the results in a third array. After watching this video, I'm really questioning OpenCL's execution model and the need for all that explicit creation of contexts, command queues, program objects, memory objects and the like. If we can do the same thing with ~50 lines of code, and with the simplicity we see in this video, why should we even bother using an execution model like OpenCL's? A performance comparison between C++ AMP, OpenCL and CUDA would be great for answering that question (I haven't seen one yet). It would definitely tell us whether we really need a complicated execution model like OpenCL's, or maybe where we need such a model.

    • Daniel Moth

      @AliKouhzadi: Glad you like it!

      An even simpler example of how productive you can be with C++ AMP is our "Hello World" example. We also have learning guides for those familiar with other programming models; please get them here: CUDA, OpenCL, DirectCompute.

      The performance is comparable between all these approaches, and in our tests is not a factor for choosing one over the other, even now that the product is in Beta and we are still tuning the bits. Once we RTM, we invite anyone to measure the performance difference between C++ AMP and any other approach and share their workloads and results on a variety of hardware for comparison.
