parallel_for_each - C++ AMP - msdn mag companion part 3

The Discussion

  • AliKouhzadi

    Nice video. Thank you for uploading these :)


    You know, the OpenCL 1.1 HelloWorld example from the "OpenCL Programming Guide" by Aaftab Munshi (editor of the OpenCL specification) is, after removing all the comments, around 260+ lines plus a 10-line kernel, and it does exactly the same thing: it adds the corresponding elements of two arrays and puts the results in a third array. After watching this video, I'm really questioning OpenCL's execution model and the need for all that explicit creation of contexts, command queues, program objects, memory objects and the like! If we can do the same thing in ~50 lines of code, with the simplicity we see in this video, why should we even bother with an execution model like OpenCL's? A performance comparison between C++ AMP, OpenCL and CUDA would go a long way toward answering that question (I haven't seen one yet). It would definitely tell us whether we really need a complicated execution model like OpenCL's, or perhaps where we need such a model.
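
For reference, the computation being compared here is tiny. Below is a minimal sketch in plain standard C++, not the code from the video and not C++ AMP itself. The helper `for_each_index` is a hypothetical, sequential stand-in that only mimics the shape of a data-parallel launch (one call of the body per index), and `add_arrays` is likewise a name made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a data-parallel launch: calls body(i) once for
// each index in [0, n). It runs sequentially here and is NOT the C++ AMP API.
template <typename Body>
void for_each_index(std::size_t n, Body body) {
    for (std::size_t i = 0; i < n; ++i) {
        body(i);
    }
}

// Adds the corresponding elements of two equally sized arrays and returns
// the results in a third array.
std::vector<int> add_arrays(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> sum(a.size());
    // The lambda plays the role of the kernel: one invocation per element.
    for_each_index(sum.size(), [&](std::size_t i) { sum[i] = a[i] + b[i]; });
    return sum;
}
```

Calling `add_arrays` on two vectors of the same length returns their element-wise sum. Everything a GPU programming model adds on top of this (device selection, data transfer, kernel compilation) is where the line counts discussed above diverge.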

  • Daniel Moth

    @AliKouhzadi: Glad you like it!

    An even simpler demonstration of how productive you can be with C++ AMP is our "Hello World" example. We also have learning guides for those familiar with other programming models; please get them here: CUDA, OpenCL, DirectCompute

    The performance is comparable between all these approaches, and in our tests it is not a factor for choosing one over the other, even now that the product is in Beta and we are still tuning the bits. Once we RTM, we invite anyone to measure the performance difference between C++ AMP and any other approach, and to share their workloads and results on a variety of hardware for comparison.
