C++ AMP: Daniel Moth - Overview

C++ AMP (Accelerated Massive Parallelism) is a small set of open-specification language extensions (two of them) and a single library (amp.h) that together make general-purpose GPU programming (GPGPU) a first-class, seamless experience in modern C++.

You've been able to experiment with C++ AMP since the VS11 Developer Preview back in September 2011. We figured it was a good time to go C9 on the C++ AMP team. So, we did. Four interviews have been conducted that pretty thoroughly cover C++ AMP and the people who design, implement, and test it. C++ AMP is a great technology for native developers seeking to harness the power of the GPU using the language and tools they are already comfortable with. C++ AMP is also an open specification and we'll see other compiler vendors producing C++ AMP implementations for their target platforms soon—that's been the goal since Day 1.

Get VC++11 Beta Now - test out AMP!

Here, we meet the legendary Daniel Moth. Daniel's actually been on C9 a few times and his technical screencasts are among the very best that have been produced for C9 (please make more, Daniel!!). Daniel is the C++ AMP program manager (PM).

PMs are responsible for documenting the vision, building the execution plans that turn the vision into a real project, managing the project to completion, shipping the result, and attracting and working with customers to help them understand what the product means for them and why it matters to their business. It's quite a job. Daniel is one part of a team of engineers that consists of program management (Daniel), architecture (Yossi Levanoni is one of the architects of AMP and you'll meet him (again) in one of these interviews), development (you'll meet the AMP dev team soon, round-table style), and test (you'll meet the AMP test team soon, hallway-tour style).

So, tell us what C++ AMP is and why it matters, Daniel. Give us some context and history. What's going on here? What's next?

Tune in and meet the people behind the restrict(ion)s, kernels, etc. It's GPGPU time on C9. Enjoy.

See Part 2 - Yossi Levanoni: AMP Architecture and Design
See Part 3 - The AMP Development Team Roundtable
See Part 4 - The AMP Test Team Hallway Office Tour




    The Discussion

    • ryanb

      Glad to see this series.  Rock on Charles, and thanks up-front to the AMP team.


    • n0x30n

      As a .NET developer i'm so jealous of you C++ guys. I sincerely hope that we are getting a managed version of AMP, too.

    • Sonicflare

      It's relatively simple and intuitive. Makes me wonder.. why nobody did this before? :)

      btw.. when is VS11 beta? ^^ 

    • magicalclick

      Hi C9, I am experiencing time jumps when using Smooth Streaming. I am switching to traditional buffer mode for now. Time jumps = it fast-forwards through about 2 seconds of video in 0.5 seconds of real time.

    • magicalclick

      What about shared memory? I am not keen on GPU computing, but does array_view work with shared memory in a way that doesn't need to copy the data at all? This is probably a trivial question for people in the industry; I have no experience with this.

    • PerfectPhase

      Sonicflare wrote:

      btw.. when is VS11 beta? ^^ 

      Best guess, word on the street is 29th feb.

    • mjjj

      Video skips A LOT

    • Granville Barnett

      At around 9 minutes a claim is made about AMP being the first C++ API for GPGPU, but Thrust has been around for some time. (I admit this is bound to CUDA, though.)

    • Charles

      We're investigating the skipping issue. For the time being, please download the video and watch offline. If you must watch in the inline player - and you have a C9 account - please go to your profile, click the Edit Profile button, change the default streaming setting to Progressive. Please let us know if this solves the skipping issue (we think we know what the problem is, but we need more data!).


    • mjjj

      Yes, changing to Progressive solved the problem for me. Thanks

    • Chris

      It's feb 29, is there gonna be a goingnative episode? :)

    • Charles

      @Chris: Probably should have done one for Leap Day, but no. We released one on Feb 22, though:

    • Hakime

      I am amazed at how much nonsense I heard in this video. Comparing AMP and OpenCL is nonsense. OpenCL was designed as a low-level data- and task-parallel language and runtime. The point of OpenCL is that it lets you build something higher level on top of it, such as a minimal API, in C++ or not. It truly allows an open, standards-based approach. I am sorry, but hearing this guy say that AMP is an open spec is laughable when at the same time it requires a DirectX driver to run. DirectX is not an industry standard and it is basically tied to Windows. There is no point in any vendor that does not support DirectX trying to implement this thing on top of something else.

      It was even more laughable when he started to say that there is no similar high-level "gate" to parallelism like AMP. Get out from behind the closed walls of Microsoft, please... Just buy yourself a Mac and you will be able to code a data-parallel program out of the box with Xcode using Grand Central Dispatch. GCD targets the CPU and the GPU (using OpenCL) using blocks (lambdas are similar to blocks) and queues. The programmer can dispatch blocks of code to either a CPU or a GPU queue with minimal code. GCD is itself minimal and exists as a single library that you don't even need to link to manually. You can use GCD in C, C++, and Objective-C, and the only other thing you need to do is write the kernel in OpenCL, which by definition allows a lot of flexibility if you want fine control over the computation.

      OpenCL is an open standard and GCD was open sourced by Apple. Microsoft could have done the same thing and used OpenCL to build a parallel programming paradigm for Windows instead of coming up with a solution it wants to sell us as an open spec. Microsoft did the same with OpenGL. Instead of supporting an industry-standard API for 3D programming, it came up with DirectX with the sole aim of fragmenting the market and using its large deployment of Windows to create a competitive advantage.

    • Chris


      Oh, the new GN episode lacked the C++ tag so it didn't show up for the 'C++' tag.

    • Charles

      @Chris: Oops :) My bad. Fixed.


    • Daniel Moth

      @ryanb: Thanks, hope you enjoy all of them.

      @n0x30n: While there will be no built-in .NET way of achieving this, we have documented how easy it is to interop from .NET to C++ and C++ AMP to utilize the GPU. The samples are not updated to Beta yet, but the techniques are the same:

      @Sonicflare: You probably already know this, but Beta is out :)

    • Daniel Moth


      Shared memory is where a lot of the hardware is heading. C++ AMP is designed well for shared memory architectures:

      • the array_view type does not have explicit copy requirements, and instead performs implicit on demand copying for you.
      • kernel invocation (parallel_for_each) does not explicitly describe any copying of data at all - it is all done through the subtle capture of data types in the lambda, so in future releases it will be very easy to allow capturing additional data types without changing the API.
      • for repeated copies we also offer the staging arrays feature (see our blog).
      • finally, the restriction model has a versioning story that is described in an appendix of the C++ AMP open spec 

      Having said all that, while the *design* caters for it, the Microsoft *implementation* in v1 does not offer shared memory support; we simply ran out of time to implement that under the covers. For shared memory hardware, this means that we still perform a copy through DirectX, but since the memory is not on discrete hardware, the copy performance penalty is not as large. Other implementers of the C++ AMP open specification can offer this capability as they see fit.

      It is important to note that, even with shared memory, some scenarios will still benefit from explicit allocation and copying when primary access is from the CPU or the GPU (shared memory may still have non-uniform characteristics). So we believe that the basic ability to associate arrays with an accelerator will retain value into the foreseeable future.
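
The implicit, on-demand copying attributed to array_view in the reply above can be illustrated with a toy model in standard C++. This is only a sketch under stated assumptions: ToyArrayView, run_on_device, copies_to_device, and copies_to_host are hypothetical names invented here, the "device" is simulated by a second std::vector on the host, and the sequential loop merely stands in for a parallel_for_each launch. It does not use amp.h and makes no claim about how the Microsoft implementation tracks copies internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, NOT the real concurrency::array_view from amp.h.
// "Device" memory is simulated by a second std::vector on the host.
class ToyArrayView {
public:
    explicit ToyArrayView(std::vector<int>& host)
        : host_(host), device_(host.size()) {}

    // Stand-in for a kernel launch: the caller never requests a copy.
    // The view copies host -> device only if the device copy is stale.
    template <typename Kernel>
    void run_on_device(Kernel kernel) {
        if (!device_valid_) {
            device_ = host_;
            ++copies_to_device_;
            device_valid_ = true;
        }
        // Sequential loop standing in for parallel execution.
        for (std::size_t i = 0; i < device_.size(); ++i) {
            kernel(device_[i]);
        }
        host_valid_ = false;  // the host copy is now out of date
    }

    // Copy device -> host only if the host copy is stale.
    void synchronize() {
        if (!host_valid_) {
            host_ = device_;
            ++copies_to_host_;
            host_valid_ = true;
        }
    }

    // Host-side read through the view; triggers a copy back on demand.
    int operator[](std::size_t i) {
        assert(i < host_.size());
        synchronize();
        return host_[i];
    }

    int copies_to_device() const { return copies_to_device_; }
    int copies_to_host() const { return copies_to_host_; }

private:
    std::vector<int>& host_;   // caller-owned data the view wraps
    std::vector<int> device_;  // simulated accelerator memory
    bool host_valid_ = true;
    bool device_valid_ = false;
    int copies_to_device_ = 0;
    int copies_to_host_ = 0;
};
```

In this model, two consecutive kernel launches followed by a host read cost one copy to the device and one copy back, and the calling code never mentions either copy. The toy does not detect host writes made directly to the wrapped vector; it only tracks accesses that go through the view.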

    • Daniel Moth

      @Hakime: I was going to filter out your tone and address some of the misconceptions in your comment, but then I scanned your other comments on other channel 9 videos, and noticed the very consistent and exclusive pattern in your approach: you dismiss anything that comes from Microsoft and try to promote something coming from Apple. So I hope I'll be forgiven for not engaging you in response, beyond this.

    • Daniel Moth

      @Granville Barnett: Yes, when I said "yes we think so" it was in response to Charles' comment/question, which included the words "modern" and "true" C++ API, so any response to that will be subjective given the vagueness of those terms in this context. While I may believe that C++ AMP is the first truly modern C++ API for heterogeneous computing and that it is "better" than similar approaches (I am slightly biased :)), please evaluate alternative approaches and judge for yourself. Thanks for bringing this up.

    • magicalclick

      @Daniel Moth:

      That's good to know, thank you for answering.
