Herb Sutter: Heterogeneous Computing and C++ AMP

Herb Sutter introduces the AMD Fusion Developer Summit 11 crowd (and the world!) to Microsoft's view on heterogeneous computing in the concurrency age and introduces one of Microsoft's upcoming technologies for democratizing GPGPU/APU/Multi-Core/Many-Core programming for native developers: C++ Accelerated Massive Parallelism or C++ AMP. Look for C++ AMP and associated tooling in the next version of Visual C++.

Big thanks to AMD for generously providing Channel 9 with this outstanding content! Get the slides for this presentation here.

Herb and the C++ AMP team state: C++ AMP will lower the barrier to entry for heterogeneous hardware programmability, bringing performance to the mainstream. Developers will get an STL-like library as part of the existing concurrency namespace (whose Parallel Patterns Library (PPL) and Concurrency Runtime (ConcRT) are also being enhanced in the next version of Visual C++), which means developers won't need to learn a different syntax or use a different compiler.
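To make that concrete, here is a minimal sketch of the C++ AMP style: an element-wise vector add written as ordinary C++ with a lambda, based on the C++ AMP open specification. It requires Visual C++ with the `<amp.h>` header, so treat it as illustrative rather than canonical. (The restriction specifier demoed at this event as `restrict(direct3d)` was later renamed `restrict(amp)` for release; the released spelling is used below.)

```cpp
#include <amp.h>        // C++ AMP (Visual C++, not portable to other toolchains)
#include <vector>
using namespace concurrency;

// Element-wise vector add, offloaded to an accelerator (e.g. a GPU) when one
// is available. Note it is plain C++ syntax compiled by the same compiler.
void vector_add(const std::vector<float>& a,
                const std::vector<float>& b,
                std::vector<float>& c)
{
    const int n = static_cast<int>(c.size());
    array_view<const float, 1> av(n, a);
    array_view<const float, 1> bv(n, b);
    array_view<float, 1>       cv(n, c);
    cv.discard_data();  // no need to copy c's old contents to the accelerator

    // The lambda marked restrict(amp) is compiled for the accelerator target.
    parallel_for_each(cv.extent, [=](index<1> i) restrict(amp) {
        cv[i] = av[i] + bv[i];
    });
    cv.synchronize();   // copy results back into c
}
```

The `array_view` wrappers describe data the runtime may copy to and from accelerator memory; `parallel_for_each` launches one logical thread per element of the extent.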

C++ AMP is an open specification. What does this mean, exactly? Well, let Herb answer:

[54:05] Herb says: "Microsoft intends to make C++ AMP an open specification that any compiler can implement.  And we're working with our hardware partners to help them to build C++ AMP into C++ compilers for any hardware target, for any operating system target they want.  We're helping them.  And we're also pleased to announce that one of those is AMD, that AMD will be implementing C++ AMP in their FSA reference compiler for Windows and non-Windows platforms."

After watching this, please tune into Daniel Moth's deep dive C++ AMP session, captured at the same event.

You learned in the C++ Renaissance conversation with Mohsen Agsen and Craig Symonds that the C++ team was on a path of innovation. C++ AMP is a concrete example of what Mohsen and Craig were talking about.

The Discussion

  • Mr Crash

    I'm cautiously and sceptically optimistic. If Microsoft did this right, this time, it would be good news indeed.

    But C++ AMP comes from Microsoft, so I'm prepared to be disappointed yet again. :(

  • chanmm

    Where can we get the slides in PPT form?

  •

    @chanmm: Slides for this presentation and a few others are available in PDF form; there are more presentations as well. :)

    The future IS Fusion :)

    Herb, for the record, I really dislike that (direct3d) is the keyword used to enable the restriction. But yes, you're finally implementing my (old) idea of passing not only the architecture as a command-line flag as a target, but also the device, so the compiler knows the memory properties for the execution: CPU/GPU. No matter what, there must be at least 2 dimensions, as you showed in the presentation. I believe BOTH must be passed to the compiler: the architecture/execution style/depth of the computation, and then a 2nd parameter describing which side of the scale the memory model sits on, large address spaces or small ones. Or let the guts of the compiler track the types of optimization it does, like auto-vectorization and loop-unrolling depth, and if it does a certain combination of optimizations then it can tag the function to automatically be restricted/enabled as a candidate for DirectCompute and AMP.

    Can't wait for the day when the compiler will actually be able to infer both the architecture dial and the memory-model dial, back and forth on a per-function basis, all linked together into a real masterpiece of a binary, based on code keywords, decorations, declarations, pragmas, or even the compiler's auto-detection of valid optimizations during compilation. This may actually bear fruit where past research didn't succeed for lack of knowledge or resources, unlike now when the timing is right.

    Maybe "restrict(vector_unit)" would be a more general and less product-advertising keyword to use instead of "direct3d", even if it 95% relies on the DX11 DirectCompute stuff, since generality = love just as much as an open standard = love.



  •

    There is some VERY interesting stuff happening here, and I am interested to see how it develops.

    There are some things about the restrict stuff that aren't clear to me at this point.  Does restrict apply to the local scope or is it a global scope thing?  In heterogeneous architectures, one obviously will need to target some parts of the code to one type of processor (e.g. CPU), while other parts could more effectively be targeted to another (e.g. GPU).  It seems like this could become hard to manage.  If the compiler gets smart enough to figure it out on its own, will we even need the restrict tag?

    How does this scale across hardware?  Let's say I write some code that is designed (and tagged) to be run on a GPU or other parallel implementation.  If that hardware is then not available at runtime, can it drop back to a pure CPU based implementation (with a big loss of speed of course), or will the program just fail to run?  There has to be a fall-back strategy to this, or it will be a distribution nightmare.  Compiled into native code for a particular hardware set, we will be back to an environment where binaries will have to be created for every hardware combination customers might have -- not a manageable process.  This seems like a situation that screams out for a JIT to target specifically to the hardware configuration of the current platform.  Probably this has all been worked out and I just missed something in the presentation.

    This is an open standard, but is based on support for DX11 DirectCompute.  I wonder if this will [eventually] carry over to platforms that don't support DirectCompute (say OpenCL maybe)?

    Will we see this open standard work its way into the next C++ language standard? (Assuming mostly as an officially-spec'd library rather than a language extension.)
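    On the scope question above: as the open specification describes it, restrict is a per-function (and per-lambda) annotation, not a global or translation-unit-wide mode, and a function may carry multiple restrictions so one body can serve both targets, which is also the building block for a CPU fallback path. A hedged sketch, using the released restrict(cpu, amp) spelling and a hypothetical helper name (requires Visual C++'s <amp.h>):

    ```cpp
    // The restriction specifier attaches to this one function only.
    // restrict(cpu, amp) means "compile this body for both targets":
    // amp-restricted kernels can call it, and so can ordinary CPU code.
    inline float saxpy(float a, float x, float y) restrict(cpu, amp)
    {
        return a * x + y;
    }
    ```

    As for hardware absence at runtime, the released implementation could also fall back to a software accelerator (WARP) rather than refusing to run, though with the expected loss of speed.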


  •

    This looks pretty cool.  I see that it supports n-dimensional data and GPU builtin functions, so that answers two of my questions.

    The C++ extension looks clean enough; I'll be curious to see whether GCC and ICC implement it. Still, at the moment it looks like it's only going to be useful for quickly adding some GPU support to apps targeting Windows Vista and up. For anything portable, OpenCL will remain king.

    Some questions:

    1. Does it support keeping data in RAM closest to the execution unit?  For example, if I have a multi-pass function like an orthogonal 2D resampler, I want to keep the data in GPU RAM without having it transfer to CPU RAM between passes.
    2. Does it use texture units?  I see a 2D array example in the video, but it's not clear if it uses a texture.  Textures are different in that they are organized in memory as 2D tiles instead of 1D rows -- this improves cache usage when your algorithm has 2D access patterns.
    3. Does it support SIMD and easy shuffles?
    4. Are there any things you can do in GPGPU languages that you can't do in AMP, or that AMP makes more difficult?
    5. How does performance compare to writing directly in a GPGPU language?
  •

    Must "restrict" targets be built into the compiler, or is it possible for library writers to add their own restrict targets?

  • Steve Miller

    Why do my comments get removed? Did I write something wrong? :(

  • new2STL

    @Steve Miller: Your comments are at the other blog post.

    There are two blog posts for the same video :(

  • Charles

    @new2STL: We are working on this. Sorry.


  • Steve Miller

    @new2STL Thanks, did not notice.

    @Charles: Sorry guys, I did not want to blame anyone.

  • Richard.Hein

    Over 800 GFLOPS! Unbelievable! Great keynote. I am really amazed. :D

  •

    @Richard.Hein: Agreed. Mind-blowing demo for sure.


  •

    We did this (generated GPU code using lambda functions) more than a year ago :)

    We went to NVIDIA and we went to AMD, and we said: look, we think this is cool. But they were not interested.

    Maybe we should have gone to Microsoft :)

  •

    Like Herb said, it seems to be the cleanest design, no doubt. I have just one 'issue' with the syntax of the restrict keyword, which I did point out to the PM there at the conference, who just would not agree!

    The restrict keyword's syntax is 'alien' to the C++ language. Keywords in the language appear by themselves, and no other keyword (to the best of my knowledge) takes arguments that are essentially open-ended strings. Who gets to decide which strings are acceptable and which ones are not? How do you coordinate whose implementation is chosen if two separate libraries choose to use the same 'string' in restrict's argument?

    And finally, the whole point that a certain string becomes a keyword-level value when used inside the brackets of restrict, but is a perfectly valid variable name used elsewhere, is just not neat.
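    That is indeed what the open specification describes: the restriction specifier is a contextual keyword, special only inside restrict(...)'s parentheses. A small sketch of the (admittedly odd) consequence, using the preview's restrict(direct3d) spelling from this talk:

    ```cpp
    // 'direct3d' is only special inside restrict(...); everywhere else
    // it remains an ordinary identifier, i.e. a contextual keyword.
    int direct3d = 42;                      // fine: a plain variable

    int square(int x) restrict(direct3d)    // here it names the target
    {
        return x * x;
    }
    ```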

    Even a restricted list of strings would be bad enough. It leaves the door open to cram stuff into the language and allows for arbitrary extensions to it.

    The syntax is awfully similar to that of __declspec, and I think __declspec is acceptable: the __ prefix explicitly says it is compiler-vendor specific. declspec (without __) wouldn't be.

    Also, why couldn't a #pragma be used?

    C++ continues to remain one of the best languages, I believe, primarily because of its well thought out additions and resistance to add quick-fix type 'specific' features.

    If the language as a whole needs a facility to do compile-time code injection, or hints to compiler to generate alternate code, then perhaps it needs to be thought out in a broader context and as a generic facility to the language as a whole. After all, there may be several other areas where the ability to tell the compiler to generate a certain-type of code or to generate prologue and epilogue codes could come in handy.

  • Mac RoShaft

    Praise god that Bill Gates is no longer active at Microsoft. I always said that Steve Ballmer is the real brains behind all the innovation at MS. You can see this from new products like Visual Studio 2010, 2010.6, 2010.7, and 2011.
