Blog Post

Herb Sutter: Heterogeneous Computing and C++ AMP


The Discussion

  • Mr Crash

    I'm cautiously and sceptically optimistic. If Microsoft did this right, this time, it would be good news indeed.

    But C++ AMP comes from Microsoft, so I'm prepared to be disappointed yet again :(

  •

    Where can we get the slides in PPT format?

  •

    @chanmm: Slides for this presentation and a few others are available in PDF form; there are more presentations as well. :)

    The future IS Fusion :)

    Herb, for the record, I really dislike that (direct3d) is the keyword to enable the restriction. But yes, you're finally implementing my (old) idea of passing not only the architecture as a command-line target flag but also the device, so the compiler knows the memory properties for the execution: CPU/GPU. No matter what, there must be at least two dimensions, as you showed in the presentation. I believe BOTH must be passed to the compiler: first the architecture/execution style/depth of the computation, and then a second parameter that describes which side of the scale the memory model sits on, large address spaces or small ones. Or let the guts of the compiler track the types of optimization it does, like auto-vectorization and loop-unrolling depth, and if it does a certain combination of optimizations, it can tag the function to be automatically restricted/enabled for candidate use in DirectCompute and AMP'd.

    Can't wait for the day when the compiler will actually be able to infer both the architecture dial and the memory-model dial, back and forth on a per-function basis, with everything linked together into a real masterpiece of a binary, based on code keywords, code decorations, declarations, pragmas, or even the compiler's auto-detection of valid optimizations during compilation. This may actually bear fruit where past research didn't succeed for lack of knowledge or resources, unlike now, when the timing is right.

    Maybe "restrict (vector_unit)" would be a more general and less product-advertising keyword to use instead of "direct3d", even if it relies 95% on the DX11 DirectCompute stuff, since generality = love just as much as an open standard = love.



  •

    There is some VERY interesting stuff happening here, and I am interested to see how it develops.

    There are some things about the restrict stuff that aren't clear to me at this point.  Does restrict apply to the local scope or is it a global scope thing?  In heterogeneous architectures, one obviously will need to target some parts of the code to one type of processor (e.g. CPU), while other parts could more effectively be targeted to another (e.g. GPU).  It seems like this could become hard to manage.  If the compiler gets smart enough to figure it out on its own, will we even need the restrict tag?

    How does this scale across hardware?  Let's say I write some code that is designed (and tagged) to be run on a GPU or other parallel implementation.  If that hardware is then not available at runtime, can it drop back to a pure CPU based implementation (with a big loss of speed of course), or will the program just fail to run?  There has to be a fall-back strategy to this, or it will be a distribution nightmare.  Compiled into native code for a particular hardware set, we will be back to an environment where binaries will have to be created for every hardware combination customers might have -- not a manageable process.  This seems like a situation that screams out for a JIT to target specifically to the hardware configuration of the current platform.  Probably this has all been worked out and I just missed something in the presentation.
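As a rough illustration of the fall-back question above, here is a minimal sketch of choosing an implementation at run time in plain standard C++. It is not how C++ AMP or DirectCompute handles this. The names `scale_cpu`, `scale_accelerated`, `accelerator_available`, and `select_scale` are hypothetical, and the "accelerated" path is only a stand-in that runs on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical signature shared by every implementation of the kernel.
using ScaleFn = void (*)(std::vector<float>& data, float factor);

// Plain CPU implementation; always available, so it serves as the fallback.
void scale_cpu(std::vector<float>& data, float factor) {
    for (std::size_t i = 0; i < data.size(); ++i) data[i] *= factor;
}

// Stand-in for an accelerated implementation. A real one would hand the work
// to a device; this one just produces the same result on the CPU.
void scale_accelerated(std::vector<float>& data, float factor) {
    for (float& x : data) x *= factor;
}

// Hypothetical capability probe. A real probe would query the runtime or
// driver; here the answer is passed in so both paths can be exercised.
bool accelerator_available(bool simulate_present) { return simulate_present; }

// Pick the implementation once, at run time, instead of at build time.
ScaleFn select_scale(bool simulate_present) {
    return accelerator_available(simulate_present) ? scale_accelerated
                                                   : scale_cpu;
}

// Convenience wrapper: same call site regardless of which path is taken.
void scale(std::vector<float>& data, float factor, bool simulate_present) {
    ScaleFn fn = select_scale(simulate_present);
    assert(fn != nullptr);
    fn(data, factor);
}
```

With this shape, one binary carries both paths, and calling `scale(v, 2.0f, false)` still works on a machine where the probe reports no accelerator, only more slowly.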

    This is an open standard, but is based on support for DX11 DirectCompute.  I wonder if this will [eventually] carry over to platforms that don't support DirectCompute (say OpenCL maybe)?

    Will we see this open standard work its way into the next C++ language standard? (assuming mostly as an officially-spec'd library more than a language extension)


  •

    This looks pretty cool.  I see that it supports n-dimensional data and GPU builtin functions, so that answers two of my questions.

    The C++ extension looks clean enough, and I'll be curious to see if GCC and ICC implement it.  Still, at the moment it looks like it's only going to be useful for quickly adding some GPU support to apps targeting Windows Vista and up.  For anything portable, OpenCL will remain king.

    Some questions:

    1. Does it support keeping data in RAM closest to the execution unit?  For example, if I have a multi-pass function like an orthogonal 2D resampler, I want to keep the data in GPU RAM without having it transfer to CPU RAM between passes.
    2. Does it use texture units?  I see a 2D array example in the video, but it's not clear if it uses a texture.  Textures are different in that they are organized in memory as 2D tiles instead of 1D rows -- this improves cache usage when your algorithm has 2D access patterns.
    3. Does it support SIMD and easy shuffles?
    4. Are there any things you can do in GPGPU languages that you can't do in AMP, or that AMP makes more difficult?
    5. How does performance compare to writing directly in a GPGPU language?
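To make the "2D tiles instead of 1D rows" point in question 2 concrete, here is a small sketch of the two index calculations. It is a simplified model only: real GPU texture layouts are hardware-specific, and the function names and the assumption that the width is a multiple of the tile size are made up for the example.

```cpp
#include <cassert>
#include <cstddef>

// Row-major ("1D rows"): neighbours in x are adjacent in memory, but
// neighbours in y are a full row (width elements) apart.
std::size_t row_major_index(std::size_t x, std::size_t y, std::size_t width) {
    return y * width + x;
}

// Tiled: the image is split into tile x tile blocks stored one after another,
// so a small 2D neighbourhood stays inside one contiguous block.
// Simplifying assumption: width is a multiple of tile.
std::size_t tiled_index(std::size_t x, std::size_t y,
                        std::size_t width, std::size_t tile) {
    assert(tile > 0 && width % tile == 0);
    const std::size_t tiles_per_row = width / tile;
    const std::size_t tile_id = (y / tile) * tiles_per_row + (x / tile);
    const std::size_t within = (y % tile) * tile + (x % tile);
    return tile_id * tile * tile + within;
}
```

For an 8-wide image with 4x4 tiles, the pixel directly below (0, 0) is 8 elements away in the row-major layout but only 4 away in the tiled one, which is the cache-locality effect the question refers to.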
  •

    Must "restrict" targets be built into the compiler, or is it possible for library writers to add their own restrict targets?

  • Steve Miller

    Why do my comments get removed? did I write something wrong? :(

  •

    @Steve Miller: Your comments are at

    There are two blogs for the same video :(

  •

    @new2STL: We are working on this. Sorry.


  • Steve Miller

    @new2STL Thanks, did not notice.

    @Charles sorry guys, did not want to blame anyone.

  •

    Over 800 GFLOPS!  Unbelievable!  Great keynote.  I am really amazed. :D

  •

    @Richard.Hein: Agreed. Mind-blowing demo for sure.


  •

    We did this (generated GPU code using lambda functions) more than a year ago :)

    We went to NVidia and we went to AMD, and we said: look, we think this is cool, but they were not interested.

    Maybe we should have gone to Microsoft :)
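For readers unfamiliar with the idea of expressing a kernel as a lambda, here is a minimal CPU-only sketch of the programming model. It is neither the commenter's system nor the C++ AMP API. `for_each_index` and `add` are hypothetical names, and the loop runs serially where a GPU-backed library would translate the lambda body for the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls kernel(i) for every index in [0, n).
// A GPU-backed library would compile or translate the lambda body for the
// device; this stand-in simply runs it in a serial loop on the CPU.
template <typename Kernel>
void for_each_index(std::size_t n, Kernel kernel) {
    for (std::size_t i = 0; i < n; ++i) kernel(i);
}

// Example use: element-wise vector addition expressed as a lambda.
std::vector<float> add(const std::vector<float>& a,
                       const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for_each_index(out.size(), [&](std::size_t i) { out[i] = a[i] + b[i]; });
    return out;
}
```

The appeal of the model is that the per-element work is written once, inline, and the library decides where and how the iterations run.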

  •

    Like Herb said; it seems to be the cleanest design, no doubt. I have just one 'issue' with the syntax of the restrict keyword - which I did point out to the PM there at the conference who just would not agree!

    The syntax of the restrict keyword is 'alien' to the C++ language. Keywords in the language appear by themselves, and no other keyword (to the best of my knowledge) takes arguments that are essentially open-ended strings. Who gets to decide which strings are acceptable and which are not? How do you coordinate whose implementation is chosen if two separate libraries choose to use the same 'string' in restrict's argument?

    And finally, the whole point that a certain string becomes a keyword-level value when used inside the brackets of restrict but is a perfectly valid variable name used elsewhere is just not neat.

    Even a restricted list of strings would be bad enough. It leaves open the door to cram stuff into the language and allows for arbitrary extensions to it.

    The syntax is awfully like that of __declspec; and I think __declspec is acceptable; it is explicitly saying it is compiler-vendor specific (use of __), declspec (without __) shouldn't be.

    Also, why couldn't #pragma be used?

    C++ continues to remain one of the best languages, I believe, primarily because of its well thought out additions and resistance to add quick-fix type 'specific' features.

    If the language as a whole needs a facility to do compile-time code injection, or hints to the compiler to generate alternate code, then perhaps it needs to be thought out in a broader context and as a generic facility for the language as a whole. After all, there may be several other areas where the ability to tell the compiler to generate a certain type of code, or to generate prologue and epilogue code, could come in handy.
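One way to picture the name-collision concern raised in this comment is a library-level alternative that uses existing C++ features: targets named by types in namespaces rather than by open-ended strings. This is only an illustrative sketch of that alternative, not anything proposed in the talk. `vendor_a`, `vendor_b`, `run`, and `dispatch` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Two hypothetical libraries each define a target tag with the same short name.
namespace vendor_a { struct vector_unit {}; }
namespace vendor_b { struct vector_unit {}; }

// Overloads are selected by tag type, so the two names never collide.
std::string run(vendor_a::vector_unit) { return "vendor_a"; }
std::string run(vendor_b::vector_unit) { return "vendor_b"; }

// Generic entry point: the target is a template parameter, not a string.
template <typename Target>
std::string dispatch() { return run(Target{}); }

// Small self-check showing that the same short name resolves differently
// depending on which namespace it comes from.
void check_targets_distinct() {
    assert(dispatch<vendor_a::vector_unit>() !=
           dispatch<vendor_b::vector_unit>());
}
```

Because the tags are ordinary types, the usual namespace rules decide ownership, and `vector_unit` remains usable as a plain identifier elsewhere. The sketch does nothing about restricting which language features a function body may use, which is the other job the restrict keyword performs.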

  • Mac RoShaft

    Praise god that Bill Gates is no longer active at Microsoft. I always said that Steve Ballmer is the real brains behind all the innovation at MS. You can see this from new products like Visual Studio 2010, 2010.6, 2010.7, and 2011.
