C++ Accelerated Massive Parallelism in Visual C++ 2012

Description

Did you know that most of the computers on which you deploy applications have more power in the GPU on the video card than in the CPU, even on multi-core machines? Harnessing the power of the GPU is the next step in the manycore/multicore revolution and can mean astonishing improvements in execution time. Depending on how data-parallel your calculations are, you might see a speedup of 5, 10, or even 50 times! Imagine a calculation that takes 24 hours today completing in half an hour instead. What new capabilities would that enable for your users? Until recently, running code on the GPU has meant using one of several "C-like" languages. The upcoming release of C++ Accelerated Massive Parallelism (AMP) means that you can use accelerators like the GPU from native C++. Visual Studio includes debugging and profiling support for C++ AMP, and you don't need to download or install any new libraries to accelerate your code. In this session, see the power of C++ AMP and learn the basic concepts you need to adapt your code to use this massive parallelism.

Day: 3

Code: DEV334

    The Discussion

    • tomkirbygreen

      Excellent session! High resolution MP4 please! Studying content that features text in low resolution is a recipe for headaches.

    • Kate Gregory

      The slides are there for downloading ... I often download slides and look at them while listening to the audio for easier reading. My matrix multiply code will be available for download at some point, but in the meantime http://blogs.msdn.com/b/nativeconcurrency/archive/2011/11/02/matrix-multiplication-sample.aspx will make a good substitute. Mine just has timing code. Thanks for watching!

    • Matt_PD

      Kate: I have a question about the call to member-function "synchronize" with the goal of including the copy-out time when measuring total execution time for benchmarking purposes.

      I understand this is recommended due to asynchronicity of "parallel_for_each" and the associated copy-(only-)on-demand optimization of the captured concurrency::array changed in the lambda passed to "parallel_for_each" (which could prevent a copy-out from occurring in the benchmarked execution path).

      I'm wondering, would a deep copy operation (from a GPU-bound array to a CPU-bound vector) called before stopping the timer also count? As in, for instance:

      std::vector<double> CPU_V;
      concurrency::array<double> GPU_V;
      // ...
      CPU_V = GPU_V;

      A sub-question, just to make sure I understand this correctly -- I assume the above call to the assignment operator invokes a (synchronous) copy (as opposed to copy_async) due to having to go via the result of the following implicit conversion operator present in "amp.h" (in the definition of the "concurrency::array" class template):

          /// <summary>
          ///     Implicitly converts this array into a vector by copying.
          /// </summary>
          operator std::vector<_Value_type>() const __CPU_ONLY
          {
              std::vector<_Value_type> _return_vector(extent.size());
              Concurrency::copy(*this, _return_vector.begin());
              
              return _return_vector;
          }


      Is this correct?
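
The language-level mechanism behind the sub-question can be sketched in standard C++ without any accelerator. The class below is a hypothetical stand-in (`MockDeviceArray`, `copy_out`, and `conversions` are invented names, not part of "amp.h") that only imitates the shape of the quoted conversion operator. It shows that `CPU_V = GPU_V;` resolves to the conversion operator followed by an ordinary vector assignment, and that the copy inside the operator has finished by the time the assignment returns. It makes no claim about how the real C++ AMP runtime schedules transfers.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for an accelerator-resident container.
// This is NOT concurrency::array; it only imitates the conversion operator's shape.
template <typename T>
class MockDeviceArray {
public:
    explicit MockDeviceArray(std::vector<T> data) : storage_(std::move(data)) {}

    // Same pattern as the quoted operator: size a vector, copy into it, return it.
    // std::copy returns only after every element has been copied.
    operator std::vector<T>() const {
        ++conversions_;
        std::vector<T> result(storage_.size());
        std::copy(storage_.begin(), storage_.end(), result.begin());
        return result;
    }

    // Number of times the conversion operator has run (for illustration only).
    int conversions() const { return conversions_; }

private:
    std::vector<T> storage_;
    mutable int conversions_ = 0;
};

// Performs the assignment from the question and returns the host-side result.
inline std::vector<double> copy_out(const MockDeviceArray<double>& gpu_v) {
    const int before = gpu_v.conversions();
    std::vector<double> cpu_v;
    cpu_v = gpu_v;  // conversion operator runs, then the temporary is move-assigned
    assert(gpu_v.conversions() == before + 1);  // exactly one conversion per assignment
    return cpu_v;
}
```

Calling `copy_out` on a `MockDeviceArray<double>` built from `{1.0, 2.0, 3.0}` yields a three-element vector and increments the conversion counter once, so a timer stopped after the assignment would include the whole copy in this mock.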

    • tomkirbygreen

      Great session Kate. The thought of using this tech on upcoming Windows tablets for consumer apps is really exciting. 

    • Rishi

      Hi Kate,

      Thanks for your nice talks. Would it be possible for you to post the slides from your C++ precon?

      Thanks.
