C++ Accelerated Massive Parallelism in Visual C++ 2012
- Date: June 13, 2012 from 5:00PM to 6:15PM
- Day 3
- DEV334
- Speakers: Kate Gregory
- 2,739 Views
- 6 Comments
Did you know that most of the computers on which you deploy applications have more power in the GPU on the video card than in the CPU, even multi-core machines? Harnessing the power of the GPU is the next step in the manycore/multicore revolution and can mean astonishing improvements in execution time. Depending on how data parallel your calculations are, you might see a speedup of 5, 10, or even 50 times! Imagine a calculation that takes 24 hours today completing in half an hour instead. What new capabilities would that enable for your users? Until recently, running code on the GPU has meant using one of several "C-like" languages. The upcoming release of C++ Accelerated Massive Parallelism (AMP) means that you can use accelerators like the GPU from native C++. Visual Studio includes debugging and profiling support for C++ AMP, and you don't need to download or install any new libraries to accelerate your code. In this session, see the power of C++ AMP and learn the basic concepts you need to adapt your code to use this massive parallelism.
Follow the Discussion
Excellent session! High resolution MP4 please! Studying content that features text in low resolution is a recipe for headaches.
The slides are there for downloading ... I often download slides and look at them while listening to the audio for easier reading. My matrix multiply code will be available for download at some point, but in the meantime http://blogs.msdn.com/b/nativeconcurrency/archive/2011/11/02/matrix-multiplication-sample.aspx will make a good substitute. Mine just has timing code. Thanks for watching!
Kate: I have a question about the call to member-function "synchronize" with the goal of including the copy-out time when measuring total execution time for benchmarking purposes.
I understand this is recommended due to asynchronicity of "parallel_for_each" and the associated copy-(only-)on-demand optimization of the captured concurrency::array changed in the lambda passed to "parallel_for_each" (which could prevent a copy-out from occurring in the benchmarked execution path).
I'm wondering, would a deep copy operation (from a GPU-bound array to a CPU-bound vector) called before stopping the timer also count? As in, for instance:
std::vector<double> CPU_V;
concurrency::array<double> GPU_V;
// ...
CPU_V = GPU_V;
A sub-question, just to make sure I understand this correctly -- I assume the above call to the assignment operator invokes a (synchronous) copy (as opposed to copy_async) due to having to go via the result of the following implicit conversion operator present in "amp.h" (in the definition of the "concurrency::array" class template):
/// <summary>
/// Implicitly converts this array into a vector by copying.
/// </summary>
operator std::vector<_Value_type>() const __CPU_ONLY
{
    std::vector<_Value_type> _return_vector(extent.size());
    Concurrency::copy(*this, _return_vector.begin());
    return _return_vector;
}
Is this correct?
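The timing concern raised in this question can be sketched in portable C++. This is an analogy rather than C++ AMP code: `std::async` stands in for the asynchronous `parallel_for_each` launch, and `future::get()` stands in for a blocking step such as `synchronize` or a synchronous copy into a `std::vector`. The names `square_all`, `TimedResult`, and `run_and_time` are hypothetical and appear nowhere in the session or in `amp.h`.

```cpp
#include <chrono>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a kernel: squares every element of its own copy
// of the data and returns the result buffer.
std::vector<double> square_all(std::vector<double> data) {
    for (double& x : data) {
        x *= x;
    }
    return data;
}

// Hypothetical result type: the copied-out data plus the measured time.
struct TimedResult {
    std::vector<double> values;  // results delivered back to the caller
    double elapsed_ms;           // launch + compute + retrieval, in milliseconds
};

TimedResult run_and_time(const std::vector<double>& input) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();

    // Asynchronous launch: returns right away, before the work is done.
    // (Analogy only: plays the role of parallel_for_each.)
    std::future<std::vector<double>> pending =
        std::async(std::launch::async, square_all, input);

    // Blocking retrieval: waits for the work and hands back the data.
    // Stopping the clock before this line would leave the wait unmeasured.
    std::vector<double> output = pending.get();

    const auto stop = clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {std::move(output), elapsed.count()};
}
```

Calling `run_and_time({1.0, 2.0, 3.0})` yields the values 1, 4, and 9 together with an elapsed time that covers the wait for the result. If C++ AMP behaves the same way, a blocking copy placed before the timer stops would serve the same purpose as an explicit `synchronize`, but that is an assumption of this sketch, not something the session confirms.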
Great session Kate. The thought of using this tech on upcoming Windows tablets for consumer apps is really exciting.
Hi Kate,
Thanks for your nice talks. Would it be possible for you to post the slides from your C++ precon ?
Thanks.