
VS2010 Parallel Computing Features Tour

29 minutes, 4 seconds


Right click “Save as…”

  • WMV (WMV Video)
  • MP3 (Audio only)
  • Low Quality MP4 (approx. 500-800kbps)

Author: Hi, I am Daniel Moth



In Visual Studio 2010, the Parallel Computing team has delivered APIs and tools for developers who want to build applications that take advantage of multiple cores. This video provides a glimpse of the managed APIs, the debugging windows, and the profiler support.
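The demos themselves are not transcribed on this page, but a minimal sketch of the kind of .NET 4 managed constructs the tour covers - explicit Tasks and Parallel.For - might look like this (my own illustration, not code from the video):

```csharp
using System;
using System.Threading.Tasks;

class ParallelTour
{
    static void Main()
    {
        // Explicit tasks: queue two pieces of work and wait for both.
        Task a = Task.Factory.StartNew(() => Console.WriteLine("task a"));
        Task b = Task.Factory.StartNew(() => Console.WriteLine("task b"));
        Task.WaitAll(a, b);

        // Parallel.For: the runtime partitions the range across cores.
        long[] squares = new long[1000];
        Parallel.For(0, squares.Length, i => squares[i] = (long)i * i);
        Console.WriteLine(squares[999]); // 998001
    }
}
```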


For more on the managed APIs, please start with the team's blog. For more on the profiler, start with that team's blog. For more on Parallel Tasks and Parallel Stacks, please start with my blog post on Parallel Debugging.


Follow the discussion

  • Very nice.

    One thing though: when matrix multiplication was executed on a small data set, sequential was actually faster than ParallelFor (@20:48). Are there any insights on estimating the overhead of setting up the parallel execution machinery, so that an application could attempt to guess whether to process a data set sequentially or in parallel (assuming that the application knows the size of the data set)?


  • Which edition do I need to get access to the parallel debugging?

  • We haven't published any insights. Currently, you would have to measure in your app and decide at what kind of workload you are getting a speedup from parallelism. You could then have a conditional path that is serial for smaller workloads and a parallel path for larger workloads.
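A minimal sketch of that conditional path; the threshold below is entirely made up for illustration - as the reply says, you would measure in your own app to find the real cut-off:

```csharp
using System;
using System.Threading.Tasks;

class AdaptiveScale
{
    // Hypothetical threshold; measure your own workloads to pick a real value.
    const int ParallelThreshold = 64;

    public static double[] Scale(double[] input, double factor)
    {
        var result = new double[input.Length];
        if (input.Length < ParallelThreshold)
        {
            // Small workload: stay sequential and avoid the parallel setup overhead.
            for (int i = 0; i < input.Length; i++)
                result[i] = input[i] * factor;
        }
        else
        {
            // Large workload: let Parallel.For spread iterations across cores.
            Parallel.For(0, input.Length, i => result[i] = input[i] * factor);
        }
        return result;
    }

    static void Main()
    {
        var small = Scale(new double[] { 1, 2, 3 }, 10);
        Console.WriteLine(small[2]); // 30
    }
}
```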




  • Parallel Tasks and Parallel Stacks are available in Pro and above. Also click the parallel debugging link above to watch dedicated screencasts to those windows.





  • Do the results change significantly if you change the order of execution?

  • As the narrator mentions, the Debug.WriteLine call causes the threads to queue/block on writing to the log/IDE. If this operation is significantly more costly than the work in the task (he has some simple math), then you're running essentially synchronously and the parallel code just becomes a burden.


    Also, I always test by running the methods thousands or even millions of times and taking an average. OS noise amongst other things can severely impact a single pass of a method so you can sometimes get a misleading result.



    Edit -- the tools far exceed all expectations. I'd like to see in-code-editor warnings about compiler optimisation and reordering pitfalls; for example, if I add an attribute [MultiThreaded] to a method, then (after compilation) the IDE highlights executions/reads/writes that have moved from the order in which they were coded.
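A simple harness along the lines the comment describes - warm up once, run many iterations, and take the average to smooth out OS noise - might look like this (a sketch using Stopwatch; the warm-up and iteration counts are arbitrary assumptions):

```csharp
using System;
using System.Diagnostics;

class AverageTimer
{
    // Run an action many times and report the mean milliseconds per call.
    // A single pass can be distorted by OS noise, JIT, and cache effects.
    public static double AverageMs(Action work, int iterations)
    {
        work(); // warm-up pass (JIT compilation, cache priming)
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            work();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    static void Main()
    {
        double sum = 0;
        double ms = AverageMs(
            () => { for (int i = 0; i < 1000; i++) sum += i; },
            10000);
        Console.WriteLine("avg ms per call: " + ms);
    }
}
```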

  • Not significantly. The point of the video was to show the tools, make you aware of the rich Task API and the ease of use of PFor, and show the pitfall of getting no perf gains from using the ThreadPool directly. Even though I am microbenchmarking, if you run each approach separately from the others, multiple times and with varying workloads, I do not expect the results to vary significantly.




  • Thanks. Regarding your wish list, those items are not under what we consider debugging and performance tools. We think of those as Correctness tools (or some say Analysis tools and other say Safety tools) - regardless, they are very important too and things we are thinking about for future (post-VS2010) releases.




  • Luke, for the sake of correctness, the Debug.WriteLine call was added to the MulTask() method, which was executed after the sequential and PFor multiplications. So the PFor loop ran without any blocking on screen output, and it was slower than sequential presumably because of all the extra work associated with priming the parallel execution environment.

    If 90% of the input for my app on any given day happens to be small, it's better to process that 90% sequentially and use parallel execution only when appropriate; but for that it would be nice to know where (approximately) that cut-off point is.

    I can run tests and collect some stats on what the overhead of firing up parallel execution is, but assuming that this work might have already been done while developing the parallel framework, it would be preferable for me to look at the stats collected by the PF development team than to spend time and effort myself.




  • You mention that Parallel.For utilizes tasks, but always has better performance than tasks. How is this possible?

  • I can't remember my exact wording but the precise statement is that "for this benchmark, on my runs the PFor beats my naive tasks implementation <insert more disclaimers here>".


    To actually answer your question: PFor uses tasks in an intelligent manner (partitioning the range amongst a much smaller number of tasks) instead of using one task per iteration. So, of course, you could code an equally (or even more) performant version with tasks yourself, but look at the simplicity of the PFor API.


    TIP: To get an insight into the PFor implementation place a breakpoint in the body of the for loop and look at the Parallel Tasks window.
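To make the partitioning point concrete, here is a rough sketch (my own illustration, not the actual Parallel.For implementation) of hand-chunking a range into one task per core, next to the one-line PFor equivalent:

```csharp
using System;
using System.Threading.Tasks;

class Partitioning
{
    static void Main()
    {
        const int N = 1000000;
        var data = new double[N];

        // Hand-rolled partitioning: split the range into one chunk per core
        // instead of creating one tiny task per index.
        int chunks = Environment.ProcessorCount;
        int chunkSize = (N + chunks - 1) / chunks;
        var tasks = new Task[chunks];
        for (int c = 0; c < chunks; c++)
        {
            int start = c * chunkSize;                 // captured per iteration
            int end = Math.Min(start + chunkSize, N);
            tasks[c] = Task.Factory.StartNew(() =>
            {
                for (int i = start; i < end; i++)
                    data[i] = Math.Sqrt(i);
            });
        }
        Task.WaitAll(tasks);

        // The one-line equivalent, with the partitioning done for you:
        Parallel.For(0, N, i => data[i] = Math.Sqrt(i));

        Console.WriteLine(data[4]); // 2
    }
}
```

The hand-rolled version shows why a much smaller number of tasks wins: per-task scheduling cost is paid once per chunk rather than once per element.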




  • Seva, you are correct; my apologies. Although the problem with using PF dev team stats is that while they may know that cranking up the other threads takes, say, 2,000 cycles per thread (I've really no idea - could be 20,000), you'd still have to test your task/code to see how much work it is in comparison.


    Although if the profiler could try P.For and normal For and suggest when you're actually degrading performance, that'd be excellent. That said, I did think that P.For uses some kind of ramping up technology so it ran synchronously for small loads - and thus, what we saw in the video shouldn't happen. P.For is so easy to use but so easy to use in the wrong place.


    My problem is that most of the code I write I think is too small to be parallelised. Like arrays of 12 items. By the time I've finished I've got a huge chain of little bits of work adding up to a large amount of work, so then I start to look for logic that can run independently of each other and kick off some branch of execution on another thread and re-join/wait further down. The problem being that in the future I won't be able to find enough simultaneous work to spread across all cores!


    Daniel - thanks, that'd be cool.

  • Which edition exposes those amazing profiling capabilities?

  • Glad you like them.


    1. The APIs ship as part of the .NET 4 framework (and the equivalent native ones as part of VC++ 10), so available in all VS editions.

    2. Debugger Windows in VS Pro and higher.

    3. Profiler in VS Premium and higher. It also requires Vista and higher due to dependency on ETW events.




  • Hi,


    just out of curiosity: what is that very nice font that you are using in the VS code editor?

  • Hi objectref


    I don't think I changed the default (Consolas). Maybe the resolution is making it look nicer than usual...




  • Wonderful video.  Thanks much.  Where can I find the source code for the example project?

  • Hi Kevin


    Thanks, I uploaded the *demo* code on my blog:





  • jmjm

    [Translated from Spanish:] They live off the people and betray the people;
    they have no shame to show their faces in the street.

  • Phillip

    Great demonstration. It makes me feel really weird to see a variable named "_".
