Visualizing Concurrency: VS 2010 Beta 2 - Parallel Performance Profiling Advancements



In Visual Studio 2010 Beta 1, you were introduced to new analysis and profiling capabilities (Parallel Profiling and Performance Tools) designed to make concurrency understandable and, ultimately, debuggable. Today, with the release of Visual Studio 2010 Beta 2, we introduce an updated and significantly more capable concurrency visualization and profiling tool, which is available alongside the other profiling features in Visual Studio 2010 Premium and Ultimate. What does it do, exactly? How does it work? What's new?

Here, Architect Hazim Shafi, Dev Lead Sasha Dadiomov and PM Bill Colburn tell us all about the Concurrency Visualizer Profiling Tool, including a demo. So, fire up Beta 2, spin up some threads and visualize concurrency. You should profile an already-existing application that employs concurrency and, perhaps for the first time, get to see what your concurrent code is actually doing at run time.

Parallel visualization tools team blog:

The parallel computing dev center:

Hazim's blog:




The Discussion

  • User profile image

    Which edition do I need to get access to this?

  • User profile image

    This is available in Premium and up.


  • User profile image

    What's "Premium"? Is that the equivalent of VS 2008 Pro?

  • User profile image

    What version of Windows does this feature require?


    On my Server 2003 R2, it says "Requires infrastructure not available on this version of Windows".

  • User profile image

    Hi, sorry, but this is the least informative webcast I've seen on "going deep" so far, and I've watched quite a few. It's really surprising, since you have 3 developers in the room with the camera and they keep talking marketing...


    First off, all this functionality has existed for years in Intel Thread Profiler for native applications. Additionally, Thread Profiler displays transitions (a transition from thread1 to thread2 occurs when thread1 leaves a critical section and thread2 gains access to it by acquiring the synchronization primitive). From what I understood, VS 2010 shows you that a thread was idle or waiting on a sync primitive, but it does not show you which thread needed to release the mutex so that the current one could advance. Another feature, really a whole analysis engine, is Critical Path analysis, which, as I understood it, is also missing from the VS 2010 profiler.


    From the "going deep" host I was expecting questions like:


    • How large is the sampling, instrumentation and tracing overhead? Are there cases where it skews application behavior, and how would you fix that?
    • Why sample on context switches only? Why not use time-based sampling and sample, say, every 10 ms? The time quantum for a thread is quite large (~20 ms), and from what I understood the callstack is only collected on each context switch event. So the tool does not really tell you what was going on _inside_ the quantum, when the thread was actually working. BTW, what if a thread is doing CPU-intensive work and there's no over-subscription? The thread scheduler will keep this thread running for as long as it can without any context switches, and therefore without any callstacks.
    • Is there support for the new and cool threading features of VS 2010: Asynchronous Agents, and task-based parallelism with PPL and TPL? And if not, why not?
    • And the little things, like: why do I need to scroll down for the active legend if it's so informative, important and interactive? If you know that everyone will want to look at the graphical timeline after data collection, why is that checkbox not "on" by default? And so on...

    The story about helping the codec people was a lot of fun! I might be missing something, but it sounded like the codec developers could not figure out that they should simply time their "encode" and "decode" functions running on a stream/image loaded in memory, take the inverse, and guess the only reasonable explanation for the difference between 90 and 24 FPS. Instead, they decided to substitute a GUI tool for thinking.
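The "time it and take the inverse" approach mentioned above can be sketched in a few lines of C++. This is only an illustration: `decode_frame` is a hypothetical stand-in for a real codec routine, and the function names and iteration count are assumptions, not anything from the webcast.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a codec's per-frame decode routine.
std::uint64_t decode_frame(const std::vector<std::uint8_t>& frame) {
    std::uint64_t checksum = 0;
    for (std::uint8_t byte : frame) checksum = checksum * 31 + byte;
    return checksum;
}

// Converts a mean per-frame time (seconds) into a throughput ceiling (frames per second).
double fps_from_seconds_per_frame(double seconds_per_frame) {
    assert(seconds_per_frame > 0.0);
    return 1.0 / seconds_per_frame;
}

// Times `iterations` decode calls on a frame held in memory and
// returns the mean wall-clock seconds spent per frame.
double mean_seconds_per_frame(std::vector<std::uint8_t> frame, int iterations) {
    assert(!frame.empty() && iterations > 0);
    volatile std::uint64_t sink = 0;  // keeps the results observable to the optimizer
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        frame[0] = static_cast<std::uint8_t>(i);  // vary the input so each call does real work
        sink = sink + decode_frame(frame);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count() / iterations;
}
```

Calling `fps_from_seconds_per_frame(mean_seconds_per_frame(frame, 100))` gives the frame rate the function could sustain in isolation; comparing that number with the frame rate the whole pipeline delivers shows how much time is being lost outside the function itself.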


  • User profile image

    Wow. Thanks for the detailed feedback. This was an introductory piece, to be fair.

  • User profile image

    Great! Looking forward to hearing more about the Profiler; it really did look like a good starting point. I do realize that half of my questions can be answered with "well, this is the ETW limitation/purpose", but I was still wondering if there's a plan to implement event-based sampling (EBS) or time-based sampling (TBS) to answer some of the more complex questions that arise during performance analysis. Support for TPL and PPL is something I'd very much like to see implemented.

  • User profile image

    First, let me say that I will follow up with a more detailed walkthrough of the product asap, so keep an eye out for it.  Let me address some of your questions:


    1. Profiling Overhead:  This is of course application dependent and also platform dependent.  For CPU-intensive phases of your application the overhead is negligible.  Because tracing involves I/O, it can interfere with your I/O intensive applications. On some platforms, collecting callstacks is more expensive (e.g., x64 vs. x86).  That's due to the calling conventions that are being used and the information necessary to walk the stack.  I don't know of any profiling tool with zero impact, but this has thus far not been a source of feedback from customers.  Do you have data to the contrary?


    2. We actually sample on both context switches and at regular time intervals (1ms).  So, we provide you with data about why threads blocked and where as well as data about what threads are doing when they're executing.  You can get at the sample profile data by clicking on the "Execution" legend entry or by clicking on the execution segments in the time line.  When you do the latter, we show you the sample callstack and give you a visual hint to where that sample was taken.  Does this help?


    3. We do have some support for PLINQ, TPL, and PPL.  We show markers for PLINQ queries and some PPL and TPL parallel constructs, which let users identify the regions where they are executing so you can focus your tuning on them.  Try it out!  For PPL, you have to opt into this feature by calling the Concurrency::EnableTracing()/DisableTracing() functions.


    4. I had a hard time parsing your comment about the active legend and the checkbox.  Can you elaborate more?  If this is a usability related question, I am very well aware of some of the warts in the product.  We spent a huge amount of time improving the tool from that perspective.  Look at our CTP, Beta 1, and Beta 2 bits and you'll agree that we've come a long way.  We've also made significant investments in usability studies and worked with designers.  We've learned a lot during this process and hope that we can avoid some of the pitfalls in future releases.
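As an illustration of the PPL opt-in described in point 3, a minimal sketch might look like the following. It assumes the Microsoft toolchain (`ppl.h` and `concrt.h`); the `square_all` function and the serial fallback for other compilers are hypothetical additions so the snippet builds anywhere, and later toolsets may flag the tracing calls as deprecated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef _MSC_VER
#include <ppl.h>     // Concurrency::parallel_for
#include <concrt.h>  // Concurrency::EnableTracing / Concurrency::DisableTracing
#endif

// Squares every element of `input`. On MSVC the loop runs as a PPL
// parallel_for bracketed by the tracing opt-in calls; elsewhere it
// falls back to a plain serial loop (assumption, for portability only).
std::vector<long long> square_all(const std::vector<int>& input) {
    std::vector<long long> output(input.size());
#ifdef _MSC_VER
    Concurrency::EnableTracing();   // opt in before the parallel region
    Concurrency::parallel_for(std::size_t(0), input.size(), [&](std::size_t i) {
        output[i] = static_cast<long long>(input[i]) * input[i];
    });
    Concurrency::DisableTracing();  // opt out once the region of interest is done
#else
    for (std::size_t i = 0; i < input.size(); ++i) {
        output[i] = static_cast<long long>(input[i]) * input[i];
    }
#endif
    assert(output.size() == input.size());
    return output;
}
```

Each iteration writes only its own slot in `output`, so the parallel version needs no locking.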


    For more information about the tool, please visit my blog; you might find some useful information there.  I also have a detailed article in MSDN Magazine.


    Finally, Intel's Thread Profiler is very different from our tool in both methodology and diagnostic information provided in addition to our tool being fully integrated with the development experience.  I don't want to get into a competitive analysis here, so choose what you find useful for your needs.











  • User profile image

    Hi, thanks so much for all the info and especially the link to the paper on MSDN, this is exactly the level of details I was looking for. Had a couple of follow up questions after I read it.

    • About transitions (lines that connect the blocking segment with an execution segment on another thread) in the paper you say "When this visualization is visible, it illustrates ...". I was wondering why would it not be visible? Did you refer to the cases of uncontended critical section or is there more to it?
    • Regarding PPL support: in the case of nested parallel_for-s, or when two master threads start two parallel_for algorithms at the same time, would it be possible to recognize on the timeline exactly which thread is executing which parallel_for? Or would I only see markers from one parallel_for, or from the inner-most parallel_for-s?
    • Sorry, my comment about a checkbox was on usability and it refers to the bottom checkbox on Page 1 of 3 of the Performance Wizard (Figure 8 in your paper). In the demo you showed it was not "on" by default and I was wondering why would one run the Concurrency analysis from the VS 2010 GUI if not to see the visualized timeline.
    Anyway, the Profiler really looks like a great Tool, I'm definitely giving it a try!


  • User profile image
    James Rapp

    Hi apegushi,


    • "Thread transitions" are depicted via the Thread Ready Connector.  This is only shown when the unblocking event occurred on another thread in the same process, which is why it isn't always visible.
    • For PPL support, the Concurrency Visualizer does not depict nesting, nor does it depict which threads were involved in a parallel for loop.  However, markers will be shown for all parallel loop iterations.
    • Regarding the check boxes in the performance wizard, the lower check box isn't on by default because there is another profiling tool related to concurrency.  In addition to the Concurrency Visualizer data, the Visual Studio profiler presents contention data, which can be viewed after checking the first box.

    Thanks for your feedback; we always appreciate it!
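For readers who want a small workload in which one thread is unblocked by another thread of the same process (the situation described in the first bullet above), a sketch along these lines may help. It uses only the standard library; `run_handoff` and the event strings are hypothetical names, and whether a given profiler draws a connector for this exact pattern is not something this snippet claims.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Runs two threads that contend for one mutex. The holder takes the lock
// first and keeps it for `hold_time`; the waiter blocks on the same lock
// until the holder releases it. Returns the events in the order they happened.
std::vector<std::string> run_handoff(std::chrono::milliseconds hold_time) {
    std::mutex resource;
    std::atomic<bool> holder_has_lock{false};
    std::vector<std::string> events;  // only modified while `resource` is held

    std::thread holder([&] {
        std::lock_guard<std::mutex> lock(resource);
        holder_has_lock.store(true);
        std::this_thread::sleep_for(hold_time);  // simulate work inside the critical section
        events.push_back("holder releases");
    });                                          // lock is released as the lambda returns

    std::thread waiter([&] {
        // Wait until the holder owns the lock so the blocking below is guaranteed.
        while (!holder_has_lock.load()) std::this_thread::yield();
        std::lock_guard<std::mutex> lock(resource);  // blocks until the holder unlocks
        events.push_back("waiter acquires");
    });

    holder.join();
    waiter.join();
    return events;
}
```

The waiter's blocked interval ends at the moment the holder's lock guard is destroyed, which is the kind of cross-thread unblocking event the reply refers to.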


