Anton Pegushin (apegushi)

Niner since 2010

Senior Technical Consulting Engineer at Intel


  • Visualizing Concurrency: VS 2010 Beta 2 - Parallel Performance Profiling Advancements

    Hi, thanks so much for all the info, and especially for the link to the paper on MSDN; this is exactly the level of detail I was looking for. I had a couple of follow-up questions after reading it.

    • About transitions (the lines that connect a blocking segment with an execution segment on another thread): in the paper you say "When this visualization is visible, it illustrates ...". Why would it not be visible? Are you referring to the case of an uncontended critical section, or is there more to it?
    • Regarding PPL support: with nested parallel_for calls, or when two master threads start two parallel_for algorithms concurrently, would it be possible to tell on the timeline exactly which parallel_for each thread is executing? Or would I only see markers from one parallel_for, or from the innermost parallel_for calls?
    • Sorry, my comment about a checkbox was about usability: it refers to the bottom checkbox on page 1 of 3 of the Performance Wizard (Figure 8 in your paper). In the demo it was not "on" by default, and I was wondering why anyone would run the Concurrency analysis from the VS 2010 GUI if not to see the visualized timeline.
    Anyway, the Profiler really looks like a great tool; I'm definitely giving it a try!


  • Visualizing Concurrency: VS 2010 Beta 2 - Parallel Performance Profiling Advancements

    Great! I'm looking forward to hearing more about the Profiler; it really did look like a good starting point. I do realize that half of my questions can be answered with "well, this is the ETW limitation/purpose", but I was still wondering whether there's a plan to implement EBS or TBS (event-based or time-based sampling) to answer some of the more complex questions that arise during performance analysis. Support for TPL and PPL is something I'd very much like to see implemented.

  • Visualizing Concurrency: VS 2010 Beta 2 - Parallel Performance Profiling Advancements

    Hi, sorry, but this is the least informative webcast I've seen on "going deep" so far, and I've watched quite a few. It's really surprising, since you have three developers in the room with the camera and they keep talking marketing...


    First off, all this functionality has existed for years in Intel Thread Profiler for native applications. Additionally, Thread Profiler displays transitions (a transition from thread1 to thread2 happens when thread1 leaves a critical section and thread2 gains access to it by acquiring the synchronization primitive). From what I understood, VS 2010 shows you that a thread was idle, waiting on a sync primitive, but it does not show you which thread had to release the mutex so that the current one could advance. Another feature, in fact a whole analysis engine, is Critical Path analysis, which, as I understood it, is also missing from the VS 2010 profiler.
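
    To make the idea of a transition concrete, here is a minimal Python sketch. It assumes a hypothetical TrackedLock wrapper (not an API of Thread Profiler or the VS 2010 profiler) that remembers which thread last released the lock. When a blocked thread finally acquires it, the wrapper records the releaser-to-acquirer edge that a transition line would draw; an uncontended acquire records nothing.

```python
import threading
import time


class TrackedLock:
    """Hypothetical lock wrapper: records which thread last released the lock so a
    contended acquire can be attributed to the thread that unblocked it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_releaser = None  # ident of the thread that last released the lock
        self.transitions = []       # (releasing thread, acquiring thread) pairs

    def acquire(self):
        if self._lock.acquire(blocking=False):
            return  # uncontended: no transition to record
        self._lock.acquire()  # contended: block until the holder releases
        self.transitions.append((self._last_releaser, threading.get_ident()))

    def release(self):
        self._last_releaser = threading.get_ident()  # written while still holding the lock
        self._lock.release()


lock = TrackedLock()
holder_has_lock = threading.Event()
ids = {}


def holder():
    ids["holder"] = threading.get_ident()
    lock.acquire()
    holder_has_lock.set()
    time.sleep(0.1)  # keep the critical section busy so the waiter blocks
    lock.release()


def waiter():
    ids["waiter"] = threading.get_ident()
    holder_has_lock.wait()  # make sure the lock is already taken
    lock.acquire()          # blocks, then records a holder -> waiter transition
    lock.release()


threads = [threading.Thread(target=holder), threading.Thread(target=waiter)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(lock.transitions == [(ids["holder"], ids["waiter"])])  # True
```

    A real profiler would reconstruct this edge from OS-level wait and release events rather than from a wrapper, but the information shown is the same: who had to let go before the waiting thread could advance.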


    From the "going deep" host I was expecting questions like:


    • How large are the sampling, instrumentation and tracing overheads? Are there cases where they skew application behavior, and how would you fix that?
    • Why sample on context switches only? Why not use time-based sampling and sample, say, every 10 ms? A thread's time quantum is quite large (~20 ms), and from what I understood, the call stack is collected only on each context-switch event. So the tool really does not tell you what was going on _inside_ the quantum, when the thread was actually working. By the way, what if a thread is doing CPU-intensive work and there's no oversubscription? The thread scheduler will keep this thread running for as long as it can without any context switches, and therefore there will be no call stacks.
    • Is there support for the new and cool threading features of VS 2010: Asynchronous Agents, task-based parallelism with PPL and TPL? And if not, why not? ;)
    • And the little things, like: Why do I need to scroll down to see the active legend if it's so informative, important and interactive? If you know that everyone will want to look at the graphical timeline after data collection, why isn't that checkbox "on" by default? And so on...
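
    The context-switch sampling concern can be illustrated with a toy model. This is a sketch under assumed inputs, not a description of how ETW or the VS 2010 profiler actually collects stacks: run intervals are in milliseconds, the 10 ms timer period is the one suggested above, and the helper functions are hypothetical. A CPU-bound thread that is never preempted gets call stacks only at its switch boundaries, while a 10 ms timer lands dozens of samples inside the run.

```python
def context_switch_samples(run_intervals):
    """Stack samples taken only when the thread is switched in or out,
    i.e. at the edges of each run interval (ms)."""
    return [t for start, end in run_intervals for t in (start, end)]


def time_based_samples(run_intervals, period_ms, horizon_ms):
    """Stack samples taken on a fixed timer, kept only if the thread is running."""
    return [t for t in range(0, horizon_ms, period_ms)
            if any(start <= t < end for start, end in run_intervals)]


def samples_inside(samples, run_intervals):
    """Samples strictly inside a run interval: taken while the thread was doing
    work, not at a switch boundary."""
    return [t for t in samples
            if any(start < t < end for start, end in run_intervals)]


# A CPU-bound thread with no oversubscription: the scheduler never preempts it,
# so it runs uninterrupted for the whole 1000 ms window.
cpu_bound = [(0, 1000)]

cs = context_switch_samples(cpu_bound)
tb = time_based_samples(cpu_bound, period_ms=10, horizon_ms=1000)

print(len(samples_inside(cs, cpu_bound)))  # 0: no stacks from inside the run
print(len(samples_inside(tb, cpu_bound)))  # 99: t = 10, 20, ..., 990
```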

    The story about helping the codec people was a lot of fun! I might be missing something, but it sounded as if the codec developers couldn't figure out how to simply time their "encode" and "decode" functions on a stream/image already loaded in memory, take the inverse to get frames per second, and deduce the only reasonable explanation for the difference between 90 and 24 FPS, and instead decided to substitute a GUI tool for thinking. 8-)
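
    The timing approach described above takes only a few lines. This is a minimal sketch: fake_codec is a hypothetical stand-in for the real encode/decode pair, and reading 90 FPS as the codec-only rate and 24 FPS as the observed end-to-end rate is an assumption about the story. Under that reading, roughly 30.6 ms of every frame is spent somewhere other than encoding and decoding.

```python
import time


def measure_fps(process_frame, frames):
    """Time process_frame over frames already loaded in memory and return
    frames per second, i.e. the inverse of the mean per-frame time."""
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed


def unexplained_ms_per_frame(compute_fps, observed_fps):
    """Milliseconds per frame the full pipeline spends outside the timed code."""
    return 1000.0 / observed_fps - 1000.0 / compute_fps


def fake_codec(frame):
    # Hypothetical stand-in for encode() followed by decode().
    return bytes(reversed(frame))


frames = [bytes(100_000) for _ in range(30)]  # synthetic frames, already in memory
print(f"codec-only rate: {measure_fps(fake_codec, frames):.0f} FPS")

gap = unexplained_ms_per_frame(compute_fps=90, observed_fps=24)
print(f"{gap:.1f} ms per frame spent outside encode/decode")  # 30.6
```

    The point of the arithmetic: 1000/24 ≈ 41.7 ms per frame observed versus 1000/90 ≈ 11.1 ms per frame of codec work, so most of each frame's time is going somewhere else.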