Arun Kishan: Inside Windows 7 - Farewell to the Windows Kernel Dispatcher Lock

You've learned about many of the new features of the latest version of the Windows kernel in the Mark Russinovich Inside Windows 7 conversation here on Channel 9. One of Mark’s favorite kernel innovations is the way the Windows 7 kernel manages scheduling of threads and the underlying synchronization primitives that embody kernel thread management.

Prior to Windows 7 (and therefore Windows Server 2008 R2), the Windows kernel dispatcher employed a single lock, the dispatcher lock, which worked well for a relatively small number of processors (such as 64). However, now that we find ourselves in the midst of the ManyCore era, 64 processors aren’t that many. A new strategy was required to scale Windows to large numbers of processors, since a single lock is limited in capability by design. The masterful David Cutler, one of the world's greatest software engineers, wrote the NT scheduler at a time when the notion of affordable 256-processor machines was more science fiction than probability.

As we learned in the Mark Russinovich video, Windows 7 can now scale to 256 processors thanks to the great engineering of Arun Kishan, a kernel architect you met on C9 back in the Vista days. To promote further scalability of the NT kernel, Arun completely eliminated the dispatcher lock and replaced it with a much finer-grained set of synchronization primitives. Gone are the days of contention for a single spinlock. How exactly did Arun pull this off, you ask? Who is this genius? Well, tune in. Lots of answers await…

Arun's work directly benefits the overall performance of Windows running on many processors and means, simply, Windows can now really scale. Thank you, Arun!

 


Spinlocks are synchronization primitives that cause a processor to busy-wait until the state of the lock’s memory location changes.
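
To make the busy-wait idea concrete, here is a minimal user-mode sketch of a test-and-set spinlock in C++. It is an illustration only, not the Windows kernel's implementation: the names `SpinLock` and `count_with_spinlock` are hypothetical, and a real kernel spinlock also deals with interrupt levels and uses a CPU pause hint rather than yielding.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical test-and-set spinlock: the lock is one memory location,
// and a waiter keeps polling it until its state changes.
class SpinLock {
public:
    void lock() {
        // exchange() returns the previous value; true means someone else holds the lock.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Spin on a plain load until the lock looks free, then retry the exchange.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();  // user-mode courtesy; an assumption of this sketch
            }
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Increments a shared counter from several threads, all contending for one spinlock.
long count_with_spinlock(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // protected by the spinlock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Every thread in this sketch hammers the same memory location, which is exactly the kind of contention that grows as processors are added.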

 
As the name implies, the dispatcher lock is the fundamental lock associated with the kernel dispatcher, or the scheduler.
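
The difference between one global lock and a finer-grained set of locks can be sketched with a user-mode analogy in C++. This is an assumption-laden illustration and says nothing about how the Windows 7 dispatcher is actually structured: `CoarseCounters`, `FineCounters`, `run_workers` and `kSlots` are hypothetical names, and `std::mutex` stands in for whatever primitives the kernel uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

constexpr std::size_t kSlots = 8;  // hypothetical number of independent pieces of state

// Coarse-grained: a single lock guards every slot, so all threads contend on it.
class CoarseCounters {
public:
    void increment(std::size_t slot) {
        std::lock_guard<std::mutex> guard(global_lock_);
        ++values_[slot % kSlots];
    }
    long total() {
        std::lock_guard<std::mutex> guard(global_lock_);
        long sum = 0;
        for (long v : values_) sum += v;
        return sum;
    }

private:
    std::mutex global_lock_;
    std::array<long, kSlots> values_{};
};

// Fine-grained: each slot has its own lock, so threads touching
// different slots do not wait on each other.
class FineCounters {
    struct Slot {
        std::mutex lock;
        long value = 0;
    };

public:
    void increment(std::size_t slot) {
        Slot& s = slots_[slot % kSlots];
        std::lock_guard<std::mutex> guard(s.lock);
        ++s.value;
    }
    long total() {
        long sum = 0;
        for (Slot& s : slots_) {
            std::lock_guard<std::mutex> guard(s.lock);
            sum += s.value;
        }
        return sum;
    }

private:
    std::array<Slot, kSlots> slots_;
};

// Runs `threads` workers, each incrementing its own slot `iterations` times.
template <typename Counters>
long run_workers(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    Counters counters;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counters, t, iterations] {
            for (int i = 0; i < iterations; ++i) {
                counters.increment(static_cast<std::size_t>(t));
            }
        });
    }
    for (auto& w : workers) w.join();
    return counters.total();
}
```

Both variants produce the same totals; the difference is only in how much the workers get in each other's way, which is the scaling concern described above.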

 

Follow the Discussion

  • Vesuviusvesuvius Count Orlock

    The interview I have been dreaming of, and about, and over and...

     

    I really do mean that!

  • CharlesCharles Welcome Change

    :)

     

    Enjoy! What Arun accomplished really is amazing. I'm just blown away by the elegance of his solution and the engineering strategies he employed to pull it off (you'll learn about those towards the end of the conversation).

     

    C

  • stevo_stevo_ Human after all

    Blimey, you've certainly got to have had your head in the kernel for a while to really keep up with that, but by the end I understood the general concept, and it sounded impressive ;)

  • Truly amazing! Really interesting stuff, thanks Charles.

    It is fascinating to see how, with such talented people, a kernel designed more than a decade ago can be enhanced to suit today's needs. This is why I love operating system design and programming. There are such great foundations and languages to build upon, but also so many improvements still to make.

  • Allan LindqvistaL_ Kinect ftw

    wow, video start-to-whiteboard (STW) in 33 seconds, good stuff :)

  • MaidenDotNetMaidenDotNet Who Dares Wins...

    I am a C# Developer who has always worked with Locking, Threads, etc through .NET and never through C++ libraries.  Even though that is the case, would it not help optimization to have a mechanism for the developer to suggest/hint to the OS an okay timeframe for a long running Wait to be loaded back from Paged Memory?  In other words, a developer may know it is never important for a certain Thread to be running again for let us say 15 days.  Consequently, she/he can add a TimeSpan argument to his Thread Function/Method that gives a hint to OS that while it might be less than 15 days that Thread says it now wants to run because Wait was satisfied or whatever; he is okay with it taking up to 15 days.  Then, the OS could decide based on OS Processor's resources to not have to pay any attention to this Job/Thread running again in Non-Paged Memory if 15 days has transpired.  If OS Resources are at a very low rate of utilization and there is plenty to go around for all Threads/Processes, it could then check this Thread/Job to see if it wants to run sooner than 15 days.

     

    Does this make sense? If it does, is this already built into Windows 7 or even Vista/XP? The basic thing I am trying to say is to give the OS a shortcut to skip over Thread Wait checks if resources are very low. In other words, the Threads with hints like those that I mention above could be set aside completely in Paged Memory, or even on Disk, if there were many demands on resources, and would not even have to be checked to see whether they need to run. Then, when resources were plentiful, they could be checked to see if they might want to run, and be started perhaps at a time when the OS is running yet the User(s) are in bed, away from the Server/Computer, etc.

     

    It would be analogous to a Doctor triage where the Doctor could say, Yes, No, Revisit in 2 hours if you have time yet do not even think about this guy/gal unless you have nothing else to do.

  • Hi MaidenDotNet,

    The OS will already page out your stacks/process after some time of inactivity.  I believe a thread becomes a candidate after about 4 seconds. Once the pages become a candidate for theft, then it's only a matter of time and memory pressure when the Memory Manager will rip them away and use them for something else. 

     

    Basically, you don't need to give a 'hint' to the OS since it will do what you want on its own.  On the other hand, if you want the thread to run quickly when it gets signaled, you may be in trouble because of this behavior.  If the pages make it to the disk, it could take about 10-15 milliseconds between when your thread is activated and when it can run (this is the typical seek time of a laptop disk). If the disk is already busy, it could take even longer.

  • Wow, it's just amazing. The solution and strategies are really impressive, and I hope it will scale just as well for upcoming core technologies.

  • omkar Komkar K Programmer

    Really nice enhancement in the code ....

    Phenomenal ...

  • Excellent work

     

    We can now expect Windows to really scale

     

    Keep it up

     

  • Fascinating. I would be interested to know what Dave Cutler's take on all this was. Was he involved in any of the early discussions? Did Arun formulate the solution first and take it to him (formally or informally)? Did he say "wow, great idea!" or perhaps "nice try Rookie, but you forgot to assert the make-it-work bit on line 24"...or (perish the thought) did he get all defensive about his baby and start mumbling about Reagan-era priorities... :) This interview is the very essence of why Channel 9 rocks...

  • CharlesCharles Welcome Change

    Glad you enjoyed this! Yes, Cutler was briefed and blessed the change. After all, it was his code/design that Arun replaced :) Arun is a genius.

     

    C

  • DomDom

    In my company we develop light simulation software based on the physics of photon interaction with matter. We use multithreading intensively.
    I did some performance tests to compare Vista and Seven on a 16-core machine that dual-boots Vista and Seven.
    The results are dismaying:
    - same 16 threads using TLS: 18% slower on Seven
    - same 16 threads using std::map for per-thread data: 2x slower on Seven
    - same 16 threads using only local variables on the stack and doing only math operations: 38% slower on Seven.
    Moreover, on Vista all threads finish at nearly the same time; this is no longer the case on Seven. On Seven the threads finish one after another (after a long time), and when only 1 thread remains, it switches randomly from one core to another (I do not set the affinities in this test).
    I am continuing my tests with critical sections, interlocked instructions...
