Here, we continue our exploration of the morphology of Windows 7 on
Going Deep with Windows kernel architect Dave Probert. You may remember him from an early
four-part episode of Going Deep where he teaches us about general-purpose operating system architectures and history:
Part 1,
Part 2,
Part 3,
Part 4. That was a
great conversation from a few years ago and it's been
way too long since we returned to Windows kernel world to converse with and learn from Dr. Probert. Not surprisingly, Dave has been busy innovating the Windows core.
Dave and team, working very closely with the
Parallel Computing Platform People, have created a very compelling new user mode thread scheduling/management system in Windows 7. In a nutshell, the User Mode Scheduler provides a new model for high-performance applications to control the execution of
threads by allowing applications to schedule, throttle and control the overhead due to blocking system calls. In other words, applications can switch user threads
completely in user mode without going through the kernel level scheduler. This frees up the kernel thread scheduler from having to block unnecessarily, which is a very good thing as we move into the age of Many-Core... Speaking of Many-Core, remember
the piece we did on the Concurrency Runtime (ConcRT)?
ConcRT is built on top of UMS and is the most effective way to use this new user-mode thread scheduling model in Windows 7.
Make yourself comfortable and spend some time watching and listening to Dave make all of this crystal clear.
This is another
great conversation with a fantastic OS architect and Windows kernel professor. Lots to learn here. Enjoy.
Follow the Discussion
So if I understand this correctly, this User-mode Scheduler feature of Windows 7 is basically about doing what Fibers do currently, but making them look and work like full threads from the user-mode code's point of view.
It's a shame they couldn't figure out a way to do user-mode pre-emptive switching. That really would have been killer. Actually, all I'd need to implement something I've been thinking about (a special .NET Virtual machine with super-lightweight user-mode threads) is a way to get the OS to periodically interrupt designated unblocked OS threads and jump (not call) to a pre-set address, having saved off the registers to a pre-set location. I say "all I need", but there are probably a hundred big problems with implementing such a scheme, which is why I'm not a kernel developer and Dave is.
>>(a special .NET Virtual machine with super-lightweight user-mode threads)
My god, that's a terrible idea.
The original NT kernel was beautiful, a bit slow, but perfect in its original design, light years from Unix... now it's a mix of hacks and tricks...
What Windows needs is to return to its origins: a clean, inspired kernel based on the very good ideas from VMS...
As for your views on the NT kernel evolution, the thing is that speed matters a whole heck of a lot more than 'beauty' in the real world. Get used to it.
Ever since you posted the video with Mark Russinovich briefly talking about UMS I've been quite interested in it but couldn't find much detail about it, so this video is a godsend.
Thanks a lot! Keep up the great work Charles and the gang!
Keep on watching,
C
Also, Joe Duffy and the TPL guys are part of the PCP team, right? I'd love to see an interview about the relation between the managed and unmanaged worlds here.
But a constant I've heard, including this time, is optimizations to avoid hitting the kernel because of the sheer cost of context switching and crossing the kernel/user boundary, which is understandable given the number of operations this requires. But c'mon, this isn't a new problem; guys like Probert have been dealing with this issue for at least 30 years. Besides more GHz on the processors, what are they doing to ease such switching?
Also, Mr. Probert talked about the origin of the process/thread abstraction, and I've heard from Unix folks (not only Linux bashers) that creating a process on Windows has a bigger impact than on *nix, where processes are very light. Perhaps Probert can shed some light on this.
Don't know, Charles, if this can even be included in an upcoming Going Deep video.
Thanx
How does TPL & PLINQ relate to ConcRT?
The easiest distinction between TPL/PLINQ and the Concurrency Runtime (ConcRT) is the target customer; TPL & PLINQ are built on .NET while ConcRT, the Parallel Pattern Library (PPL) and the Asynchronous Agents library are targeted to C++ customers. All are available in the Visual Studio 2010 CTP.
Many of the scenarios and use cases for TPL and PPL are very similar, particularly at a high level: both support task parallelism and parallel loops, and both have well-defined cancellation and exception handling support. The runtimes are different: TPL and PLINQ are built on top of the CLR and its thread pool, while PPL and Agents are built on the Concurrency Runtime, a component of the C Runtime that is new in Visual Studio 2010.
As you've noted, Joe, Steve, myself and the rest of the TPL, PLINQ and ConcRT "folks" are all on the Parallel Computing team. We talk very frequently and are very conscious of the places where the technologies and APIs differ; we try to keep usage and semantics similar wherever possible, to minimize the time you (our developers) spend keeping track of idiosyncrasies that aren’t inherent to the differences between the .NET and C++ programming models.
-Rick
The ring crossing overhead is an important consideration, because in the fine-grain, over-decomposed, task-based, concurrent execution world of ConcRT – the overheads can significantly limit just how fine-grained tasks can be.
Reducing the cost of ring crossing (the kernel/user boundary) is something I would very much like to see. Despite many improvements, it is still a very significant overhead, and hopefully someday will be much less than it is today. But even if the ring crossing was free, there is an inherent advantage to UMS in that the scheduling decisions are made in the run-time rather than the kernel. This allows the run-time (e.g. ConcRT) a great amount of flexibility in terms of how it optimizes its use of the CPUs. Sometimes it is suggested that instead of user-mode scheduling, what is needed is a pluggable kernel scheduler. But user-mode scheduling has two great advantages over that approach. First it has access to whatever great wealth of metadata about the computation that the compiler has made available in the program, while the kernel has a much more limited/expensive interface to user-mode state. Second, if the user-mode scheduler screws up it only crashes/hangs the app, not the system.
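As a rough illustration of scheduling decisions being made in the run-time rather than the kernel, here is a minimal sketch in portable C++. It is not UMS and not ConcRT: `ToyScheduler`, `Task` and the `hint` field are hypothetical names invented for this example, and `hint` merely stands in for the kind of application-level metadata a kernel scheduler would not normally see. Everything runs on a single thread, and tasks run to completion with no pre-emption.

```cpp
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical task descriptor. `hint` is assumed application metadata:
// a lower value means "run sooner".
struct Task {
    int hint;
    std::string name;
    std::function<void()> work;
};

// Orders the priority queue as a min-heap on `hint`.
struct ByHint {
    bool operator()(const Task& a, const Task& b) const {
        return a.hint > b.hint;
    }
};

// Toy run-time scheduler: every scheduling decision is made in user mode,
// on the calling thread, without asking the kernel which task runs next.
class ToyScheduler {
public:
    void submit(Task t) { ready_.push(std::move(t)); }

    // Runs each ready task to completion and returns the names in the
    // order they were executed.
    std::vector<std::string> run() {
        std::vector<std::string> order;
        while (!ready_.empty()) {
            Task t = ready_.top();  // pick the task with the smallest hint
            ready_.pop();
            t.work();
            order.push_back(t.name);
        }
        return order;
    }

private:
    std::priority_queue<Task, std::vector<Task>, ByHint> ready_;
};
```

Submitting tasks with hints 2, 1 and 3 and then calling `run()` executes them in hint order, whatever the submission order. The second advantage mentioned above shows up here too: if `ByHint` were wrong, only this program would misbehave, not the system.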
Process creation on Windows is more expensive than on UNIX. This is because it doesn’t have to be as cheap. Unlike traditional UNIX, in Windows it is the thread, not the process, that represents scheduling of the CPU. We keep threads pretty cheap in Windows, but have loaded up process creation with a lot of functionality (including stuff like shims for broken apps and implementation of the subsystem model). Process launch is generally synonymous with application launch on Windows (especially on client systems). App launch is relatively rare (generally a user has to click something), but thread creation is very common. So the system is optimized for threads (including facilities like the Win32 thread pool, which allows rampant thread re-use to amortize the creation overhead and reduce application memory requirements).
This doesn’t mean that I wouldn’t like for us to make improvements in process creation on aesthetic grounds. But it isn’t a problem in a practical sense, so it is always down the list.
Linux is somewhat different than traditional UNIX, or even a more modern UNIX like Solaris (which has real threads). But my knowledge of Linux is limited, so I won’t try to explain how Linux uses a form of a process as a thread as I will get pieces of it wrong. UNIX and NT (aka modern Windows) were designed at different times for different environments with different goals, so when they run on the same environment they often take very different points of view, and so direct comparisons can be misleading.
Thanks Dave!!
C
Does ConcRT work like SLI, where you multiplex CPUs to make them look like a single faster CPU?
Have you watched this? Or this? These should really help you understand. If you'd rather just read, then check this out.
C
Are there any plans to expose the UMS to .NET?
What's the reasoning behind not supporting UMS in 32 bit windows?
(as stated at http://msdn.microsoft.com/en-us/library/dd627187(VS.85).aspx)
What is the services work mentioned when Dave is talking about procrastination?
Hi, I'm trying to make UMS work, but I have problems I can't resolve myself.
Look here for more details:
http://channel9.msdn.com/forums/TechOff/545224-Need-help-making-simplest-UMS-scheduler-work/
Thanks.
Can you elaborate here? UMS does not really provide a public API... You use ConcRT as the abstraction layer for native task-based concurrent programming on Windows.
I've alerted the right folks to take a look at the problem you're running into (based on your code sample in the MSDN forums, which is where I pointed them to....). Again, to be clear, ConcRT is supposed to be the proxy you play with to get UMS goodness...
C