Chittur Subbaraman: Inside Windows 7 - Service Controller and Background Processing

How do TPL & PLINQ relate to ConcRT?
The easiest distinction between TPL/PLINQ and the Concurrency Runtime (ConcRT) is the target customer: TPL & PLINQ are built on .NET, while ConcRT, the Parallel Pattern Library (PPL) and the Asynchronous Agents library are targeted
at C++ customers. All are available in the Visual Studio 2010 CTP.
Many of the scenarios and use cases for TPL & PPL are very similar, particularly at a high level: both support task parallelism and parallel loops, and both have well-defined cancellation and exception handling support. The runtimes are different; while TPL
and PLINQ are built on top of the CLR and its thread pool, PPL and Agents are built on the Concurrency Runtime, which is a component of the C Runtime that is new to Visual Studio 2010.
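To make the "parallel loop with exception handling" idea concrete, here is a minimal sketch in portable standard C++. It is not PPL or TPL code: `parallel_for_sketch` and `squares` are hypothetical names invented for illustration, and `std::async` stands in for the real runtimes, which schedule work quite differently. It only shows the shape of the pattern: split an index range into chunks, run the chunks concurrently, and surface any exception to the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical helper (not a PPL or TPL API): calls body(i) for every i in
// [first, last), splitting the range into roughly one chunk per hardware thread.
template <typename Body>
void parallel_for_sketch(std::size_t first, std::size_t last, Body body) {
    if (first >= last) return;
    // hardware_concurrency() may return 0, so fall back to one worker.
    const std::size_t workers =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    const std::size_t total = last - first;
    const std::size_t chunk = (total + workers - 1) / workers;  // round up

    std::vector<std::future<void>> pending;
    for (std::size_t begin = first; begin < last; begin += chunk) {
        const std::size_t end = std::min(last, begin + chunk);
        pending.push_back(std::async(std::launch::async, [begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        }));
    }
    // get() waits for each chunk and rethrows an exception thrown inside it.
    for (auto& f : pending) f.get();
}

// Hypothetical usage: each index writes its own slot, so no locking is needed.
std::vector<int> squares(std::size_t n) {
    std::vector<int> out(n);
    parallel_for_sketch(0, n, [&out](std::size_t i) {
        out[i] = static_cast<int>(i * i);
    });
    return out;
}
```

Note that this sketch starts a fresh thread per chunk and has no cancellation support, which is exactly the kind of work a runtime such as ConcRT or the CLR thread pool is meant to take off your hands.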
As you've noted, Joe, Steve, I and the rest of the TPL, PLINQ and ConcRT "folks" are all on the Parallel Computing team. We talk very frequently and are very conscious of the places where the technologies and APIs differ; we try to ensure
that the usage and semantics are similar wherever possible, to minimize the amount of time spent by you (our developers) keeping track of idiosyncrasies that aren’t inherent to the .NET & C++ programming model differences.
-Rick
The ring crossing overhead is an important consideration, because in the fine-grain, over-decomposed, task-based, concurrent execution world of ConcRT, that overhead can significantly limit just how fine-grained tasks can be.
Reducing the cost of ring crossing (the kernel/user boundary) is something I would very much like to see. Despite many improvements, it is still a very significant overhead, and hopefully someday will be much less than it is today. But even if the
ring crossing were free, there is an inherent advantage to UMS in that the scheduling decisions are made in the run-time rather than the kernel. This allows the run-time (e.g. ConcRT) a great amount of flexibility in terms of how it optimizes its use of the
CPUs. Sometimes it is suggested that instead of user-mode scheduling, what is needed is a pluggable kernel scheduler. But user-mode scheduling has two great advantages over that approach. First, it has access to whatever wealth of metadata about the
computation the compiler has made available in the program, while the kernel has a much more limited/expensive interface to user-mode state. Second, if the user-mode scheduler screws up, it only crashes/hangs the app, not the system.
Process creation on Windows is more expensive than on UNIX. This is because it doesn’t have to be as cheap. Unlike in traditional UNIX, on Windows it is the thread, not the process, that is the unit of CPU scheduling. We keep threads pretty cheap in Windows, but have loaded up process
creation with a lot of functionality (including stuff like shims for broken apps and implementation of the subsystem model). Process launch is generally synonymous with application launch on Windows (especially on client systems). App launch is relatively
rare (generally a user has to click something), but thread creation is very common. So the system is optimized for threads (including facilities like the Win32 thread pool, which allows rampant thread re-use to amortize the creation overhead and reduce application
memory requirements).
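The thread re-use idea can be sketched in portable standard C++. This is not the Win32 thread pool API; `TinyPool` and `run_jobs_on_pool` are hypothetical names used only to illustrate the principle: a fixed set of threads is created once, and every submitted job runs on one of them, so the creation cost is paid per worker rather than per job.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool (an illustration, not the Win32 thread pool).
class TinyPool {
public:
    explicit TinyPool(std::size_t thread_count) {
        for (std::size_t i = 0; i < thread_count; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets the workers finish all queued jobs, then joins them.
    ~TinyPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& t : workers_) t.join();
    }

    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wake_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Hypothetical usage: many small jobs share a handful of threads.
int run_jobs_on_pool(int job_count, std::size_t thread_count) {
    std::atomic<int> done{0};
    {
        TinyPool pool(thread_count);
        for (int i = 0; i < job_count; ++i)
            pool.submit([&done] { ++done; });
    }  // the destructor drains the queue and joins the workers
    return done.load();
}
```

Calling `run_jobs_on_pool(100, 4)` runs 100 jobs while creating only 4 threads, which is the amortization described above.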
This doesn’t mean that I wouldn’t like for us to make improvements in process creation on aesthetic grounds. But it isn’t a problem in a practical sense, so it is always down the list.
Linux is somewhat different from traditional UNIX, or even a more modern UNIX like Solaris (which has real threads). But my knowledge of Linux is limited, so I won’t try to explain how Linux uses a form of a process as a thread as I will get pieces
of it wrong. UNIX and NT (aka modern Windows) were designed at different times for different environments with different goals, so when they run on the same environment they often take very different points of view, and so direct comparisons can be misleading.
Thanks Dave!!
C
Does ConcRT work like SLI, i.e. do you multiplex CPUs to make them look like a single faster CPU?
Are there any plans to expose the UMS to .NET?
What's the reasoning behind not supporting UMS in 32-bit Windows?
(as stated at http://msdn.microsoft.com/en-us/library/dd627187(VS.85).aspx)
What is the services work mentioned when Dave is talking about procrastination?
Hi, I'm trying to make UMS work, but I have run into problems I can't resolve myself.
Look here for more details:
https://channel9.msdn.com/forums/TechOff/545224-Need-help-making-simplest-UMS-scheduler-work/
Thanks.
Can you elaborate here? UMS does not really provide a public API... You use ConcRT as the abstraction layer for native task-based concurrent programming on Windows.
I've alerted the right folks to take a look at the problem you're running into (based on your code sample in the MSDN forums, which is where I pointed them to....). Again, to be clear, ConcRT is supposed to be the proxy you play with to get UMS goodness...
C