I have a question I wish I could've asked for this interview: How is the write-barrier implemented in the CLR?
Patrick explains in the video about the generational aspect of the CLR's collector, but generational collectors require a thing called a write-barrier. This is because you can get away with only scanning gen0 when all the objects in gen0 are only referenced either by roots (references in stack frames or globals) or other gen0 objects. But the problem is that you can create a new object and assign a reference to it to an older object that might be in gen1 or gen2. The write-barrier therefore has to intercept modifications to objects in older generations and store any references to objects in gen0 in another area so they can be used as additional roots in the next gen0 collection.
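To make the problem concrete, here is a toy model of a gen0-only collection. It is only a sketch: the `Obj` class, the `minor_collect` function and the `remembered` list are names invented for illustration, and none of this is how the CLR actually represents its heap.

```python
class Obj:
    """Toy heap object: `gen` is its generation, `fields` holds references."""
    def __init__(self, name, gen=0):
        self.name = name
        self.gen = gen
        self.fields = []

def minor_collect(roots, remembered):
    """Return the names of gen0 objects that survive a gen0-only collection.

    Older objects are never scanned, so the only way to learn about an
    old -> young reference is through the `remembered` list.
    """
    live = set()
    stack = [o for o in list(roots) + list(remembered) if o.gen == 0]
    while stack:
        obj = stack.pop()
        if obj.name in live:
            continue
        live.add(obj.name)
        # Follow references, but stay inside gen0.
        stack.extend(f for f in obj.fields if f.gen == 0)
    return live

old = Obj("old", gen=2)    # long-lived object, reachable from a root
young = Obj("young")       # freshly allocated in gen0
old.fields.append(young)   # old -> young store; no root points at `young`

roots = [old]
without_barrier = minor_collect(roots, remembered=[])
with_barrier = minor_collect(roots, remembered=[young])
```

In this model `without_barrier` comes back empty, so `young` would be freed even though `old` still points at it, while `with_barrier` keeps it alive.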
There are two ways to implement a write-barrier (that I know of anyway). One is to have the compiler (or JIT) generate extra code that checks all writes to reference fields and, if they lie outside the range assigned to gen0, jumps into a routine that stores away the reference. The other is to ask the OS to trap writes to the memory area allocated to older generations of the heap and have the exception handler deal with it. Both methods have significant pros and cons. I'd love more information on exactly how this is implemented in the CLR.
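The first approach, the compiler-emitted check, might look roughly like the following. This is a simulation using plain integers as addresses; the gen0 range, the `heap` dictionary and the `write_ref` helper are all assumptions made up for the example, not the CLR's actual code.

```python
GEN0_START, GEN0_END = 0x1000, 0x2000  # hypothetical gen0 address range

heap = {}                 # address of a reference slot -> address it points to
remembered_slots = set()  # old-generation slots that point into gen0

def in_gen0(addr):
    return GEN0_START <= addr < GEN0_END

def write_ref(slot_addr, value_addr):
    """Stand-in for the code a compiler could emit around a reference store."""
    heap[slot_addr] = value_addr
    # Only a store into memory outside gen0 that points into gen0 matters.
    if not in_gen0(slot_addr) and in_gen0(value_addr):
        remembered_slots.add(slot_addr)

write_ref(0x1010, 0x1020)  # young -> young: nothing recorded
write_ref(0x8000, 0x9000)  # old -> old: nothing recorded
write_ref(0x8008, 0x1020)  # old -> young: slot is remembered
```

The cost is that every reference store pays for the range check, even the many stores that never cross generations; that is the trade-off against the OS page-trapping approach, which costs nothing until a protected page is actually written.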
Not sure Patrick will share specific implementation details of an IP-protected technology. Perhaps this specific topic is OK. I'll ask him....
C
A couple of other comments on this interview:-
There was some discussion about the extra pressure that functional programming places on the garbage collector. Jane Street Capital is a company that uses functional programming extensively (if you google many terms related to functional programming you are likely to see one of their adverts) using the Objective Caml language. They have stated (though I haven't seen detailed testing evidence) that while F# is interesting due to its similarity to OCaml and interoperability with the .NET ecosystem, it's not performant enough for production use, at least in the way they use OCaml, because the .NET garbage collector isn't tuned to the needs of functional programming.
Also there was some discussion of ignoring the stack and allocating everything on the heap. There have been a couple of attempts at doing this for conventional languages. The most well-known is Stackless Python. That is motivated by the issue that stacks are a huge burden in systems with a very large number of threads. The idea with Stackless Python is to allow you to have thousands of threads, where having the OS allocate a separate 1 MB stack for each wouldn't be practical. In this system all stack frames are allocated on a heap and garbage collected when the functions have returned. It is significantly slower, but if you want to write a system (like a simulation) where you have thousands of objects, each executing within its own thread, it's a solution. The most famous user of Stackless Python is the MMO game Eve Online - all the entities within the game world are running in their own threads of control in one giant process.
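As a rough illustration of the idea (not Stackless Python's own tasklet API), ordinary Python generators behave similarly: each generator's frame is a heap object that survives between resumptions, so thousands of them can coexist without one OS stack each. The `entity` and `run` names below are made up for this sketch.

```python
from collections import deque

def entity(name, steps, log):
    """A tiny 'thread of control'; its frame lives on the heap between steps."""
    for i in range(steps):
        log.append((name, i))
        yield  # hand control back to the scheduler

def run(tasks):
    """Round-robin scheduler: resume each task in turn until all have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
            queue.append(task)  # not finished yet, schedule it again
        except StopIteration:
            pass                # finished; its frame can now be reclaimed

log = []
run([entity(f"e{n}", 2, log) for n in range(10_000)])
```

Ten thousand entities each take two steps here, all inside a single OS thread, and each frame becomes garbage once its generator finishes.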
Also, and more relevant to .NET, Mono (the open source re-implementation of .NET) has added a feature called Tasklets which brings the same thing to the .NET world, at least as far as I understand it.
Well, it's actually a pretty high-level question - just one that you'd only ask if you'd spent a lot of time reading about garbage collection.
FWIW, the code (at least as far as v2 of the CLR is concerned) is most likely in the Rotor source. It's just that, being production-ready code rather than an academic exercise, it's pretty hard to find what you want in it.
Implementation, to me, means specific details of how something is composed. But, enough with semantics...
From Patrick: "We implement it with the compiler emitting a call to helper routines for stores."
Why is it so hard to find? Just search the Rotor source for "WriteBarrier". The JIT inserts calls to some helper functions named something like JIT_WriteBarrier. These functions can be found in a file named jithelp.asm.
And for the record: .NET does the same thing. You can see those helper calls with a native debugger.
Suppose that you could have generational GC exactly like .NET has now, but the execution of write barrier related code was free. How much faster would programs run? I know that this question is highly program dependent, but maybe you can give an order of magnitude guess.
I find it quite intriguing that there is a connection between the CLR and SQL Server because they are both runtimes that execute a language (MSIL and T-SQL). And the connection between functional vs imperative style programming and updates. Hm, looking forward to hearing more thoughts on this.
Patrick mentioned that you can measure GC duration.
How does the CLR team measure this? I am only aware of 2 ways (ETW and % time in GC performance counter).
Are there stats available for GC duration for the different flavors of GC (Server, Concurrent...)?