Also I call shenanigans:

If this is, say, parsing HTTP headers on a heavily loaded server, you bet it's going to make a noticeable difference.

Not unless your GC sucks a55 it won't. I've run servers on microchips that eat strings for breakfast. If he's struggling with strings causing too many collections, his code is already so far past wrong that it's unreal. And servers are always a bad example for performance junkies to talk about. Network latency always dwarfs the cost of the GC, and servers are a great target for attackers and suffer really badly from memory fragmentation. So they are in fact ideal candidates to be made into managed code with a full-blown GC behind them.

In fact, the .NET GC is specifically designed to cope with large numbers of collections of dead small objects. And whilst he says gen-zero collections aren't free, he's missing the point that the cost is proportional to the number of live objects, not the number of dead ones, and so they're a whole lot freer than he thinks they are.

In fact, a GC allocation is typically faster than malloc, calloc and new in C++, and in the case where it doesn't trigger a collection, it is only marginally more expensive than a stack allocation, mainly because it'll call a constructor and clear the contents of your memory.

Crank up .NET's XmlReader and profile loading a modest XML document. You'll be surprised to see that allocations during parsing add up to approximately 4X the document's size. Many of these are strings. How did we end up in such a place? Presumably because whoever wrote these abstractions fell trap to the fallacy that "gen0 collections are free." But also because layers upon layers of such things lie beneath.

I think, actually, that Joe has fallen into the fallacy that XML is a suitable solution for a high-performance project. If you're storing your data as XML, you'd better suck up the fact that your data is in a human-readable rather than a machine-efficient storage format, and that getting data out of it probably shouldn't be in a hot loop (and if it's not in a hot loop, why do you care about its performance?).

The writer of the XmlReader probably realized this, and therefore (correctly) assumed that optimising the XmlReader for the edge case of someone loading it on a microchip who really cares about gen0 collections is optimising for the wrong case. I'll put money on the writer of the XmlReader class wanting to write code that is correct, easy to use, easy to read and easy to fix when the XML spec changes in future, rather than wanting to optimise away all of the almost-free allocs in the code.

In fact, I challenge Joe Duffy to find an implementation of an XmlReader on a native platform (like C/C++) that is complete with regard to the XML spec and contains no short-lived allocations.

It doesn't have to be this way. String does, after all, have an indexer. And it's type-safe! So in-place parsing at least won't lead to buffer overruns. Sadly, I have concluded that few people, at least in the context of .NET, will write efficient string parsing code.

Which is good. People writing "efficient" string parsing code in C++ often get it wrong and write hard-to-read code. C# is an "algorithm-orientated" rather than a "performance-orientated" language.
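For reference, the kind of in-place parsing Joe is talking about might look something like the sketch below. `InPlaceParser` and `ParseInt` are names I've made up for illustration, not anything in the framework. It reads digits straight out of the string through the indexer, so no Substring copy is made, and the runtime's bounds checks mean a bad index throws rather than overruns a buffer.

```csharp
using System;

public static class InPlaceParser
{
    // Hypothetical helper: parses a non-negative decimal integer from
    // text[start .. start + length) using only the string indexer,
    // so no intermediate substring is allocated.
    public static int ParseInt(string text, int start, int length)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (start < 0 || length <= 0 || start > text.Length - length)
            throw new ArgumentOutOfRangeException(nameof(start));

        int value = 0;
        for (int i = start; i < start + length; i++)
        {
            char c = text[i]; // bounds-checked by the runtime
            if (c < '0' || c > '9')
                throw new FormatException("Expected a decimal digit at index " + i);

            // 'checked' makes overflow throw instead of silently wrapping.
            value = checked(value * 10 + (c - '0'));
        }
        return value;
    }
}
```

Calling `InPlaceParser.ParseInt("Content-Length: 1234", 16, 4)` returns 1234 without creating a temporary string. Note how much validation even this tiny routine needs, which is rather the point about "efficient" parsing code being easy to get wrong.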

In fact, having immutable strings gives huge benefits to C# compared with C++. It enables a ton of optimisations such as aggressive inlining of functions and makes parallelism and atomicity easier to achieve. It also means that if you have a string as a private member, you can hand it back to a caller by reference, safe in the knowledge that they can't mash the content of it, leading to state corruption or even security holes. In C++ you can only get this kind of guarantee by copying the string, which means an expensive, fragmentation-prone malloc/new followed by exactly the thing that Joe despises: a full string copy into the newly allocated buffer.
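A minimal sketch of what that means in practice. The `Session` class and its `Token` property are hypothetical names for illustration. The getter hands back the very same reference it holds, with no defensive copy, and nothing the caller does through the normal string API can change the private state (unsafe code and reflection tricks aside).

```csharp
using System;

public sealed class Session
{
    private readonly string _token;

    public Session(string token)
    {
        _token = token ?? throw new ArgumentNullException(nameof(token));
    }

    // Returns the stored reference directly; no copy is made.
    public string Token => _token;
}
```

If a caller does `string t = session.Token; t = t.ToUpperInvariant();`, the call produces a new string and rebinds the caller's local variable. `session.Token` still returns the original, untouched value.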

The whole platform is written to assume that strings are available, and does not have an efficient representation of a transient substring

It does, in fact, have an efficient representation of a transient substring; System.StringBuilder is exactly the class that Joe Duffy seems to love so much. It allows in-place modification of text and is backed by mutable buffers of chars with a length field (a single array in older versions of the framework, a chain of chunks since .NET 4).

Truncating a System.StringBuilder typically requires no internal memory copies or allocations, so Joe will be very happy to know that he can continue to use .NET without all of those gen0 collections ruining his day.
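Something like the following is what I have in mind. `BuilderDemo` and `TruncateInPlace` are made-up names for the sake of the example. It uses the real `StringBuilder` indexer and `Length` setter, and assumes the capacity is chosen up front so the builder never has to grow.

```csharp
using System.Text;

public static class BuilderDemo
{
    public static string TruncateInPlace()
    {
        // Capacity of 64 is an arbitrary choice, large enough that
        // the builder does not need to grow in this example.
        var sb = new StringBuilder("Hello, world", 64);

        sb[0] = 'J';    // overwrite a character in place via the indexer
        sb.Length = 5;  // truncate by shrinking the length

        // ToString is the only point where a new string is created.
        return sb.ToString();
    }
}
```

`BuilderDemo.TruncateInPlace()` returns "Jello". The edit and the truncation both operate on the builder's existing buffer; only the final `ToString()` produces a new string object.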

And of course the APIs have been designed to coax you into making copy after copy, rather than doing efficient text manipulation in place. Hell, even the HTTP and ASP.NET web stacks are rife with such inefficiencies.

And yet they outperform most of their native competitors. Go figure.

I suppose it's possible to ignore all of this and let the GC chew up 30% or more of your program's execution time without anybody noticing. I'm baffled that such software is written, but at the same time I realize that my expectations are out of whack with respect to common practice.

I think if Joe Duffy's code is spending 30% of its time in the GC, then he should probably stop calling GC.Collect() in a while(true) loop.

Don't get me wrong, it's possible to write inefficient code in C#. But programs are usually tanked not by the garbage collector, but by dreadful algorithms. If Joe spent half as much time converting his bubble sorts into quick sorts and arranging his code to efficiently store and compute results, he'd probably find that the GC really is the least of his problems.

Complaining about the performance of the .NET GC usually comes from a misunderstanding about what C# is compared with C++, how the GC works, and fundamentally the difference between high-performance and low-performance code.

If you think your code is slow, my advice is to benchmark it. Usually your CPU isn't spending very much time in the GC at all, but is spending all of its time waiting on locks or hot-looping through shoddy algorithms, and that code would be slow whether you write it in C# or Haskell or C++.
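As a rough first pass before reaching for a real profiler, a sketch like this shows how long a piece of work takes and how many collections of each generation happened while it ran. `GcProbe` and `Measure` are hypothetical names; `Stopwatch` and `GC.CollectionCount` are the actual framework APIs. It only counts collections and says nothing about how long they took, so treat it as a sanity check, not a measurement of GC cost.

```csharp
using System;
using System.Diagnostics;

public static class GcProbe
{
    // Runs 'work' once and reports wall-clock time plus the number of
    // gen0/gen1/gen2 collections that occurred in the process meanwhile.
    public static (TimeSpan Elapsed, int Gen0, int Gen1, int Gen2) Measure(Action work)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));

        int gen0Before = GC.CollectionCount(0);
        int gen1Before = GC.CollectionCount(1);
        int gen2Before = GC.CollectionCount(2);

        var stopwatch = Stopwatch.StartNew();
        work();
        stopwatch.Stop();

        return (stopwatch.Elapsed,
                GC.CollectionCount(0) - gen0Before,
                GC.CollectionCount(1) - gen1Before,
                GC.CollectionCount(2) - gen2Before);
    }
}
```

For example, `var r = GcProbe.Measure(() => { for (int i = 0; i < 1000000; i++) { i.ToString(); } });` churns through a million short-lived strings; printing `r.Elapsed` and `r.Gen0` gives a quick feel for whether the collections are worth worrying about on your machine. The counts are process-wide, so other threads allocating at the same time will show up in them too.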

Gen0 collections might not be free, but they are cheap. Writing shoddy code to avoid them is likely to make your code worse, hide real performance bottlenecks and lead to subtle, sometimes catastrophic, problems with your code.

So anyway, I call shenanigans.