Coffeehouse Thread

24 posts

Joe Duffy : a "managed" system ... beat the pants off all the popular native programming environments

  • LiquidBoy

    We know that he's working on a managed code base that's several million lines of managed code ... that can beat the pants off any native programming environment ...

    SHOW ME THIS "MANAGED" NIRVANA :)

    http://www.bluebytesoftware.com/blog/2012/10/31/BewareTheString.aspx

  • JohnAskew

    from: http://www.bluebytesoftware.com/blog/2012/10/31/BewareTheString.aspx

    I've been working in an environment where performance is critical, and everything is managed code, for several years now.  That might sound like an oxymoron, but our system can in fact beat the pants off all the popular native programming environments.  The key to success?  Thought and discipline.

    ...

    The moral of the story?  Love your single string type.  It's a wonderful thing.  But always remember: An allocation is an allocation; make sure you can afford it.  Gen0 collections aren't free, and software written to assume they are is easily detectible.  String.Split allocates an array and a substring for each element within; there's almost always a better way.

    It looks like Joe Duffy is "the architect of an experimental OS's developer platform, where he is also chief architect of its programming language." So that means he's talking about his own managed nirvana ~ which means you have to get your answers from him, correct?

    The bit about string parsing and avoiding allocation is common sense put into elegant code to great effect, nice.
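
    For anyone who doesn't want to click through, the contrast looks roughly like this. The test string is invented here and Joe's actual code is in the linked post, so treat this as a sketch of the idea rather than his implementation:

        using System;

        class SplitVsIndex
        {
            static void Main()
            {
                string line = "alpha,beta,gamma,delta,epsilon";

                // Allocating approach: Split creates one array plus one
                // substring per field, all garbage once the loop ends.
                foreach (string field in line.Split(','))
                    Console.WriteLine(field);

                // Index-based approach: walk the string in place. This still
                // allocates a substring to print each field, but a parser that
                // only compares fields could use the indexer and skip even that.
                int start = 0;
                while (start <= line.Length)
                {
                    int comma = line.IndexOf(',', start);
                    if (comma < 0) comma = line.Length;
                    Console.WriteLine(line.Substring(start, comma - start));
                    start = comma + 1;
                }
            }
        }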

  • spivonious

    , LiquidBoy wrote

    We know that he's working on a managed code base that's several million lines of managed code ... that can beat the pants off any native programming environment ...

    SHOW ME THIS "MANAGED" NIRVANA :)

    http://www.bluebytesoftware.com/blog/2012/10/31/BewareTheString.aspx

    I ran his examples and going by index is a heck of a lot slower than using the Split function, at least for one run of each.

    First 0.0007529

    Second 0.0024157

  • evildictaitor

    , spivonious wrote

    *snip*

    I ran his examples and going by index is a heck of a lot slower than using the Split function, at least for one run of each.

    First 0.0007529

    Second 0.0024157

    * How many runs are you doing of each?

    * What resolution timer are you using?

    * How are you ensuring that you aren't being biased by .NET startup costs or static constructors getting in the way?

  • spivonious

    @evildictaitor: Just one run, and a string with only 5 delimited members. Using .NET Stopwatch. Running both tests consecutively in a console app.

    I did change it to run each 1000 times, and the Split method did take twice as long as the index method. Still, "premature optimization..."
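
    Roughly what a fairer measurement looks like: warm up first so JIT and static-constructor costs stay out of the numbers, then time many iterations. The iteration counts and test string below are invented for illustration:

        using System;
        using System.Diagnostics;

        class Bench
        {
            static readonly string Line = "alpha,beta,gamma,delta,epsilon";

            static void Main()
            {
                Measure("warm-up", 1000);      // absorbs JIT and static ctor costs
                Measure("measured", 1000000);  // the run that counts
            }

            static void Measure(string label, int iterations)
            {
                int checksum = 0;
                var sw = Stopwatch.StartNew();
                for (int i = 0; i < iterations; i++)
                    checksum += Line.Split(',').Length; // side effect so the work can't be optimized away
                sw.Stop();
                Console.WriteLine("{0}: {1} ms (checksum {2})", label, sw.Elapsed.TotalMilliseconds, checksum);
            }
        }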

  • evildictaitor

    Also I call shenanigans:

    If this is, say, parsing HTTP headers on a heavily loaded server, you bet it's going to make a noticeable difference.

    Not unless your GC sucks a55 it won't. I've run servers on microchips that eat strings for breakfast. If he's struggling with strings causing too many collections, his code is already so far past wrong that it's unreal. And servers are always a bad example for performance junkies to talk about: network latency always dwarfs the cost of the GC. Servers are also a great target for attackers, and they suffer really badly from memory fragmentation, so they are in fact ideal candidates to be made into managed code with a full blown GC behind them.

    In fact, the .NET GC is specifically designed to cope with large numbers of collections of dead small objects. And whilst he says gen-zero collections aren't free, he's missing the point that the cost is proportional to the number of live objects, not the number of dead ones, and so they're a whole lot freer than he thinks they are.

    In fact, the GC's allocator (gcalloc) is faster than malloc, calloc and new in C++, and in the case where it doesn't trigger a full collect, an allocation is only marginally more expensive than a stack allocation, mainly due to the fact that it'll call a constructor and clear the contents of your memory.
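
    That claim is easy to poke at yourself, since a gen0 collection only has to scan survivors, not corpses. A minimal sketch (the counts are arbitrary):

        using System;

        class Gen0Demo
        {
            static void Main()
            {
                int before = GC.CollectionCount(0);
                long total = 0;
                for (int i = 0; i < 10000000; i++)
                {
                    string s = new string('x', 16); // small and dead by the next iteration
                    total += s.Length;              // use it so the allocation can't be skipped
                }
                // Millions of dead allocations, but each collection only has to
                // scan the (tiny) live set, so the loop stays cheap.
                Console.WriteLine("gen0 collections: " + (GC.CollectionCount(0) - before));
                Console.WriteLine("total chars: " + total);
            }
        }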

    Crank up .NET's XmlReader and profile loading a modest XML document. You'll be surprised to see that allocations during parsing add up to approximately 4X the document's size. Many of these are strings. How did we end up in such a place? Presumably because whoever wrote these abstractions fell trap to the fallacy that "gen0 collections are free." But also because layers upon layers of such things lie beneath.

    I think, actually, that Joe has fallen into the fallacy that XML is a suitable solution for a high performance project. If you're storing your data as XML, you better suck up the fact that your data is in a human readable rather than a machine efficient storage format and that getting data out of it probably shouldn't be in a hot loop (and if it's not in a hot loop, why do you care about its performance?)

    The writer of the XmlReader probably realized this, and therefore (correctly) assumed that optimising the XmlReader for the side-case of someone loading it on a microchip who really cares about gen0 collections is optimising for the wrong case. I'll put money on the writer of the XmlReader class wanting to write code that is correct, easy to use, easy to read and easy to fix when the XML spec changes in future, rather than wanting to optimise away all of the almost-free allocs in the code.

    In fact, I challenge Joe Duffy to find an implementation of an XmlReader on a native platform (like C/C++) that is complete with regards to the XML spec and contains no short-lived allocations.
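
    As an aside, the "approximately 4X the document's size" figure is checkable: .NET 4's AppDomain monitoring exposes a rough per-domain allocation counter. A sketch, with a placeholder file name; the counters are only refreshed at collection time, hence the forced collects:

        using System;
        using System.Xml;

        class XmlAllocProbe
        {
            static void Main()
            {
                AppDomain.MonitoringIsEnabled = true; // .NET 4+: turns on per-domain counters
                GC.Collect();                         // refresh the counter before sampling
                long before = AppDomain.CurrentDomain.MonitoringTotalAllocatedMemorySize;

                using (XmlReader reader = XmlReader.Create("modest.xml")) // placeholder document
                {
                    while (reader.Read()) { } // just walk the document
                }

                GC.Collect();
                long allocated = AppDomain.CurrentDomain.MonitoringTotalAllocatedMemorySize - before;
                Console.WriteLine("bytes allocated while parsing: " + allocated);
            }
        }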

    It doesn't have to be this way. String does, after all, have an indexer. And it's type-safe! So in-place parsing at least won't lead to buffer overruns. Sadly, I have concluded that few people, at least in the context of .NET, will write efficient string parsing code.

    Which is good. People writing "efficient" string parsing code in C++ often get it wrong and write hard-to-read code. C# is an "algorithm-orientated" rather than a "performance-orientated" language.

    In fact, having non-mutable strings gives huge benefits to C# compared with C++. It enables a ton of optimisations such as aggressive inlining of functions and makes parallelism and atomicity easier to achieve. It also means that if you have a string as a private member, you can hand it back to a caller by reference safe in the knowledge that they can't mash the content of it, leading to state corruption or even security holes. In C++ you can only get this kind of guarantee by copying the string, which means an expensive and fragmentary malloc/new followed by exactly the thing that Joe despises - a full string copy into the newly allocated buffer.
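
    Concretely, the hand-back-by-reference point looks like this; the class and field names are invented:

        class Session
        {
            private readonly string _token = "secret";

            // No defensive copy needed: System.String is immutable, so the
            // caller receives the same reference but cannot alter its contents.
            public string Token { get { return _token; } }
        }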

    The whole platform is written to assume that strings are available, and does not have an efficient representation of a transient substring

    It does, in fact, have an efficient representation of a transient substring; System.Text.StringBuilder is exactly the class that Joe Duffy seems to love so much. It allows in-place modification of strings and uses a single underlying array of chars with a length field.

    Truncating a System.Text.StringBuilder requires no internal memory copies or allocations, so Joe will be very happy to know that he can continue to use .NET without all of those gen0 collections ruining his day.
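
    What that looks like in practice, as a minimal sketch:

        using System;
        using System.Text;

        class TruncateDemo
        {
            static void Main()
            {
                StringBuilder sb = new StringBuilder("GET /index.html HTTP/1.1");
                sb.Length = 3;                    // shrink in place: no copy, no new buffer
                Console.WriteLine(sb.ToString()); // "GET" - the string copy only happens here
            }
        }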

    And of course the APIs have been designed to coax you into making copy after copy, rather than doing efficient text manipulation in place. Hell, even the HTTP and ASP.NET web stacks are rife with such inefficiencies.

    And yet they outperform most of their native competitors. Go figure.

    I suppose it's possible to ignore all of this and let the GC chew up 30% or more of your program's execution time without anybody noticing. I'm baffled that such software is written, but at the same time I realize that my expectations are out of whack with respect to common practice.

    I think if Joe Duffy's code is spending 30% of its time in the GC, then he should probably stop calling GC.Collect() in a while(true) loop.

    Don't get me wrong, it's possible to write inefficient code in C#. But programs are usually tanked not by the garbage collector, but by dreadful algorithms. If Joe spent half as much time converting his bubble sorts into quick sorts and arranging his code to efficiently store and compute results, he'd probably find that the GC really is the least of his problems.

    Complaining about the performance of the .NET GC usually comes from a misunderstanding of what C# is compared with C++, of how the GC works, and fundamentally of the difference between high-performance and low-performance code.

    If you think your code is slow, my advice is to benchmark it. Usually your CPU isn't spending very much time in the GC at all, but spending all of its time waiting on locks or hot-looping through shoddy algorithms, and that code would be slow whether you write it in C# or Haskell or C++.

    gen0 collections might not be free, but they are cheap. Writing shoddy code to avoid them is likely to make your code worse, hide real performance bottlenecks, and lead to subtle, sometimes catastrophic problems with your code.

    So anyway, I call shenanigans.

  • Blue Ink

    , spivonious wrote

    @evildictaitor: Just one run, and a string with only 5 delimited members. Using .NET Stopwatch. Running both tests consecutively in a console app.

    I did change it to run each 1000 times, and the Split method did take twice as long as the index method. Still, "premature optimization..."

    I believe the point of the article was to show how inefficient parsing methods can trigger frequent collections that eventually add up. If you want to see that in action, both 1 and 1000 are off by several orders of magnitude.

  • magicalclick

    @LiquidBoy:

    I don't see code one as being that bad. Sure, it takes more memory, but it is done only one time. There is one extra indirection, though.

    The problem is, it gets worse if you repeat Split down the road.

    It is not something to do with managed vs. unmanaged; you get the same problem in any language. Anyway, as a side note, code two should return an array of indices (a sketch follows below). You get a shallow call stack and can multithread the work by doing that.
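
    A sketch of that suggestion: return the positions of the fields instead of the fields themselves. The type and method names here are invented. Because the source string is immutable, the resulting ranges can safely be handed to multiple threads:

        using System;
        using System.Collections.Generic;

        struct TextRange
        {
            public int Start;
            public int Length;
        }

        static class IndexSplit
        {
            // One List<TextRange> instead of an array plus a substring per
            // field; TextRange is a struct, so the list's backing array is
            // the only heap allocation.
            public static List<TextRange> Split(string s, char delim)
            {
                var ranges = new List<TextRange>();
                int start = 0;
                while (start <= s.Length)
                {
                    int end = s.IndexOf(delim, start);
                    if (end < 0) end = s.Length;
                    ranges.Add(new TextRange { Start = start, Length = end - start });
                    start = end + 1;
                }
                return ranges;
            }
        }

    Callers then call s.Substring(r.Start, r.Length) only for the fields they actually need, or compare characters in place via the string indexer.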

  • TexasToast

    @evildictaitor: +2    I like your explanation and theories much better.  You know your stuff better than Duffy.

  • LiquidBoy

    Definitely sounds like evildictaitor knows his stuff, not questioning that. BUT I do agree with Joe Duffy as well about the overuse of Split, normally on massive dynamic strings and almost always producing a collection of strings. Lots of .NET devs, including myself, write very poorly optimized code, as Joe hints.

    Regardless, I also know that Joe and his team are probably working on Midori and the Singularity OS project; that's the several million lines of managed code on a managed OS.

    Anyway, I found some stats on Singularity showing how it improves across the board on certain OS system cycles ... Some of you may find it interesting:

    http://channel9.msdn.com/Forums/Coffeehouse/The-Singularity-Project-with-results

  • Sven Groot

    His 30% GC time does sound excessively high. I've done experiments with incredibly string-processing-heavy algorithms (lots of string.Split usage, among other things), and that spent about 10% of its runtime in the GC.

    However, that was on Mono, before the introduction of the sgen GC (their old GC is not generational and will stop the world for every collection). So I find it hard to believe that .NET's much better optimized generational GC would ever use 30% of the execution time, unless your code is monumentally bad.

  • magicalclick

    @LiquidBoy:

    btw, code two won't compile. There is no substr defined. I can't be sure he didn't allocate an extra substring, since it is neither defined nor assigned in his code.

    The problem with code one is mostly bloated memory usage when the same style is used deep down the call stack.

  • Richard.Hein

    "Honestly, I've witnessed programs that should be I/O bound turn into programs that are compute-bound, simply due to use of inefficient string parsing routines across enormous amounts of data(Okay, the developers also did other sloppy allocation-heavy things, but string certainly contributed.) "

     

    I certainly believe it. He said 30% in an off-hand way; that is possible, imho, under the conditions described, particularly those quoted above. I work on a high-performance server, and not that long ago there was a cache-related bug causing the cache refresh worker to run constantly, never finishing and locking the cache during recalculation of the hashes. The GC was running forever because of a circular dependency between two handlers creating a massive amount of strings - XSLT/XML stuff.

  • cheong

    , evildictaitor wrote

    I think, actually, that Joe has fallen into the fallacy that XML is a suitable solution for a high performance project. If you're storing your data as XML, you better suck up the fact that your data is in a human readable rather than a machine efficient storage format and that getting data out of it probably shouldn't be in a hot loop (and if it's not in a hot loop, why do you care about its performance?)

    The writer of the XmlReader probably realized this, and therefore (correctly) assumed that optimising the XmlReader for the side-case of someone loading it on a microchip who really cares about gen0 collections is optimising for the wrong case. I'll put money on the writer of the XmlReader class wanting to write code that is correct, easy to use, easy to read and easy to fix when the XML spec changes in future, rather than wanting to optimise away all of the almost-free allocs in the code.

    In fact, I challenge Joe Duffy to find an implementation of an XmlReader on a native platform (like C/C++) that is complete with regards to the XML spec and contains no short-lived allocations.

    Indeed. If you want to write your own XmlReader, you have to take care of the parsing difficulty introduced by the flexibility of the XML format. Or you'd just be introducing a fixed text-based format that looks like XML.

    And because .NET strings are references and can be pooled, unlike unmanaged C++ where one string means one buffer, I think the actual allocation that occurs may not be as bad as he predicts. Of course, we need a test to confirm this (a quick check follows below).
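
    A quick check of how far the pooling goes; it shows that only literals are interned automatically, so strings built at runtime are still fresh allocations:

        using System;

        class InternTest
        {
            static void Main()
            {
                string a = "hello";
                string b = "hello";                           // literals share one pooled object
                string c = new string("hello".ToCharArray()); // built at runtime: a fresh allocation

                Console.WriteLine(ReferenceEquals(a, b));                // True
                Console.WriteLine(ReferenceEquals(a, c));                // False
                Console.WriteLine(ReferenceEquals(a, string.Intern(c))); // True: interned manually
            }
        }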

  • evildictaitor

    , Richard.Hein wrote 

    I work on a high-performance server ... XSLT/XML stuff.

    I think I found your problem.

  • Dr Herbie

    @evildictaitor: Out of interest, what do you recommend instead of XML in high-perf systems: binary serialisation? JSON?

    Herbie

  • Ion Todirel

    I love my string type, it's called std::wstring
