Coffeehouse Thread

24 posts

Joe Duffy: a "managed" system ... beat the pants off all the popular native programming environments

  • evildictaitor

    , Dr Herbie wrote

    @evildictaitor: Out of interest, what do you recommend instead of XML in high-perf systems: binary serialisation? JSON?

    Herbie

    It depends on what you mean by a high-performance system, and what you're storing in the XML.

    But basically, if you're using your XML to store stuff like configuration options, I'd say keep the XML but parse the file during the startup of your server (or during a config reload) and keep the pertinent data in memory. That way you avoid doing XML operations during the hot loop of your high-perf server.
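    For what it's worth, a minimal sketch of that pattern in C# - the file name, element names and the ServerConfig type are all made up for illustration:

    ```csharp
    using System;
    using System.Xml.Linq;

    public sealed class ServerConfig
    {
        public int MaxConnections { get; private set; }
        public string LogPath { get; private set; } = "";

        // Parse the XML once at startup (or on a config reload) and keep the
        // pertinent values in plain properties - the hot path never touches XML.
        public static ServerConfig Load(string path)
        {
            var root = XDocument.Load(path).Root
                       ?? throw new InvalidOperationException("empty config file");
            return new ServerConfig
            {
                MaxConnections = (int)root.Element("maxConnections"),   // throws if missing
                LogPath = (string)root.Element("logPath") ?? ""
            };
        }
    }
    ```

    After startup the request path only ever reads ServerConfig properties; the XML parser doesn't run again until the next reload.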

    If your XML is not so much about configs, but more about data storage, something like SQL Server is probably going to be a better choice. Just remember to parameterize all of your statements and put the queries into the database as stored procedures - not only will that protect you from SQL injection, it'll also mean that your queries get precompiled on the SQL side and that you'll get a massive reduction in the transmission cost between your app and the SQL server, be it an in-proc SQL server, a SQL server on the same machine via a pipe, or even one on a different machine.
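    Something like this sketch, say (the stored procedure name, parameter and column are hypothetical, and Microsoft.Data.SqlClient is assumed):

    ```csharp
    using System;
    using System.Data;
    using Microsoft.Data.SqlClient;   // System.Data.SqlClient on older stacks

    string connectionString = "<your connection string>";
    int customerId = 42;

    using var conn = new SqlConnection(connectionString);
    conn.Open();

    // The query text never changes, so SQL Server can reuse the compiled plan;
    // only the parameter values cross the wire on each call.
    using var cmd = new SqlCommand("dbo.GetOrdersForCustomer", conn)
    {
        CommandType = CommandType.StoredProcedure
    };
    cmd.Parameters.Add("@CustomerId", SqlDbType.Int).Value = customerId;

    using var reader = cmd.ExecuteReader();
    while (reader.Read())
    {
        Console.WriteLine(reader["OrderId"]);
    }
    ```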

    Also, it depends on whether your high-performance server is, in fact, an IO-bound server rather than a CPU-bound one.

    For example, if you're using a GPU farm to compute results for a massive simulation like weather modelling or explosion modelling (a high-performance server), then doing XML ops during a hot loop is going to completely destroy your performance. But if you're actually dealing with maybe 1000 HTTP requests an hour on a webserver, then really you're not running a high-performance server at all, just a regular one, and doing a whole ton of XML per page probably isn't going to hurt you.

    If you're really asking whether XML or JSON is a better output format for the server (e.g. a web services / SOAP / REST server), then the thing to bear in mind is that the cost of doing XML memory operations is nothing compared with the cost of pumping those bytes around the world over the Interwebs. JSON will give you a speed boost not because it has fewer string operations or because it is easier to parse, but simply because it is smaller and will take fewer packets to send from your server to a browser or vice versa.
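    If you want to see the size difference for yourself, a throwaway comparison like this will do (illustrative only - the exact byte counts depend on serializer settings):

    ```csharp
    using System;
    using System.IO;
    using System.Text;
    using System.Text.Json;
    using System.Xml.Serialization;

    var point = new Point { X = 10, Y = 20 };

    // JSON: {"X":10,"Y":20}
    string json = JsonSerializer.Serialize(point);

    // XML: the same data plus an XML declaration, namespaces and end tags.
    var serializer = new XmlSerializer(typeof(Point));
    using var writer = new StringWriter();
    serializer.Serialize(writer, point);
    string xml = writer.ToString();

    Console.WriteLine($"JSON: {Encoding.UTF8.GetByteCount(json)} bytes");
    Console.WriteLine($"XML:  {Encoding.UTF8.GetByteCount(xml)} bytes");

    public class Point
    {
        public int X { get; set; }
        public int Y { get; set; }
    }
    ```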

  • JohnAskew

    @evildictaitor: You did not say that binary serialization is preferred for 'a massive simulation like weather modelling or modelling explosions'. Why not?

  • evildictaitor

    , JohnAskew wrote

    @evildictaitor: You did not say that binary serialization is preferred for 'a massive simulation like weather modelling or modelling explosions'. Why not?

    Because for the hot loops of massive simulations, serialization of any kind should be avoided. If you really do have a requirement like a massive simulation, you need to think carefully about what data you actually need to send, and then about how to send that set sensibly to the other side.

    For example, if you're a games company shoving 60 frames a second over the wire, then XML is a really bad choice for getting those pixels over. And if you're running a cluster for climate modelling, frankly just keeping all of the intermediate data in memory rather than writing it to disk at all will probably lower the amount of CO2 you're pumping into your climate.

    If you're writing a lot of persistent data out (like data in a high-frequency trading server or people's XY coordinates in a game server), then a SQL database is probably the best way to go. SQL databases have the benefit of being simultaneously a binary serialized form of the data and pretty optimised for fast transactions. It's also nice that, unlike customized binary marshallers, it's usually pretty easy to just dive into a SQL database to add, remove and query the data, hence avoiding a lot of the nastiness that comes with custom binary formats.

    Binary serialization also means different things to different people, which is why I tried to avoid mentioning it. The .NET binary serializer has performance frankly not much better than XML for reading/writing, but direct marshalling of data via C structures is outrageously efficient, even if it does destroy your chances of ever being able to modify that data manually or manage binary compatibility with previous versions of your own software.
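    For the curious, here's roughly what direct marshalling looks like in C# with MemoryMarshal - the Sample struct is invented, and a runtime with Span support is assumed:

    ```csharp
    using System;
    using System.Runtime.InteropServices;

    var samples = new[] { new Sample { Id = 1, Value = 3.14 } };

    // "Serialize": reinterpret the array's memory as raw bytes - no parsing,
    // no reflection, effectively a memcpy.
    Span<byte> bytes = MemoryMarshal.AsBytes(samples.AsSpan());
    byte[] wire = bytes.ToArray();

    // "Deserialize": reinterpret the bytes as structs again.
    Sample back = MemoryMarshal.Cast<byte, Sample>(wire.AsSpan())[0];
    Console.WriteLine(back.Value);   // 3.14

    // The catch: the wire format *is* the struct layout. Rename, reorder or
    // resize a field and every previously written blob becomes unreadable.
    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    struct Sample
    {
        public int Id;
        public double Value;
    }
    ```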

    Some binary formats are also really nasty and inefficient to parse; ASN.1 is perhaps the greatest example of an industry-standard binary serialization that is large, cumbersome, hard to parse, hard to query and inefficient to manipulate. Frankly, XML beats the pants off ASN.1 most days of the week.

    But in short - XML is great, just not for parsing in the hot loop of a high-performance application like a game's render loop or a climate-modelling cluster. When you're in a hot loop there isn't a one-size-fits-all solution. You need to engage your brain in order to squeeze extra performance out, and think about what you actually need to do in your hot loop, and what could be precomputed, offloaded or computed more efficiently to speed it up.

  • Richard.Hein

    , evildictaitor wrote

    *snip*

    I think I found your problem.

    LOL, but take it in context ... actually, mentioning XSLT/XML probably confuses the issue - it's not really about XML per se. Users have the option of using JSON as well, and in that case the performance would have degraded more slowly, but it would still have degraded eventually. Essentially, there is a pipeline, configured via web.config, which determines the components that should process the data. A loop was created because the users had configured the first component to process the data and pass it to the 2nd, which passed it back to the 1st ... etc. So bad coding/configuration was the real culprit, but it also highlighted just how many strings were being created during these processing stages.

  • evildictaitor

    , Richard.Hein wrote

    *snip*

    LOL, but take it in context ... actually, mentioning XSLT/XML probably confuses the issue - it's not really about XML per se. Users have the option of using JSON as well, and in that case the performance would have degraded more slowly, but it would still have degraded eventually. Essentially, there is a pipeline, configured via web.config, which determines the components that should process the data. A loop was created because the users had configured the first component to process the data and pass it to the 2nd, which passed it back to the 1st ... etc. So bad coding/configuration was the real culprit, but it also highlighted just how many strings were being created during these processing stages.

    Not wanting to be snobbish about it, but an ASP.NET server serving webpages is not a high-performance server.

    The main speed increase you'll get by switching from XML to JSON has nothing to do with the parsing complexity of the XmlReader or the gen0 collections; it will be entirely down to the fact that JSON is smaller on the wire, and sending a few extra TCP packets round the globe is going to dwarf any of those minor CPU costs. Hell, a single dropped TCP packet costs upwards of 40ms to detect and retransmit, which is long enough for my machine to do about 33874715 string allocations, roughly 2800 full GC.Collect(2) collections, or 18000 GC.Collect(1) collections.
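    If you want to reproduce that kind of back-of-envelope number on your own machine, a throwaway sketch like this will do (results are obviously machine-dependent, and this is not a rigorous benchmark):

    ```csharp
    using System;
    using System.Diagnostics;

    long count = 0;
    var sw = Stopwatch.StartNew();

    // Count how many small string allocations fit in one ~40ms window -
    // about the cost of a single TCP retransmission.
    while (sw.ElapsedMilliseconds < 40)
    {
        string s = new string('x', 16);  // small throwaway string, parser-style
        if (s.Length == 16) count++;     // use the string so it can't be elided
    }

    Console.WriteLine($"{count:N0} string allocations in ~40ms");
    ```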

    Yes, the GC and string allocations are not free. But on web servers, the difference is negligible. It's like arguing that the bridge fell down because of atomic vibrations in the sodium atoms in the pillars.

    The vibrations are there, but that's not why the bridge fell down. It's more likely that you just built the bridge wrong than that the sodium atoms just went all crazy on your bridge.

    These discussions always feel like developers going out of their way to blame someone other than themselves for their shoddy code. If they just fix the code (or for heaven's sake, just benchmark it) they'll find out that 99.999999999999999% of the time, the reason their code is slow is because it's slow code that they've written.

  • exoteric

    , evildictaitor wrote

    Also I call shenanigans:

    *snip*

    I think, actually, that Joe has fallen into the fallacy that XML is a suitable solution for a high performance project. 

    Simple observation ... maybe Joe just cares about the performance of library software. As a library writer, his decisions impact a lot of other software, whether written with performance in mind or not. How can you not like that mentality, insofar as it does not compromise correctness?

    As for the further development of the XML syntax, some people think it is already unnecessarily complicated. The W3C normally creates new syntax "around" XML, as with XML Namespaces, XLink, etc. I'm looking forward to broad EXI support; not that EXI looks simple, but it has some desirable characteristics besides abolishing the human-readability aspect of XML. In the meantime, the "zipped XML" design pattern is mainstream.
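    For reference, the "zipped XML" pattern is only a few lines in .NET - this is plain gzip, not EXI, and the payload is an invented repetitive document so the compression win is visible:

    ```csharp
    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Linq;
    using System.Text;

    // Keep the readable XML at the endpoints, gzip it on the wire.
    string xml = "<orders>"
               + string.Concat(Enumerable.Repeat(
                     "<order id=\"1\"><item>widget</item></order>", 100))
               + "</orders>";

    byte[] raw = Encoding.UTF8.GetBytes(xml);
    using var buffer = new MemoryStream();
    using (var gzip = new GZipStream(buffer, CompressionLevel.Optimal, leaveOpen: true))
    {
        gzip.Write(raw, 0, raw.Length);
    }

    Console.WriteLine($"raw: {raw.Length} bytes, gzipped: {buffer.Length} bytes");
    ```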
