JohnAskew wrote:

@evildictaitor: You did not say that binary serialization is preferred for 'a massive simulation like weather modelling or modelling explosions'. Why not?

Because for hot loops of massive simulations, serialization of any kind should be avoided. If you really have a requirement like a massive simulation, you need to think really carefully about what data it is that you need to send and then think carefully about how to send that set sensibly to the other side.

For example, if you're a games company shoving 60 frames a second over the wire, then XML is a really bad choice for getting those pixels over. And if you're running a cluster for climate modelling, frankly just keeping all of the intermediate data in memory rather than writing it to disk at all will probably lower the amount of CO2 you're pumping into your climate.

If you're writing a lot of persistent data out (like data in an HFT server or people's XY coordinates in a game server) then a SQL database is probably the best way to go. SQL databases have the benefit of simultaneously being a binary serialized form of the data and being pretty well optimised for fast transactions. It's also nice that, unlike with customized binary marshallers, it's usually pretty easy to just dive into a SQL database to add, remove and query the data, hence avoiding a lot of the nastiness that comes with custom binary formats.
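To make the "just dive in and query it" point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `positions` table, its columns and the `save_position` helper are made up for illustration, and a real game or trading server would likely use a different database and schema.

```python
import sqlite3

# Hypothetical schema: one row per player, holding the latest XY position.
conn = sqlite3.connect(":memory:")  # use a file path for real persistence
conn.execute("CREATE TABLE positions (player TEXT PRIMARY KEY, x REAL, y REAL)")

def save_position(player, x, y):
    # The connection context manager commits on success, rolls back on error.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO positions (player, x, y) VALUES (?, ?, ?)",
            (player, x, y),
        )

save_position("alice", 10.0, 20.0)
save_position("bob", -3.5, 7.25)
save_position("alice", 11.0, 21.5)  # overwrites alice's earlier row

# Ad-hoc query: no custom parser or format documentation needed.
rows = conn.execute(
    "SELECT player, x, y FROM positions ORDER BY player"
).fetchall()
```

The data sits on disk in the database's own binary format, but you can still add a column or run a one-off query without touching any serialization code.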

Binary serialization also means different things to different people, which is why I tried to avoid mentioning it. The .NET "binary serializer" has performance that is frankly not much better than XML for reading/writing, but direct marshalling of data via C structures is outrageously efficient, even if it does destroy your chances of ever being able to modify that data manually or manage binary compatibility with previous versions of your own software.
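As a rough illustration of the "direct marshalling" end of that spectrum, here is a sketch that packs a fixed-layout record with Python's `struct` module (which mimics a packed C struct) and compares it with an XML encoding of the same values. The record layout (a `uint32` id plus two 32-bit floats) and the helper names are assumptions picked for the example.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record: entity id (uint32) plus x, y as 32-bit floats.
# "<" means little-endian with no padding, like a packed C struct.
RECORD = struct.Struct("<Iff")

def to_binary(entity_id, x, y):
    return RECORD.pack(entity_id, x, y)

def from_binary(buf):
    return RECORD.unpack(buf)

def to_xml(entity_id, x, y):
    # Same three values as attributes on a single element.
    elem = ET.Element("entity", id=str(entity_id), x=str(x), y=str(y))
    return ET.tostring(elem)

blob = to_binary(7, 1.5, -2.25)
doc = to_xml(7, 1.5, -2.25)
print(len(blob), len(doc))  # binary record size vs XML size, in bytes
```

The binary record is always exactly 12 bytes and needs no parsing beyond a fixed-offset read, but the format string is the only schema there is: change it and every previously written record becomes unreadable, which is exactly the compatibility problem mentioned above.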

Some binary formats are also really nasty and inefficient to parse; ASN.1 is perhaps one of the greatest examples of an industry-standard binary serialization that is large, cumbersome, hard to parse, hard to query and inefficient to manipulate. Frankly XML beats the pants off ASN.1 most days of the week.

But in short: XML is great, just not for parsing in a hot loop of a high-performance application like a game's render loop or a climate modelling cluster. When you're in a hot loop there isn't a one-size-fits-all solution. You need to engage your brain in order to squeeze extra performance out: think about what you actually need to do in your hot loop, and what could be precomputed, offloaded or computed more efficiently to speed it up.
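A toy example of the "what could be precomputed" part, assuming a loop that rotates a batch of 2D points by a fixed angle (the function names are invented for the illustration). It only shows the general idea of hoisting invariant work out of the loop and is not a recipe for any particular engine or simulation.

```python
import math

def rotate_naive(points, angle):
    out = []
    for x, y in points:
        # cos/sin are recomputed for every point although angle never changes.
        out.append((
            x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle),
        ))
    return out

def rotate_hoisted(points, angle):
    # Loop-invariant values are computed once, outside the hot loop.
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

points = [(1.0, 0.0), (0.0, 1.0), (2.0, -3.0)]
naive = rotate_naive(points, 0.5)
hoisted = rotate_hoisted(points, 0.5)
```

Both versions produce the same result; the second just does four fewer trig calls per point, which is the sort of saving that only matters once the loop runs millions of times.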