Coffeehouse Thread

5 posts

Mistakes to avoid

  • vesuvius

    If you had to write an application that created its own proprietary format, like Microsoft Word or Excel, where future compatibility was key (i.e. a file created in 2010 will still be readable in 2020, even though a lot of features have been added to the software in the meantime), how would you go about it?

    I have seen a few applications that are not database dependent and have serious versioning issues: something that worked for customer X cannot work for customer Y because, for example, some class names were changed, and now you have to recreate their thousands of documents in order for them to use the new (improved) version of the software.

    I guess this boils down to versioning. What are the things to watch out for to ensure that things don't break?

  • exoteric

    @vesuvius: I guess this happens when you use things such as XAML that create a direct binding between CLR types and XML data?

    The simplest form of versioning I know of in XML is just using namespaces (urn:vesuvius:format1.3) or a version attribute on the root element. Either should allow future software to load the data as pure XML, do an in-memory transformation (XLinq), and then parse the result as XAML, now compatible with the evolved data model.
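
    A minimal sketch of the version-attribute idea, in Python rather than XLinq, and with ElementTree standing in for the final XAML parse. The element names, the version numbers, and the two migration steps are all made up for illustration; the point is only that the loader reads the version, upgrades the tree in memory one step at a time, and hands current-format data to the rest of the application.

```python
import xml.etree.ElementTree as ET

CURRENT_VERSION = 3  # hypothetical latest format version


def _v1_to_v2(root):
    # Hypothetical change: version 2 renamed <Body> to <Text>.
    for el in root.iter("Body"):
        el.tag = "Text"


def _v2_to_v3(root):
    # Hypothetical change: version 3 expects a <Features> container.
    if root.find("Features") is None:
        ET.SubElement(root, "Features")


# Each entry upgrades a document from version N to version N + 1.
MIGRATIONS = {1: _v1_to_v2, 2: _v2_to_v3}


def load_document(xml_text):
    """Parse as plain XML, then upgrade in memory to the current version."""
    root = ET.fromstring(xml_text)
    # Assumption: files written before the attribute existed count as version 1.
    version = int(root.get("version", "1"))
    if version > CURRENT_VERSION:
        raise ValueError(f"file version {version} is newer than this reader")
    while version < CURRENT_VERSION:
        MIGRATIONS[version](root)
        version += 1
    root.set("version", str(version))
    return root


old_file = '<Document version="1"><Body>Hello</Body></Document>'
doc = load_document(old_file)
print(doc.get("version"), doc.find("Text").text)
```

    Chaining small one-step migrations means an old file only ever needs the steps between its version and the current one, and no step has to know about any other.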

    That said, I don't know what Microsoft's story is for versioning XAML and handling data-model evolution.

  • ManipUni

    I'd start with two simple ideas and work from there:
     - XML Based
     - Small subsets

    Meaning you can take part of the file and ignore most of the additional features built on top. Attributes naturally make for a nice connector between two distinct sets. So as a very bad example:

    <Document>
    <Text>The Quick <Feature Color="Red">Red</Feature> Fox <Feature Style="Underline">Jumps</Feature> Over the Lazy <Feature Color="Brown">Brown</Feature> Dog</Text>
    </Document>

    Now you can parse <Text></Text> and ignore <Feature></Feature> tags entirely, or only parse attributes that are known by your parser. As a more convoluted example:
    <Document>
    <Text>The Quick Red Fox Jumps Over the <Feature Foo="Bar">Lazy</Feature> Brown Dog</Text>
    <Features>
    <Foo>
      <Bar URL="www.google" MagicBeans="True" Security="False" Settings="12345" />
    </Foo>
    </Features>
    </Document>

    Now we've added a new feature and defined properties for it. But you still only really need to understand how to parse <Text></Text> and can ignore <Feature></Feature>. With this style, a parser only has to do the minimum required to read the broad format; all of the add-ons are entirely optional. The format's guidelines can be less than one page.

    PS - One big downside of the above is that it is really difficult to read without "stepping" through the XML file (i.e. using DOM results in "ugly" code).
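
    As a rough illustration of a reader that only understands a subset, here is a Python sketch using ElementTree. It assumes the attribute values are quoted so the file is well-formed XML, and KNOWN_ATTRIBUTES is a made-up stand-in for whatever a given parser happens to support. One function flattens <Text> while ignoring the <Feature> markup, the other keeps only the attributes it recognizes.

```python
import xml.etree.ElementTree as ET

# Hypothetical: this reader understands Color but has never heard of Style.
KNOWN_ATTRIBUTES = {"Color"}

SAMPLE = (
    '<Document><Text>The Quick <Feature Color="Red">Red</Feature> Fox '
    '<Feature Style="Underline">Jumps</Feature> Over the Lazy '
    '<Feature Color="Brown">Brown</Feature> Dog</Text></Document>'
)


def plain_text(root):
    """Return the content of <Text>, ignoring any <Feature> markup inside it."""
    text = root.find("Text")
    return "".join(text.itertext()) if text is not None else ""


def known_features(root):
    """Collect (text, attributes) pairs, dropping attributes we don't understand."""
    found = []
    for feature in root.iter("Feature"):
        attrs = {k: v for k, v in feature.attrib.items() if k in KNOWN_ATTRIBUTES}
        if attrs:
            found.append((feature.text or "", attrs))
    return found


root = ET.fromstring(SAMPLE)
print(plain_text(root))      # The Quick Red Fox Jumps Over the Lazy Brown Dog
print(known_features(root))  # [('Red', {'Color': 'Red'}), ('Brown', {'Color': 'Brown'})]
```

    The underlined "Jumps" still shows up in the plain text; the reader just doesn't know it was meant to be underlined, which is the whole idea.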

  • ManipUni

    I know the above doesn't really answer your question about versioning. I am giving an alternative answer, namely that you can use subsets and supersets to accomplish the same thing as a "version" without creating tons of backward and forward compatibility issues.

  • Blue Ink

    @vesuvius: I faced the problem several times, both as a producer and as a consumer. This is what I think I learned so far (usually the hard way):

    0) The diamond rule: data are forever. Your application may not survive v2.0, but eventually (even a decade later) someone will have to port your data to some new app. Having a clean, consistent and well documented format will save you from some embarrassment.

    1) Always include a version header. Sooner or later you have to update the format, and having to sniff the version just sucks.

    2) Avoid creating hard dependencies between your file format and your internal representation of the document.

    3) Avoid any sort of binary format that creates a dependency on the platform you are using. Endianness happens. Among other beasts.

    4) Avoid, as much as possible, creating dependencies on external proprietary formats you don't have your own codec for. That nice component may be free and widely available now... (Wang Imaging, anyone?)

    5) Being able to load files generated by older versions of your program is a must, but the opposite is most definitely not. Yes, you can get this kind of compatibility through some clever hack, but your format will be messed up beyond belief (hello PDF, feeling the love yet?)

    This pretty much sums it up. I would only stress that documents, and their format, are not just a byproduct of some code: they are the whole point of that code. This is why feature requests that involve a breaking change should start with -1000 points. It's not possible to turn them all down, but it's a good cause to fight for.
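
    For points 1 and 3, a small Python sketch of what a platform-independent binary header might look like. The "VDOC" signature, the field layout, and the version number are invented for the example. The "<" in the struct format pins the byte order to little-endian with no padding, so the bytes on disk do not depend on the machine that wrote them.

```python
import struct

MAGIC = b"VDOC"       # hypothetical 4-byte file signature
FORMAT_VERSION = 2    # hypothetical current format version

# "<" = little-endian, standard sizes, no alignment padding:
# 4-byte magic, 2-byte unsigned version, 4-byte unsigned payload length.
HEADER = struct.Struct("<4sHI")


def save(payload: bytes) -> bytes:
    """Prefix the payload with an explicit, fixed-layout header."""
    return HEADER.pack(MAGIC, FORMAT_VERSION, len(payload)) + payload


def load(blob: bytes):
    """Check the header, then return (version, payload). No sniffing needed."""
    if len(blob) < HEADER.size:
        raise ValueError("file too short to contain a header")
    magic, version, length = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a VDOC file")
    if version > FORMAT_VERSION:
        raise ValueError(f"file version {version} is newer than this reader")
    payload = blob[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("file is truncated")
    # Older versions are accepted; the caller decides how to upgrade them.
    return version, payload


blob = save(b"hello")
print(load(blob))  # (2, b'hello')
```

    Refusing files with a newer version than the reader knows is in line with point 5: reading old files is a must, reading future ones is not.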
