Data/Contract Coupling in Messaging

This whiteboard talk was largely inspired by a series of Twitter conversations around whether and when it's a good idea to share types (i.e. .NET classes compiled into assemblies) as a way to express contracts in a messaging system.

I don't think it is. In fact, it's the exact opposite of loose coupling, and it negates many of the advantages of opting for a messaging middleware system and nearly all the advantages of opting into open standard protocols like HTTP and AMQP.

That said, it's difficult to blame the folks in the .NET developer community who've arrived at that practice, ultimately having been led down that path by -- as I am meanwhile convinced -- the late 1990s industry choice of picking XML Schema (XSD) to describe the content of messages. XML Schema, with all of its complexity, is a reasonably good way to describe the syntax of XML-based markup languages, of which there are many, and where aspects like element order, attribute substitution, and many other complexities of XSD might make some sense. XSD's type system, which allows for restrictions that map fairly directly to inheritance in OO languages, led developers to think of messages as objects - and serialization frameworks that allow for mapping the XSD type model into a class hierarchy helped with that impression.

Messages are not objects. Messages don't care about the version control history. Messages don't care about whether a subset of the data they carry also similarly appears in some other message.

Messages are flat. Maps of keys to values. The values are simple-typed or, again, maps, or arrays of values or maps. JSON embodies that model. If you ignore XSD and allow for the same simple constraints as JSON, XML also embodies that model.
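As a minimal sketch of that model (the message shape and the key names `orderId`, `customer`, `lines` and `giftWrap` are made up for illustration, not taken from any real API), a consumer can treat a JSON message as nothing more than nested maps and arrays and read only the keys it cares about:

```python
import json

# Hypothetical message body: keys map to simple values, nested maps, or arrays.
RAW_MESSAGE = """
{
  "orderId": "A-1001",
  "customer": {"name": "Contoso", "country": "DE"},
  "lines": [
    {"sku": "X1", "quantity": 2},
    {"sku": "Y7", "quantity": 1}
  ],
  "giftWrap": true
}
"""


def total_quantity(message_text):
    """Sum the line quantities, reading only the keys this consumer needs."""
    message = json.loads(message_text)  # a plain dict, not a shared class
    lines = message.get("lines", [])    # a missing key falls back to a default
    return sum(line.get("quantity", 0) for line in lines)


print(total_quantity(RAW_MESSAGE))
```

The consumer never sees a shared type. Keys it does not know about, such as `giftWrap` here, are simply ignored, so the sender can add them without breaking this reader.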

If you take a look, you'll start noticing that we and a lot of others in the industry have moved to document messages in simpler ways than schemas.

(BTW, if we could easily do it without breaking backwards compatibility we'd even drop the namespace in the SB API snippet above)

You'll generally see plain-text documentation of properties, some of which may have complex content consisting of maps and arrays. The advantage is that these descriptions map nicely into all programming languages and platforms by way of a skilled developer.

I realize this is controversial, so I'm looking forward to comments.

Follow the Discussion

  • Vaughn Vernon

    Well done, Clemens. I was involved in at least one of the related Twitter discussions that you mentioned. I blogged my viewpoint of this topic here https://vaughnvernon.co/?p=222 and it seems that I am fairly closely aligned with what you are promoting. In fact Chapter 13 of my book "Implementing Domain-Driven Design" https://vaughnvernon.co/?page_id=168 and here on Safari Books Online http://my.safaribooksonline.com/book/project-management/9780133039900 suggests using a "custom media type" approach to documenting message data contracts, which seems to be at least close, if not the same, as you are suggesting. I agree that it allows developers in disparate systems to consume messages in a way that is best for them.

    I want to be clear that when I discuss the use of XML in my blog post I am not suggesting the definition of strict schema, but simple, well-formed documents. That said, I also think that JSON is a better data format, and is what I discuss at length and use in Chapter 13.

    I will update my blog post with a link back to this post. Thanks much for taking the time to share and express your views.

    Vaughn

  • I don't really think you can generalize on this.  For simple interactions like the examples given, schema and a shared type is overkill.  But in a large complex transactional system, with 100s or 1000s of fields, it simply doesn't make sense to avoid schema.

    If there's no machine-readable schema then each developer has to write their own serializer and deserializer code.  A process reading a complex message of 400 or more fields in a structured document would have to verify the presence of each field before using it.  This can lead to monotonous, buggy code.

    It's far easier to automate the task of verifying the message against one of the valid schemas.  Providing basic rules for non-breaking minor changes is a must.  The validation code can be generated - and once you have a valid message you can deserialize it so that you can programmatically access the contents.

    Resorting to manual message validation and serialization as a way of solving the issue of tight coupling is really solving the symptom rather than the cause.  The real issue is to have a versioning strategy with a flexible automated code generation system that is designed for loosely coupled providers and consumers.

    A good practical example of this is Google Protocol Buffers extension mechanism.  With ProtoBufs you can partially deserialize from a code generated class and load an extension schema in dynamically at runtime to access additional data.  Much like C#'s hybrid static and dynamic typing.

  • Thank you, Joey. Hundreds or thousands of fields are commonly some dozen sections of some dozen groups of some dozen fields. Looking at EDIFACT or X12 dictionaries I see composable groups of largely separate but related concerns. An X12 document can quite well be seen as a session or exchange of a set of small messages. Also, every field you send there matters, whether you send 1000 or 10, so I don't see how structural complexity makes a loosely coupled approach that anticipates change less valid. It is arguably less convenient to deal with.

    Validation matters. But I am meanwhile convinced that the validation engine needs to know what it is doing and it needs to have a notion of the semantics. And it surely must not stand in the way of change. Validation along the lines of a markup-language-formalization-language that has no notion of context has largely proven to be harmful. Example: We let a schema validation code path that an eager developer had put in slip by in code review, and we had to completely redesign the feature since we couldn't provide backwards compatibility with the schema-validating client we shipped in the previous version. It's not an isolated case.

  • Manchild

    It's quite sad to see yet another re-hash of the whole "typed vs. untyped" argument being had on yet another level.
    If all the pros that Clemens mentioned were as productive or worthwhile in general, we should all be programming in dynamic, either functional or script-style, GPLs, but that's not the case at all.
    The abstraction of 'Objects' still seems to work out quite well for imperative GPLs like C#, Java, C++, VB etc.
    The developer productivity gains that directly result from the use/re-use and propagation of type information are an undeniable FACT!
    If a little bit of memory (which has never been a problem anyway!) in terms of POCO classes, and taking a grown-up look at my service-versioning story, is the price I have to pay for said gains, I'm fine with that!
    I'd rather take the hit on hardware, where I can cope with it transparently, than take a crippling hit in terms of the complexity at design/re-use time that would negatively affect composition and component assembly.

    XSD was conceptualized with OO in mind; that's why there is such a strong parity between the two. It's not the result of some 'weird misconception' on the part of the developer community that we use it the way we do; it's the way it was designed to be used.

    I'm currently working on an Azure PaaS-based SOA system implementation and tool suite that fuses together 15 different WS-* specs into a GML 3.2.1 based dialect. XSD, WSDL, WS-BPEL 2.0 and WS-CDL are used to design and implement the service logic, so as far as my users are concerned the whole issue goes away during their design-time tooling experience as a result of that beautiful language XSD that you mock so readily, but in order to enable that in the middleware-style C# code I need some help on a framework level.
    I mean honestly, good luck doing some serious system-level design and programming like this without your type info at hand!
    (Of course I could do every query by hand using LINQ-to-XML, but the code bloat and manageability issues resulting from these millions of one-off queries would be a project killer.)

    Failure on Microsoft's part to commit to a single serialization API for typed XML as part of the .NET Framework is the real problem.
    And the sad truth is that Microsoft had the solution, but discarded it,
    despite 'a lot' of excitement expressed by every developer that came in contact with it during the project's beta stages.

    The idea was started by your very own in-house guru Erik Meijer in his original paper on the topic, called 'Programming with Triangles', and was later realized as a project called LINQ-to-XSD (http://linqtoxsd.codeplex.com/), and it's the perfect solution.
    But since the XML team at Microsoft hasn't replied to a single e-mail I've sent them in this regard in the past 3 years, I thought I'd vent some of my dismay here. Maybe someone who reads this knows why the project was never realized into an actual product despite raving reviews by us, your developer community...

  • Henrik Feldt

    What do you consider data? If your messages ARE your data you need a way to know their schema when reading them and detect breaking changes from reaching consumers.

    Assume you know that messages are valid as they are sent. Then validation does not matter when you receive, if you can find the schema of the message type.

    Therefore, introducing required properties requires a new message type for message types that are in production. No more <any /> tags.

    The any tag gives a hint though: you can subtype existing message types to perform extensions but you can't subtype to delete properties.

    In our MassTransit infra we version messages with assemblies in packages but once a message type is put in prod, it is non conflicting changes only.

    Apache Avro and Kafka send schema with data: a very nice choice when you move outside of the CLR comfort zone. They do checking of conflicting changes in continuous integration.
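The "non-conflicting changes only" rule described in this comment can be sketched as a check that runs in continuous integration. This is only an illustration under assumed rules (removing a field, changing its type, or introducing a required field counts as breaking); it is not Avro's actual compatibility algorithm, and the field-description format and the `breaking_changes` helper are hypothetical:

```python
def breaking_changes(old_fields, new_fields):
    """Return the reasons why new_fields would break consumers of old_fields."""
    problems = []
    for name, old in old_fields.items():
        new = new_fields.get(name)
        if new is None:
            problems.append(f"{name}: removed")
        elif new["type"] != old["type"]:
            problems.append(f"{name}: type changed from {old['type']} to {new['type']}")
        elif new["required"] and not old["required"]:
            problems.append(f"{name}: optional field became required")
    for name, new in new_fields.items():
        if name not in old_fields and new["required"]:
            problems.append(f"{name}: new required field")
    return problems


# Hypothetical versions of one message type.
V1 = {
    "orderId": {"type": "string", "required": True},
    "note": {"type": "string", "required": False},
}
V2_OK = dict(V1, coupon={"type": "string", "required": False})
V2_BAD = {
    "orderId": {"type": "string", "required": True},
    "currency": {"type": "string", "required": True},
}

print(breaking_changes(V1, V2_OK))
print(breaking_changes(V1, V2_BAD))
```

A build step could fail whenever the returned list is non-empty, which forces the author to publish a new message type instead of changing the one that is already in production.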

  • @ClemensVasters: Right, but my point is you have to have a strategy that deals with change. 

    The best way to think about it is that the schema is a DSL which can be used to automate code functions like validation, serialization, GUI display, reports etc. That DSL needs to be able to cope with minor and major changes, and be expressive enough to deal with complex conditional optional structures.

    This approach shouldn't be applied everywhere (for example, not if the size of the dataset is trivial).  And, if the DSL isn't rich enough or the approach has failed before, the solution is to fix the DSL rather than just resort to a non-machine-readable schema document.  If you go typeless you've removed the ability to automate a large part of your system.
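One way to picture "schema as a DSL" is a schema held as plain data that a generic routine interprets. The sketch below is an assumption-laden toy, not a real schema language: the `SCHEMA` layout, the `schemaMajor` key and the `validate` function are all invented for illustration. It treats a different major version as a breaking change and tolerates unknown fields as minor changes:

```python
# Hypothetical schema-as-data: one major version, required and optional fields.
SCHEMA = {
    "name": "OrderPlaced",
    "major": 1,
    "fields": {
        "orderId": {"type": str, "required": True},
        "quantity": {"type": int, "required": True},
        "note": {"type": str, "required": False},
    },
}


def validate(message, schema):
    """Return a list of errors; unknown fields are tolerated as minor changes."""
    major = message.get("schemaMajor")
    if major != schema["major"]:
        return [f"unsupported major version: {major}"]
    errors = []
    for name, rule in schema["fields"].items():
        if name not in message:
            if rule["required"]:
                errors.append(f"missing required field: {name}")
            continue
        if not isinstance(message[name], rule["type"]):
            errors.append(f"wrong type for field: {name}")
    return errors


print(validate({"schemaMajor": 1, "orderId": "A-1", "quantity": 2, "coupon": "X"}, SCHEMA))
print(validate({"schemaMajor": 1, "orderId": "A-1"}, SCHEMA))
```

The same `SCHEMA` data could in principle drive serialization, form generation or reports, which is the automation argument made above.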

     

  • ChevalN2Cheval Why not null?

    Personally I really don't like having a data contract extendable. A contract is a contract, so the data must conform.

    From our way of thinking, we want to be tightly coupled to the data contract, but having expendable end points for older data contract versions.

    Message in Json format

           i.   Date Created

           ii.  Source ID

           iii. Data Object Type Name

           iv.  Data Object Type Version

           v.   Package data

               a. Any Json data, but schema-valid for the type version

    So basically:

    1)      "A" creates data object which is serialised to a json message (wrapper and data).

    2)      Json message is added to a queue.

    3)      B Serialiser dequeues messages whose type and version number it understands.

    4)      B Rehydrates and schema verifies data object from message package data.

    Available - you can run B version 7 next to B version 6 until there are no more messages for version 6 in the queue.

    Scalability - as you can spin up new B's in different versions to process messages.

    Performance - as B only tries to dequeue, deserialise and validate packages that are for them.

    Timeliness – You can check many queues and even specialize queues for data types if you want.

    Data – basic json strings.

    Other advantages are:

    Monitoring – you can peek at the queue to see the progress of an upgrade.

    Deployment – you can monitor the queue to see which A's have not been upgraded.

    etc.
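The envelope and the four steps above can be sketched roughly as follows. This is a toy under stated assumptions: an in-memory `deque` stands in for a real queue (a real broker would filter or peek rather than re-enqueue), a required-keys check stands in for real schema validation, and the JSON key names and the `wrap`, `Consumer` and `drain` helpers are hypothetical:

```python
import json
from collections import deque
from datetime import datetime, timezone


def wrap(source_id, type_name, type_version, data):
    """Step 1: serialise a data object into a JSON message (wrapper and data)."""
    return json.dumps({
        "dateCreated": datetime.now(timezone.utc).isoformat(),
        "sourceId": source_id,
        "dataObjectTypeName": type_name,
        "dataObjectTypeVersion": type_version,
        "packageData": data,
    })


class Consumer:
    """A "B" instance that understands exactly one type name and version."""

    def __init__(self, type_name, type_version, required_keys):
        self.type_name = type_name
        self.type_version = type_version
        self.required_keys = set(required_keys)

    def understands(self, envelope):
        return (envelope["dataObjectTypeName"] == self.type_name
                and envelope["dataObjectTypeVersion"] == self.type_version)

    def rehydrate(self, envelope):
        """Step 4: verify the package data, then hand it back as a dict."""
        data = envelope["packageData"]
        missing = self.required_keys - set(data)
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
        return data


def drain(queue, consumer):
    """Step 3: take the messages this consumer understands, leave the rest."""
    handled, remaining = [], deque()
    while queue:
        text = queue.popleft()
        envelope = json.loads(text)
        if consumer.understands(envelope):
            handled.append(consumer.rehydrate(envelope))
        else:
            remaining.append(text)
    queue.extend(remaining)
    return handled


# Step 2: two versions of the same message type share one queue.
queue = deque([
    wrap("A", "Order", 6, {"id": 1}),
    wrap("A", "Order", 7, {"id": 2, "currency": "EUR"}),
])
b7 = Consumer("Order", 7, ["id", "currency"])
print(drain(queue, b7), len(queue))
```

Running a version 6 consumer next to `b7` would pick up the message that `b7` left behind, which is the side-by-side upgrade described under "Available".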

  • MikeYeaneymyeaney Lovin the 9...

    For me, this whiteboard session was summed up very well at the end (sorry - don't have a time-mark off the top of my head): If we're building a distributed set of services that require implicit "synchronization", that is, contracts must be in sync, type knowledge must be available to everyone, etc., then what we've actually built is ONE BUSINESS SERVICE connected by distributed technologies.

    By contrast, as soon as you try to connect MULTIPLE BUSINESS SERVICES, developed by different teams/companies, different timezones/locations, different languages, toolsets (make sure you include vim/emacs here even if you don't agree with it), all of these "benefits" (or "pixie dust" as Clemens puts it) begin to vanish very quickly. You simply cannot force that type of orchestration across companies/teams/locations/timezones without paying a heavy price. I've worked at at least two companies that have tried, and they end up getting buried by their own processes.

    IMO, either of these models is acceptable - so long as we're cognizant of the implications of the systems we're building. That (from my point of view) was the real point of this session....not a religious debate over some ultimate right-and-wrong (or "best" practice, whatever that means).

    As always, use what works for you and your specific application - but understand you can't have both. Much like the "static v. dynamic" debates, if you want a language that allows dynamic features, you're going to have a bad day if you try to do that with a static language. Likewise, if you want complete decoupling between services (no really - 100% decoupling, not just a separate project in Visual Studio), you're going to have to build them differently (regardless of technology you're using).

  • I do think that Joey got it right. Use your static type information whenever possible and put your "unexpected" data into a sort of Any tag. It is useful to have typing when doing XML; without it we would be stuck parsing the hell out of what is necessarily a string. I especially like the idea of Google .proto files, which cut down on XSD bloat while still maintaining all the richness of typed messages.

    Having worked with Biztalk for some time now, I can honestly say that schemas are a convenient way of describing what we are trying to integrate. In summary, keep the bloat required to do typed messages to a minimum and avoid any "changes in message structure break existing systems" kind of system.

  • I think XSD can be really useful, especially on very large scale distributed environments.  That said I rarely see people use them well.  To quote Marshall McLuhan: "The message is the medium".  I think there is a lot of value in the intrinsic understanding of a message based on contract.  Even things like content based routing.  I understand message properties as a KVP concept, but that seems to just introduce another sort of coupling and reminds me too much of JMS. 

    I like XSD because you can make a stand alone message with some portable validation and this can be really useful in distributed (and especially occasionally disconnected scenarios).  A SalesOrder schema that doesn't allow negative quantity and dollar amounts is a good example.  My clients should know (both human and machine) that they cannot break these rules.  Throw this into JSON and how do you enforce these basic rules - especially without introducing the dependency of a live service endpoint.  I think the example about date formatting is a good one. 
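For illustration only, the SalesOrder rule mentioned above (no negative quantities or dollar amounts) can be checked locally against a JSON document without calling a live service endpoint. The message shape, the key names `lines`, `quantity` and `amount`, and the helper names are assumptions made for this sketch; whether hand-written checks like this scale as well as a schema is exactly what this thread debates:

```python
import json


def is_non_negative_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return (isinstance(value, (int, float))
            and not isinstance(value, bool)
            and value >= 0)


def sales_order_errors(message_text):
    """Check each order line locally; no service endpoint is involved."""
    order = json.loads(message_text)
    errors = []
    for index, line in enumerate(order.get("lines", [])):
        for key in ("quantity", "amount"):
            if not is_non_negative_number(line.get(key)):
                errors.append(f"lines[{index}].{key} must be a non-negative number")
    return errors


print(sales_order_errors('{"lines": [{"quantity": 2, "amount": 19.99}]}'))
print(sales_order_errors('{"lines": [{"quantity": -1, "amount": 5.0}]}'))
```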

    Again, this isn't really a defense of XSD per se.  Most XML/XSD I've seen is little more than angle-bracket-delimited text files.  I like XSD because I have tools that make it easy to work with; when I remember the old days of XML... I want to forget they ever happened - but as we look to build ever more connected systems and aspire to allow them to auto-connect, I don't see how you can do this without some sort of schema.  Any sort of semantic web approach will require something the machine can use to infer context and meaning.  I do wish it could be easier though. 

  • Maybe this is me coming from WCF land in most of my messaging scenarios, but I always like to specify contracts at the interface level, not the class level. Implement the interface and specify reasonable functionality for the methods. Things change (the db schema on the backend, whatever); you change the object on the server side and the clients still work because they are still getting IOfWhateverContract.

  • I think that representing messaging data with xsd normally drives you to build highly coupled systems.

    We had a services solution, internally implemented with OOD, that provided access to the backend. We changed everything to build an "Enterprise Service Bus" solution based on:

    - MSE (http://servicesengine.codeplex.com/)  Data defined in xsd.
    - ESB using dynamic ports, etc.
    - Biztalk 2009  (Maps, Schemas...)
    - WCF Services.


    Results

    - These tools/frameworks push you to express the data with XSD.
    - Biztalk visual studio tools force you to include the schema inside a dll.
    - Extra work, everything has to be expressed in schemas.
    - Even very small messages needed to be validated against an XSD.
    - Performance is worse than the previous solution (online, mobile..).
    - The architecture is oriented to the tool, not to the business problem.
    - Sometimes you add XSD logic to the business logic, I mean, you can see people asking XSD questions to the business team.
    - The new solution is not more robust, it is just bigger (schemas).


    It seems to be a robust system but as you have already explained..

    Consequences

    - We had services that we wanted to reuse to build some composite services.

    - To reuse service A in service B, a developer took the decision to add the reference of project A schema into B.

    - Weeks later came a very minor change: we wanted to add an optional field to the data A schema, so we added the field in the A project and tried to deploy.

    - The schema is a resource in the dll project. So, it has dependencies.

    - We had to stop application B on all the servers.

    - That was not enough.  We needed to undeploy application B.

    - Stop all processes.

    - With the system down, the application undeployed, we could deploy the new schema into Biztalk.

    - Deploy again application B.


    Next Steps


    Quick but not best solution  (again.. it is not the best solution)

    - Use a generic schema that never changes as a container.
    - Delete the DLL and use the schema as a file. 

    For the next big release we will try to reduce all this XSD coupling to the minimum.  We will build an API based on HTTP, at least not based on data schemas.  We will implement our validation class based on norm documents or tables.   I agree with your approach of having a tuple of key and value (or something similar).   I also found interoperability problems with some Java framework implementations using multivalue arrays.


    Thanks a lot for sharing

  • I would argue that SOAP introduces coupling through bad implementations, which is much easier to do since SOAP is complex. It sure would make sense if at least .NET's implementation would resolve and generate SOAP messages as interfaces.

  • Freek Paans

    Hi Clemens,

    At around 7:24 into the video you say that there are a couple of mechanisms by which A can get feedback from B (in async communication). Could you elaborate a little bit on that or provide some pointers? I'm looking into that currently and can't seem to find any good write-ups/patterns.

    Thanks!
