ARCast.net - Do Atomic Transactions violate SOA Autonomy Tenet?



[radio break]
Ron Jacobs: Hey, welcome back to ARCast Radio, my friends. This is your host Ron Jacobs, and today we're going to do another ARCast Rapid Response.

I know, it's like I just did one, but I recorded this one this very morning, so it's timely and fresh, and it seems like a lot of people are very interested in this question of transactions and web services. You know, in the past, transactional programming with distributed objects was the main way you did things.

Right, in COM+ and in EJB and other platforms. When we moved to web services, we hit a little hiccup on that road, because initially there was no way to do distributed transactions with web services.

Well, all that changed with the introduction of WCF in .NET 3.0, and now you can do them. The question is: is it a good idea or not? Well, this came up on the MSDN Architecture forum yesterday, so here's an ARCast Rapid Response.

[radio break]
Ron: This is Ron Jacobs and welcome back to another ARCast Rapid Response. You asked for him, we got him, it's Juval Lowy on the phone with us today. Hey, Juval.
Juval Lowy: Hey, good morning Ron.
Ron: Here's the question that Evan H. asked: "Are distributed transactions, as in WS-AtomicTransaction, a violation of the autonomy tenet of service orientation?"

He wants to know: yes or no, and why. Kudos if you can address concurrency and scalability in an enterprise with multiple interacting services. He says he has a very strong opinion, and he wants to know if any of us will take the bait. So, what do you think, Juval?
Juval: Transactions are categorically the only viable programming model. There are no ifs, no buts, no maybes. This statement means that you cannot actually write or handcraft recovery logic. Any attempt to handcraft recovery for more than two lines of code that interact with a resource is doomed to failure. Nobody in the history of software has ever written recovery logic that did not end up as some kind of transactional mechanism.

Unfortunately, that doesn't mean you can always use transactions. So, what you need to do is you need to compensate for your inability to always use transactions.

So, let me break that statement down. In essence, any time you can group business operations, or any kind of operations, into a transaction, more power to you. That means there's no partial success, no partial failure, no need to recover the system, no need to deal with a bazillion, gazillion factorial number of recovery cases.

And by the way, nobody ever does that, and even when people try, they only handle the easy cases, the cases they know how to deal with. Nobody ever deals with disk crashes or power outages and so on.
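To make the all-or-nothing idea concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module rather than WCF or `System.Transactions`. A local database transaction stands in for a distributed one, and the `accounts` table and `transfer` function are hypothetical. Either both updates commit or neither does, so there is no recovery case to handcraft.

```python
import sqlite3

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move money between two accounts as one atomic unit (all or nothing)."""
    # Using the connection as a context manager commits on success
    # and rolls back if any exception escapes the block.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            # Raising here undoes the debit above; no manual recovery code needed.
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))

conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

transfer(conn, "alice", "bob", 30)       # succeeds: both rows change
try:
    transfer(conn, "alice", "bob", 500)  # fails: neither row changes
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After both calls, `balances` holds alice at 70 and bob at 30: the failed transfer left no partial debit behind.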

And that's all good; the problem is that sometimes a business operation takes a long while to complete. Like if you're trying to deposit a check at the bank, you're not going to be at the teller desk for two days until the check clears.

So, what you need to do in that case is write some kind of compensating logic to deal with the fact that you can't really lock down your entire system for two days or two weeks or however long it takes. So, what happens is that systems that don't use transactions typically have some kind of tolerance toward the truth. Meaning, they pretend to do something for you when they actually don't; they cheat.

For example, if you buy an airline ticket, you don't really buy anything. You go to the gate and the flight is overbooked by 20%, so let me tell you, some of those customers were lied to. Now, it could be me, it could be somebody else. So it looks as if there was a business transaction, looks as if money changed hands, looks as if there's ownership of a seat, but in fact there isn't.

How many times have you bought a book at Amazon and they say, "OK, we'll send you an email, we'll ship it and so on," and 10 days later you get another email that says, "Look, after all, we don't really have the item in inventory, but we really wanted to; what do you want to do? Do you want to keep waiting or do you want a refund?"

These are all cases of compensating logic: the company is trying to compensate for the fact that it couldn't really have a really long-running transaction. Going back to service orientation, the problem with the grand vision of service orientation is that you would allow anybody to talk to anybody else.

Now, the question becomes: what if you have a truly long-running transaction? Do you want to allow anybody off the street to maintain a lock on your system for two days or two weeks, however long it takes? The answer to that is invariably no; in fact, the answer is invariably no even if you trust that guy implicitly.

Now, there's of course the whole set of nasty scenarios: what happens if somebody does transactions for the sole purpose of locking your resource? But to that I say, denial of service (DoS) doesn't really apply here, because you should always use security; the moment you authenticate and authorize the call, you implicitly also authorize them to do transactions against you. And if you don't do that, you should, so I don't really buy into the whole denial-of-service angle with transactions.

So, transactions do not violate the autonomy tenet; they violate the common-sense tenet of "don't lock your system for long if you can't afford to lock it for long." And that hasn't changed; it holds whether or not you use service orientation, and doubly so when you do.

So, the rule of thumb here is very simple: if you can use atomic transactions, go for it. Life is simple, life is good, you will enjoy it, and your project will have higher quality, fewer issues and so on and so forth. If you cannot, then you have to introduce compensation logic, and you can only use compensation logic if you have some kind of tolerance for the truth.

That is, you have to be able to tolerate some temporal inconsistency while you figure things out. And by the way, even if you have some tolerance, you can only introduce compensation logic if you have some meaningful compensation.

For example, the airline will say, "Does anybody want a flight to Hawaii if they give up their seat now?" Well, that's their ability to compensate, and they assume that somebody will want to fly to Hawaii. But what if nobody wants to fly to Hawaii? Well, they have to jack up the price: "Who wants a free flight around the world?" Somebody will eventually take the bait, right? That's what really guarantees that somebody will accept their compensation.
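Compensation differs from rollback: the earlier steps really did happen, and each one is undone by its own business-level counter-action. Below is a minimal Python sketch of that idea, assuming each step is given as a hypothetical (name, action, compensation) triple; the step names echo the Amazon example and are illustrative only, not a real API.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); all names here are hypothetical.
Step = Tuple[str, Callable[[], None], Callable[[], None]]

def run_with_compensation(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; if one fails, compensate the completed ones in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed: {name}")
            # The earlier work is not rolled back by a database; each
            # completed step is countered by its own compensating action.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated: {done_name}")
            return False
        completed.append(step)
        log.append(f"done: {name}")
    return True

log: List[str] = []

def out_of_stock() -> None:
    raise RuntimeError("item not in inventory")

ok = run_with_compensation(
    [
        ("charge card", lambda: None, lambda: log.append("refund issued")),
        ("ship book", out_of_stock, lambda: None),
    ],
    log,
)
```

Here the charge succeeds, shipping fails, and the refund is the meaningful compensation. If no such counter-action exists for a step, this pattern cannot help, which is the point about needing meaningful compensation.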

Another angle altogether: sometimes you can look at this whole set of operations you're trying to do and say, "I can do them individually transactionally, but collectively I cannot. Maybe I can separate them in time. Maybe I can separate them in order."

In fact, how about I queue them up in a queue and make access to the queue itself transactional? Then the whole set is committed to or rejected from the queue as one atomic operation, but the items are executed out of the queue in whichever order and can fail or succeed independently.

So, that's another twist on it if we can do some kind of logging or queuing or buffering on top of that.
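The queuing twist can be sketched in Python with a `sqlite3` table standing in for a transactional queue (an assumption; the conversation does not name a queuing product). The whole set is enqueued in one transaction, then each item is dequeued and handled in its own transaction. The `dead_letter` table, the `handler` function, and the item names are hypothetical, and in a real system the handler would do its resource work on the same transaction instead of appending to a Python list.

```python
import sqlite3
from typing import Callable, List

conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, op TEXT)")
    conn.execute("CREATE TABLE dead_letter (id INTEGER PRIMARY KEY, op TEXT)")

def enqueue_all(ops: List[str]) -> None:
    """Commit the whole set to the queue as one atomic operation (all or none)."""
    with conn:
        conn.executemany("INSERT INTO queue (op) VALUES (?)", [(op,) for op in ops])

def process_next(handler: Callable[[str], None]) -> bool:
    """Dequeue one item and handle it in its own transaction; False when empty."""
    row = conn.execute("SELECT id, op FROM queue ORDER BY id LIMIT 1").fetchone()
    if row is None:
        return False
    item_id, op = row
    try:
        with conn:  # the removal commits only if the handler succeeds
            conn.execute("DELETE FROM queue WHERE id = ?", (item_id,))
            handler(op)
    except Exception:
        with conn:  # park the failed item so the rest of the queue keeps moving
            conn.execute("DELETE FROM queue WHERE id = ?", (item_id,))
            conn.execute("INSERT INTO dead_letter (op) VALUES (?)", (op,))
    return True

done: List[str] = []

def handler(op: str) -> None:
    if op == "bad":
        raise RuntimeError("this item fails on its own")
    done.append(op)

enqueue_all(["a", "bad", "c"])
while process_next(handler):
    pass
```

The failing item ends up in `dead_letter` while "a" and "c" still complete, so the items fail or succeed independently even though they entered the queue atomically.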
Ron Jacobs: You know, that's actually the approach that I like the most because really what you are trying to do here is keep the transactions inside your boundary. So, you receive a request. You queue it up and then you do the transactional work at a later point in an area that is entirely in your control and you don't have to share the transaction with the other party.
Juval: Correct, and then you can actually have true atomic transactions. Of course, the work may span multiple vendors, multiple clients and so on, and you queue it all up and you say, "Well, this whole set gets committed or aborted. If it's aborted, no biggie, it all rolls back. If it's committed, then part of it will take place here, some part of it will take place over there, and so on." And I agree.
Ron: There's one other thing about this kind of transaction. In a typical distributed transaction, when you roll back everything just disappears. Right?
Juval: Right.
Ron: All of the work that was there is just gone.
Juval: As if it never happened.
Ron: Yes. Of course, you don't want that to happen in an exchange with a third party. You want to track what happened, you know? So they sent you a message. You tried to do something. You couldn't do it. You responded back, "I can't do it." So, you want somewhere to store a record of this interaction between the two systems.

So, I like to think of those as sort of two separate things. You have this set of messages that was going back-and-forth. That has to be tracked so that if a human later wants to review it and try to fix it, they can. But then you have the transactional work that's applied against your database, your actual resource data. That work you don't want to have retained.
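Ron's separation of the retained message record from the discardable resource work can be sketched with Python's `sqlite3`, assuming both live in one local database purely for illustration. Each log entry commits in its own transaction, while the order insert is committed or rolled back as a unit; the `orders` and `message_log` tables and the `handle_request` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.execute("CREATE TABLE message_log (id INTEGER PRIMARY KEY, direction TEXT, body TEXT)")

def log_message(direction: str, body: str) -> None:
    # Each entry commits in its own transaction, so the record of the
    # conversation survives a rollback of the business work.
    with conn:
        conn.execute("INSERT INTO message_log (direction, body) VALUES (?, ?)", (direction, body))

def handle_request(item: str, fail: bool = False) -> str:
    log_message("in", f"order {item}")
    try:
        with conn:  # the resource work: kept entirely or discarded entirely
            conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
            if fail:
                raise RuntimeError("cannot fulfil order")
        reply = f"accepted {item}"
    except RuntimeError:
        reply = f"rejected {item}"
    log_message("out", reply)
    return reply

handle_request("book")
handle_request("lamp", fail=True)
```

Afterwards `orders` contains only the book, but `message_log` still shows both requests and both replies, so a human can later review the rejected exchange.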
Juval: Correct. In fact, the bullets make it very easy to keep track of things like aborted transactions and number of aborted transactions and so on just by squirting in that config flag that does the tracing for you.
Ron: Right. Yeah.
Juval: So, you can even use that. And another thing that we may want to look at is what attempts were there to try and standardize business transactions as opposed to atomic transactions.

I don't know if you remember the early days of the WS specs in the transaction space. I remember a meeting where all the big vendors were there: IBM, BEA, Microsoft, HP and so on. The discussion on how to do WS atomic transactions took about 20 minutes, OK? It's like, everybody knows how to do it. Done deal. Two-phase commit. Done. OK?
Ron: Yeah.
Juval: The discussion on business transactions never ended. Nobody could ever agree on what to do, and I think nobody ever will, because however you compensate, whatever kind of tolerance you have, is bound to be application-specific. It is very difficult to come up with a common, industry-wide standard for compensation that is optimized for all cases and all applications.

So, it's even worse than, "You know what? We don't really have a solution as an industry for business transactions." We will never have a solution.
Ron: [laughs]
Juval: That's even a stronger statement. [laughs]
Ron: Well, OK, so here's what people often ask me. If this is such a bad idea, why did they create WS atomic transactions? What do you have to say to that?
Juval: Well, that's how I started. It's not a bad idea. It's not a bad idea if you can get away with it. If your operation is truly short and doesn't lock things for more than, say, a second, go for it. It's a wonderful thing. It's the best thing there is.
Ron: If you really trust the other guy and you're willing to take that risk, sure.
Juval: Yeah, but you know, if I didn't trust the other guy, I wouldn't let them past my security screen. I assume that you have authentication and authorization.
Ron: Yeah. Mm-hmm.
Juval: Now, at some point I envision a world where we are going to have service-oriented code access security. So, just like .NET does code access security for individual components, but nobody ever uses it because everybody runs with full trust -
Ron: Yeah.
Juval: But, assuming that we did use code access security, one of the things code access security has is a demand for permission to run distributed transactions, which basically is your ability to say that everybody up the call chain also has to be allowed to do transactions against me.
Ron: Yeah.
Juval: Now, imagine a world where we have code access security, but at the service layer. So, every call would carry within it the entire logical call stack of all the callers. Similar to federated security, you would have the ability to issue permissions for individual services.

And every time you receive a call, you say, "Look, I know that my immediate caller is allowed to do whatever it's trying to do. The question is: was its caller authorized to do whatever it's asking my caller to do?" and so on and so forth.

And we just walk this virtual stack and see if every caller on the stack has the right to do distributed transactions against you. If the answer is yes, you go ahead. The industry is not yet mature enough to do these things, but that would be the ultimate solution, right?
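Juval describes service-level code access security as a future vision, not an existing WCF feature, so the following is only a Python sketch of the stack walk itself. It assumes the logical call chain has already arrived with the call as a list, immediate caller first; the `Caller` class, the `first_unauthorized` function, the service names, and the permission string are all hypothetical, and how the chain is carried and trusted is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Caller:
    name: str
    permissions: Set[str] = field(default_factory=set)

def first_unauthorized(call_chain: List[Caller], permission: str) -> Optional[str]:
    """Walk the logical call chain, immediate caller first, and return the
    name of the first caller lacking the permission, or None if all hold it."""
    for caller in call_chain:
        if permission not in caller.permissions:
            return caller.name
    return None

TX = "distributed-transaction"  # hypothetical permission name

chain = [
    Caller("order-service", {TX}),   # immediate caller
    Caller("partner-portal", {TX}),
    Caller("anonymous-client"),      # original caller, holds no permissions
]
```

`first_unauthorized(chain, TX)` returns "anonymous-client", so the service would refuse to enlist in the transaction even though its immediate caller is trusted; with only the first two callers in the chain it returns None and the call may proceed.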
Ron: Well, ultimately it depends on whether you trust the transaction manager, whoever that is.
Juval: Sure, but I'm saying, if we have a true service oriented code access security, then we solve that.
Ron: Yeah.
Juval: But again, the rule of thumb is: if you can use it, definitely do it. It's the only viable programming model. If you have a long-running business transaction, you need compensation, and you may not be able to compensate.
Ron: Wow, OK, this was very interesting, because I don't think anybody else who replied to this question had your take on it. So, thanks so much, Juval, for this ARCast Rapid Response.
Juval: Thanks, Ron.

[sirens]
Announcer: This has been an ARCast rapid response brought to you by ARCast TV and architect MVPs worldwide.
Ron: You know, that's what I like about Juval: he's always thinking a little differently than the crowd, you know? He's not just saying what everybody else is saying. And of course, Juval is like Mr. Transactions.

I actually first met Juval when I was on the COM+ team, and he was writing a book about COM+ and transactions. He has been at the forefront of thinking about System.Transactions and those sorts of things. So, it's great to be able to bring this kind of expertise to you, my listening friends.

I hope you enjoyed that and these ARCast rapid responses. You know, ask your question on the MSDN Architecture General Forum. We might just select it for an ARCast rapid response. I hope that you're enjoying these and let me know what you think. Send your questions or comments to ARCast@Microsoft.com and I'll be happy to respond to them.

Hey, keep listening. Tell your friends and we'll see you next time on -