Thanks for the excellent reply. I'm still a little hazy on the details, though.
The load balancing of MSMQ is performed by the WCF client, which will round-robin against servers that are running the OrderProcessor service host. This is built into the StockTrader AsyncOrderClient itself.
So, if I were to duplicate this without WCF, there would be code that essentially selects a different queue path for each request, thereby kind of load balancing. Fair enough, but how does this resolve the situation where the path you select
points to a machine that is down?
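To make the question concrete, here is a rough sketch of the kind of client-side selection I have in mind. It is plain Python rather than .NET, it does not touch MSMQ at all, and the class name, the `mark_down`/`mark_up` calls and the host names in the usage lines are all made up for illustration. It also assumes something else tells the sender which hosts are down, which is exactly the part I don't know how to get.

```python
from itertools import cycle


class RoundRobinSender:
    """Picks a queue path per request, skipping hosts marked as down.

    Hypothetical helper: it only chooses a path, it does not send anything.
    """

    def __init__(self, queue_paths):
        self._paths = list(queue_paths)
        self._ring = cycle(self._paths)
        self._down = set()

    def mark_down(self, path):
        # Assumption: some external health check calls this.
        self._down.add(path)

    def mark_up(self, path):
        self._down.discard(path)

    def next_path(self):
        # Look at each path at most once per call so we never spin forever.
        for _ in range(len(self._paths)):
            path = next(self._ring)
            if path not in self._down:
                return path
        raise RuntimeError("no target queue is currently reachable")


if __name__ == "__main__":
    # Host names are placeholders, not real servers.
    sender = RoundRobinSender([
        r"FormatName:DIRECT=OS:orders01\private$\orders",
        r"FormatName:DIRECT=OS:orders02\private$\orders",
    ])
    sender.mark_down(r"FormatName:DIRECT=OS:orders02\private$\orders")
    print(sender.next_path())
```

Without the `mark_down` information the selection is blind, and a message sent to a dead host just sits in the outgoing queue.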
Due to the nature of MSMQ, a local message will be queued and will wait in the outgoing queue until the target machine comes back online. This provides resiliency, but not really fault tolerance and failover, because now I have an orphaned message that may
be stuck in the outgoing queue forever if the target machine never comes back.
With WCF, actually, when using an MSMQ binding, the client app is not directly communicating with the OrderProcessor.exe service host; rather it is talking to MSMQ.
There is a fairly well-known workaround for the remote read/distributed tx issue with MSMQ 3.5, which calls for creation of a polling mechanism that essentially transfers the message to the local processing computer's local MSMQ queue as part of a distributed
tx; the processing computer then reads the messages locally and processes them as part of another distributed tx.
Interesting. So you would have a central queue server that all message senders would send messages to. This queue server could be clustered and therefore is fault tolerant and can be failed over. Now this "polling mechanism", which would have to be running
locally on the central queue server, would take the messages and send them to the target servers which are running the OrderProcessor.exe service. These servers would be able to then do local reads / transactions to process the messages. Is this correct?
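Here is how I picture that polling mechanism, as a minimal Python sketch with in-memory lists standing in for the queues. The function name `poll_once`, the worker names and the `is_up` callback are my own inventions, not anything from StockTrader or MSMQ, and the sketch ignores transactions entirely.

```python
from collections import deque


def poll_once(central, local_queues, is_up):
    """Forward messages from the central queue to worker queues, round-robin.

    central      -- deque of pending messages on the central queue server
    local_queues -- dict mapping worker name -> list acting as its local queue
    is_up        -- callable(worker name) -> bool; assumed to exist somehow
    Returns the number of messages forwarded.
    """
    names = list(local_queues)
    cursor = 0
    moved = 0
    for _ in range(len(central)):
        msg = central.popleft()
        target = None
        # Scan the workers starting at the cursor, looking for one that is up.
        for i in range(len(names)):
            name = names[(cursor + i) % len(names)]
            if is_up(name):
                target = name
                cursor = (cursor + i + 1) % len(names)
                break
        if target is None:
            # Nobody is available: put the message back and stop this pass.
            central.appendleft(msg)
            break
        local_queues[target].append(msg)
        moved += 1
    return moved


if __name__ == "__main__":
    central = deque(["order-1", "order-2", "order-3"])
    workers = {"worker-a": [], "worker-b": []}
    poll_once(central, workers, is_up=lambda name: True)
    print(workers)
```

Note that everything interesting is hidden inside `is_up`: the sketch only works if the poller can tell which workers are alive.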
If so, I am still confused as to how this is any different than before. You still have the problem of ensuring that the polling mechanism only sends to servers that are "up". Since MSMQ naturally abstracts this entire process away from you, you're still stuck
with the "orphaned message" scenario if the target machine never comes back and the message is sitting in the outgoing queue.
Now, I suppose we could kind of solve this issue by simply using TimeToBeReceived/TimeToReachQueue timeouts on the sending machines. If the message times out, it would be placed in a dead letter queue and we could implement a service that then selects a new server
to send the message to. Is that the strategy you use? Sorry if WCF resolves these issues... I'm not familiar with WCF. (Yet.)
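Roughly, the timeout-and-resend idea would look like the sketch below. Again this is plain Python with lists standing in for the outgoing and dead letter queues; `Message`, `sweep_expired` and `reroute` are hypothetical names of mine, and in real MSMQ the move to the dead letter queue is done by MSMQ itself rather than by application code.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    target: str            # server the message is currently addressed to
    deadline: float        # time after which the message counts as expired
    tried: set = field(default_factory=set)  # servers that already timed out


def sweep_expired(outgoing, dead_letter, now):
    """Move messages whose deadline has passed into the dead letter list."""
    still_waiting = []
    for msg in outgoing:
        if now >= msg.deadline:
            dead_letter.append(msg)
        else:
            still_waiting.append(msg)
    outgoing[:] = still_waiting


def reroute(dead_letter, outgoing, servers, timeout, now):
    """Re-send each dead-lettered message to a server it has not tried yet."""
    unroutable = []
    for msg in dead_letter:
        msg.tried.add(msg.target)
        candidates = [s for s in servers if s not in msg.tried]
        if not candidates:
            # Every server has timed out for this message; keep it for a human.
            unroutable.append(msg)
            continue
        msg.target = candidates[0]
        msg.deadline = now + timeout
        outgoing.append(msg)
    dead_letter[:] = unroutable


if __name__ == "__main__":
    outgoing = [Message("order-1", "server-a", deadline=10.0)]
    dead = []
    sweep_expired(outgoing, dead, now=20.0)
    reroute(dead, outgoing, ["server-a", "server-b"], timeout=30.0, now=20.0)
    print(outgoing[0].target)
```

The obvious worry with this approach is duplicates: the resend service has to be sure the original really was never processed before it picks a new server.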
Also, MSMQ will work with Windows Network Load Balancing; there are some good articles on MSDN about this.
It is my understanding that MSMQ works with NLB only if you're not doing transactional messaging. The session aspect of MSMQ prevents NLB from properly routing all the various packets to the proper machines since NLB is a network-level load balancing mechanism,
not an application load balancer which understands the MSMQ protocol. If I'm wrong, I'll be very happy since this is a problem I've dealt with for a while.
I really appreciate you taking the time to explain these things to me. I feel like I've read all the available documentation and still have a lack of understanding.
Since you're using transactions there is no way, at least in MSMQ 3.0, to do remote transactional reads. This means that you can't have your processing apps on multiple servers and have them all connect to a single MSMQ server.
Furthermore, you can't have multiple MSMQ servers acting as a single load-balanced cluster because MSMQ transactions require what amounts to a session. This prevents load balancing of queues using things like virtual IPs or hardware load balancers.
The best you can do is set up an MSMQ Windows Cluster, but that's an active/passive cluster and it just gives you fault tolerance, not horizontal scalability. Not to mention the fact that you'll need shared storage in the form of an SAS array or a NAS array.
So how do you scale Stock Trader's MSMQ queue server? I read the documentation, and it seems to only mention scaling the UI and business logic layers. It completely glosses over the server(s) hosting the target MSMQ queue.
I have a complex enterprise application that heavily utilizes MSMQ, and scaling this tier has proven to be basically impossible. There is always a node that won't horizontally scale. Because of this, I'm considering moving to something like SQL Server Service