In this episode of “Azure Lessons Learned,” Rob Fraser from RiskMetrics talks about the work they’ve done to scale some of their heavy computational workloads out to thousands of nodes on Windows Azure.
RiskMetrics specializes in helping to manage risk for financial institutions and government services. The solution they built on Windows Azure is primarily for calculating financial risk for their clients. Calculating the risk on portfolios of financial assets is an incredibly compute-intensive problem (Monte Carlo simulations on top of Monte Carlo simulations), and demand for this type of computation keeps growing. RiskMetrics’ calculations require enormous computational power, but the need for that power tends to come in peaks, which means hardware sized for the peak sits idle for much of the time. Windows Azure solves this problem by allowing RiskMetrics to quickly acquire the very large number of processors required, use them for a short time, and then release them.
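To give a feel for why “Monte Carlo on top of Monte Carlo” is so expensive, here is a minimal sketch of a nested simulation in Python. It is not RiskMetrics’ model: the instrument (a single European call), the geometric Brownian motion dynamics, the parameter values, and every function name are assumptions chosen for illustration. The outer loop simulates market scenarios at the risk horizon, and the inner loop reprices the position by simulation in each scenario, so the cost grows as `n_outer * n_inner`.

```python
import math
import random


def simulate_price(rng, price, drift, vol, horizon):
    """Draw one price after `horizon` years under geometric Brownian motion."""
    z = rng.gauss(0.0, 1.0)
    growth = (drift - 0.5 * vol ** 2) * horizon + vol * math.sqrt(horizon) * z
    return price * math.exp(growth)


def inner_value(rng, price, strike, rate, vol, maturity, n_inner):
    """Inner Monte Carlo: value a European call starting from `price`."""
    total = 0.0
    for _ in range(n_inner):
        terminal = simulate_price(rng, price, rate, vol, maturity)
        total += max(terminal - strike, 0.0)
    return math.exp(-rate * maturity) * total / n_inner


def nested_var(n_outer=200, n_inner=200, confidence=0.95, seed=0):
    """Outer Monte Carlo over market scenarios, repricing inside each one."""
    rng = random.Random(seed)
    # Hypothetical market and contract parameters (assumptions).
    spot, strike, rate, vol = 100.0, 100.0, 0.02, 0.2
    horizon, maturity = 10 / 252, 1.0

    value_today = inner_value(rng, spot, strike, rate, vol, maturity, n_inner)
    losses = []
    for _ in range(n_outer):
        shocked = simulate_price(rng, spot, rate, vol, horizon)
        value_then = inner_value(
            rng, shocked, strike, rate, vol, maturity - horizon, n_inner
        )
        losses.append(value_today - value_then)

    # Empirical loss quantile at the requested confidence level.
    losses.sort()
    index = min(int(confidence * n_outer), n_outer - 1)
    return losses[index]


if __name__ == "__main__":
    print(f"Estimated 10-day 95% VaR: {nested_var():.2f}")
```

Each outer scenario is independent of the others, which is what makes this kind of workload a natural fit for spreading across many worker nodes.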
To give you a sense of the scale RiskMetrics is talking about, the initial target is to use 10,000 worker roles on Windows Azure. And that’s just the beginning, as Rob thinks they could eventually be using as many as 30,000.
While using Windows Azure may help control costs, the real motivation is having the kind of compute power they need to build analytic services for their clients that they otherwise couldn’t easily build.
Rob goes into some depth on the architectural pattern they devised to ensure the efficient flow of work packets from their data center into the cloud for processing and then back again with the results. The architecture is an interesting hybrid of on-premises and cloud computing.
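The general shape of that flow (split a job into work packets, hand them to a pool of workers, collect the results) can be sketched in a few lines of Python. This is a generic queue-based pattern, not the architecture Rob describes in the video: the in-process queues and threads merely stand in for whatever cloud queues and worker roles a real deployment would use, and all names here (`make_packets`, `worker`, `run_job`) are hypothetical.

```python
import queue
import threading


def make_packets(items, packet_size):
    """Split a list of work items into fixed-size packets."""
    return [items[i:i + packet_size] for i in range(0, len(items), packet_size)]


def worker(work_q, result_q, compute):
    """Stand-in for a cloud worker: pull packets until a sentinel arrives."""
    while True:
        packet = work_q.get()
        if packet is None:  # sentinel: no more work for this worker
            break
        packet_id, payload = packet
        result_q.put((packet_id, [compute(x) for x in payload]))


def run_job(items, compute, packet_size=4, n_workers=3):
    """Send packets out, wait for the workers, and reassemble the results."""
    work_q, result_q = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(work_q, result_q, compute))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()

    # "On-premises" side: enqueue every packet, then one sentinel per worker.
    for packet_id, payload in enumerate(make_packets(items, packet_size)):
        work_q.put((packet_id, payload))
    for _ in threads:
        work_q.put(None)
    for t in threads:
        t.join()

    # Results may arrive in any order, so reassemble them by packet id.
    results = {}
    while not result_q.empty():
        packet_id, values = result_q.get()
        results[packet_id] = values
    return [v for packet_id in sorted(results) for v in results[packet_id]]


if __name__ == "__main__":
    print(run_job(list(range(10)), lambda x: x * x))
```

Tagging each packet with an id lets the caller put results back in order even though workers finish at different times, which matters more as the number of workers grows.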
Rob (along with his colleague Phil Jacob) also presented some of this in a PDC session.