Bytes by MSDN: Rob Gillen and Dave Nielsen on Windows Azure Big Compute



Join Rob Gillen and Dave Nielsen as they discuss how the scientific community harnesses the cloud for high performance computing. Rob, from Planet Technologies, illustrates how Cloud Computing fits into the areas of computational biology and image processing and shares a neat example of gene mapping using the Windows Azure Blast tool.

About Rob

Rob has spent over 10 years as a Solutions Architect for Planet Technologies. For seven of those years he worked with service providers and hosting companies around the world, helping them deploy Microsoft technologies to deliver services to their customers. He has spent the last three years at Oak Ridge National Laboratory, where he is currently a member of the Computer Science Research Group working on solutions for utilizing the cloud for scientific computing.


About Dave

Dave Nielsen, Co-founder of CloudCamp and Principal at Platform D, is a world-renowned Cloud Computing strategy consultant and speaker. He is the founder of CloudCamp, a series of more than 200 unconferences held all over the world, where he enjoys engaging developers and executives in discussions about the benefits, challenges, and opportunities of Cloud Computing. Prior to CloudCamp, Dave worked at PayPal, where he ran the PayPal Developer Network, and at StrikeIron, where he managed partner programs.



The Discussion


    I've been curious about cloud computing for a while for academic work. Data parallel is nice, for example hundreds of images that can all be computed in "parallel" (really just a bunch of independent jobs). Here's more of the problem I see with it in anything other than trivial classroom work:

    Most if not all platforms bill you based on data size and communications traffic. It makes sense that there is a cost associated with that. But the problem is that scientific computing, other than ideal cases like astrophysics simulations where all the data can be created where the simulation runs, typically involves huge amounts of data that need to be sent before work can start, and results might have to come back too. Most people don't really realize how huge. For example, I worked as an IT admin for a genetic research centre. One lab could generate 10TB of raw data a day, and we had 20 labs. If you had to move that to the cloud and then bring some subset of the data back once processed, you'd need a "huge" pipe, and the bill for the traffic to a cloud service provider would tend to be prohibitive. We were able to compute the 10TB of data in about 4 hours on 100 nodes, to give you a rough idea of the compute time. I think 10TB of traffic would cost more than the 100 nodes for 4 hours.

    Any ideas for a workaround for this side of things? In my experience, cloud is great if you are serving a website or something where revenue is tied to page views and the data size is trivial or at least static (so you move it once and then it changes very slowly), but when you start pushing a real amount of data it is just crazy. We were paying 10k euros a year for 155Mbps symmetric to the institute, and we would easily saturate that with just one lab's data if we went to the cloud. Fortunately we had an independent 1Gbps link to the local HPC facility and a couple of smaller in-house clusters (80 and 40 nodes).


    Oh, and unless my math is wrong, 155Mbps would take ~6 days to send 10TB, assuming the link wasn't being used for anything else (which it is, as it is the primary internet connection for the building). So time is a factor too. Moving huge data is both expensive and likely to take longer than the compute.
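    The commenter's estimate is easy to check with back-of-the-envelope arithmetic: convert the data volume to bits, divide by the link rate, and convert seconds to days. A minimal sketch (function name and decimal TB convention are illustrative, not from the discussion):

    ```python
    SECONDS_PER_DAY = 86_400

    def transfer_days(data_terabytes: float, link_mbps: float) -> float:
        """Days needed to move `data_terabytes` over a fully dedicated link of `link_mbps`."""
        bits = data_terabytes * 1e12 * 8       # TB -> bits (decimal terabytes)
        seconds = bits / (link_mbps * 1e6)     # total bits / (bits per second)
        return seconds / SECONDS_PER_DAY

    # 10 TB over a dedicated 155 Mbps link:
    print(round(transfer_days(10, 155), 1))  # ~6.0 days, matching the estimate above
    ```

    Note this assumes the link runs at full rated speed with no protocol overhead or competing traffic, so real transfers would take even longer.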
