Posted By: Charles | Mar 3rd, 2006 @ 1:56 PM | 65,262 Views | 22 Comments

This episode features Jim Gray. He is a "Technical Fellow" in the Scaleable Servers Research Group (Sky Server, Terra Server) and manager of Microsoft's Bay Area Research Center (BARC). Jim has been called a "giant" in the fields of database and transaction processing computer systems. In 1998, Jim was awarded the ACM’s prestigious A.M. Turing Award.  

Before joining Microsoft, Jim worked at Digital Equipment Corp. (DEC), Tandem Computers Inc., IBM Corp. and AT&T. He is the editor of the “Performance Handbook for Database and Transaction Processing Systems” and co-author of “Transaction Processing: Concepts and Techniques.” In this interview, Jim is joined by researcher Tom Barclay, a former colleague from DEC and partner on the Terra Server project.

This episode of “Behind the Code” is hosted by Barbara Fox – former senior security architect of cryptography and digital rights management for Microsoft.

 

Isn't it really disappointing and ironic that you have been running Terraserver for almost 10 years without any customer product coming out of it, and if it was not for Google and its maps, perhaps it might have stayed that way? Where was MSN all these years?
So ... when can we expect SkyServer to be integrated into Local Live / Virtual Earth? :)

BTW, I love the font (Franklin Gothic Book) used in the OP. It looks real crisp and clean. Channel9 should take care to use these kinds of nice default fonts in the next UI refresh.

...While I'm on the subject, that'd be a nice priority for all Microsoft's online properties. It's a bit baffling how Microsoft spent so much money to create fonts that render well on screen, such as Georgia, yet most sites seem to default to Arial. Many BlogSpot themes, in contrast, use Trebuchet MS. Innovation is great and all, but don't forget to leverage existing investments!
PerfectPhase
"This is not war, this is pest control!" - Dalek to Cyberman
Is there another episode somewhere? This episode's filename was Behind_the_Code3_MBR and the Anders Hejlsberg show was Behind_The_Code_2_MBR, so where's Behind_The_Code_1_MBR? :)
PerfectPhase

nektar wrote:
Isn't it really disappointing and ironic that you have been running Terraserver for almost 10 years without any customer product coming out of it, and if it was not for Google and its maps, perhaps it might have stayed that way? Where was MSN all these years?


These are more blue-sky projects; they are meant to feed technology into the product groups rather than being products in their own right. If you tie something like this to a commercial offering, you lose the ability to make experimental breaking changes.

To say that nothing has come from it is wrong; a lot of the scalability improvements in products like SQL Server have come from projects like this.

Hello, this is Jim Gray trying to respond to some of the questions.

The stuff on pipelines has quite a literature: "Loading databases using dataflow parallelism" is 10 years old now: http://research.microsoft.com/~gray/papers/Parallel_DB_Load.doc.
A more recent (6-year-old) effort is at http://research.microsoft.com/~gray/river/, but the real action is happening now with things like Google's MapReduce, Sawzall, and such (you can search for them on Google). Those guys are working with thousands of machines and so are "really" doing it, rather than just talking about it. I think SQL Server 2005 Integration Services is a good way of thinking about dataflow, and of course BizTalk is a dataflow system, but they are not doing the incredible partition parallelism scaleout (yet) that we need to deal with thousands of machines.

The Kilroy thing is subtle -- which is the point. You are right to be lost. Sorry. It's subtle. Concurrency is subtle. Avoid it if you can. OK, fair warning. If you can't resist, there is a longer writeup of it in the book co-authored with Andreas Reuter ("Transaction Processing: Concepts and Techniques"). As Barbara Fox pointed out, it is massive and massively expensive (sorry -- no one is getting rich on it, it's just a very small market). Anyway, you can peek at it by going to Amazon and doing a "search inside the book" under Kilroy. Something like: http://www.amazon.com/gp/reader/1558601902/104-0609859-0703169?v=search-inside&keywords=Kilroy

You are right, Barbara Fox is indeed AMAZING. Jennifer Sisti also deserves HUGE credit for all the research she and Barb put into this event. I had no idea when I got into it that they would make it such a production -- they did. I felt kind of embarrassed to be so off-hand about it when they were so professional. But... they wanted spontaneity, and they got it.

As for the Terraserver, it was part of Encarta, part of Home Advisor, part of MapPoint, and also a poster-child for web services. It was also a great laboratory for us to try out our scalability and availability ideas. We got a LOT of mileage out of it. But... now it is part of local.live.com (part of MSN). Every research guy's dream: the product guys took our research toy away from us. Now we have to think up something new for them to "steal" in 10 years. My one regret is that we had all the AJAX stuff to make the maps very interactive back in 2000, but we did not deploy it because it was IE5+ only. We wanted "reach" to all platforms and so lost the high end. Now all the other browsers have caught up, and we got leapfrogged. It's a good lesson. But the Virtual Earth (aka local.live.com) folks are working hard to leapfrog the current leaders. It is fun to watch the innovation in this space. Competition is GREAT!

-------------------------------------------
That's it for now.  I will try to answer the next batch of questions in a few weeks.

I better be quick before someone beats me to it ;)

Good heavens, never thought I'd be reminded by the man himself to clean the dust off that timeless piece.. let me fix my face, the jaw has gone pretty low here ;)

Dear Dr. Gray,

Could never understand how deep all the transactional science can get (until I picked up that incredibly detailed work and lost myself pretty soon), all while it is so abstracted that we never see much of it in day-to-day work or in different, simplified models.. which is greatness, no doubt. So now I have given myself the task of getting that data structure implemented and utilised, as well as looking for some good testing of a few SQL Server batteries..

While my lousy opinion is that AJAX is not going to fire away anywhere fast or be too successful (before it is replaced with another name and method at least ;-), I believe I can see where the 'regret' hint is coming from, looking at how far ahead MS was back then (and how long it takes for things out of research to resurface in the commercial world).. The issue seems to be that VML and RDS (if that was the correct name) and far more were just too much for web pages back in those days. Broadband wasn't taken up as widely, machines were far slower and storage was still expensive.. Anyway, I just believe that HTML is still slow for highly interactive apps, and rendering engines just don't seem to scale with the number of visible or out-of-viewport elements.. enter Java hacks etc.. don't see devices coping with much of it either, but sure, things are getting better slowly.

My favourite comment on the show was on the heat problem as my own teacher always insisted it will have to be hit and pretty soon (his estimates were something like c2010 back in 1997). He always said that's exactly when the algorithm guys (and researchers as he led that department) will finally 'take over' and see great satisfaction and demand for the work they did/do.. Just thought of mentioning it in the context of something most of us mere mortals will never experience or see ;)

To not bore anyone any more, I guess it is a common requirement today to process huge amounts of data (I guess a need for it to be compacted too, but that's another topic). I want to stick to SQL Server (if for nothing else than for the many things said and shown here and the awesome interview).. Sure, and for performance, tools and more. And now I'm hitting SQL as hard as I can, transaction logs grow in 1GB increments in the space of a minute and I'm expecting the real-world scenario to push that far higher.. therefore anything bound to a single machine, a single point, is out of the question.. easy to say, hard to implement, especially as low-latency query is a major requirement for the project. I avoid DTC as much as I can and almost always get away with it (I think ;), sure, all specific to a problem etc. So ok, I now get it has to scale out, it better do, and it has to be distributed because of the locking nature of loading large datasets efficiently..

I was always thinking replication and versioning approaches were a way forward for such scenarios.. cache approach I don't have much desire for, IMDB was dumped before ;), addressable space will keep growing etc.. hence looking for insight from let's face it..

Ok, gotta do this, someone point me where in the world could you publicly ask for advice from a Turing Award legend..

Kudos, am not worth a reply, this site rocks, TM, LTD, etc

All the best.

RJ
Mr. Gray

    I would first like to extend my gratitude to you for being such a forward thinker, you know, forging ahead and being really creative with the things that you do. I would also like to say thanks for making the interview an experience. Even through video, I could get the sense that you are great at what you do.

All that aside, I have a few questions:

1. How has problem solving made you a better manager of projects?

2a. Is set theory an innovation to object oriented programming?

2b. In your experience what makes the set theory so effective?

3. How can a developer become more effective or efficient?

4. What is one of the things that excite you about technology?

    It would be an honor to get your feedback, I could perhaps hope to kind of implement the framework that you have created by being such an innovator.

Thank you sir!

Yes, the transactional stuff makes your head hurt, and we are still exploring that space and learning new things. Vista comes with 3 transaction managers, or TMs (Kernel, Light-Weight, and Distributed). Making them play together and making it all transparent has been a REAL challenge.
It’s a LONG story why there are 3 but each one has a good reason for existence.

The AJAX regret is that the product guys invented it (for Outlook Web Access) and we research guys were the reactionaries.  The regret is that I was retro -- shame on me.   I had good excuses at the time (reach to all platforms) but I was wrong.


Yes, Moore's wall (the heat barrier) is going to force us to go parallel. Frankly, we are all stumped how "normals" will program in parallel. My best hope is something like dataflow (Excel Recalc, SQL parallel Query, Google MapReduce, ...). But... at the moment they all seem like one-trick ponies. The algorithms guys have been building us libraries, but we need environments, not libraries. The parallelism has to be in the outer loop, not the inner loop.

As for the Turing thing and the Legend thing, I confess great embarrassment.   I know how little I know and struggle with Visual Studio and SQL and Win32 just like everyone else.  As you know, programming is really humbling.  I get reminded most every day how really stupid I am. So, I am glad to chat with a fellow programmer.


Jim

RJ

1. Better manager?  Let the record show that my management plan is to hire over-achievers and then ask them to produce monthly reports.   My job is then to keep them from killing themselves next month.  Mostly by giving them pats on the back and telling them that they are accomplishing a LOT.   That's standard Management by Objectives -- and I got it early from my first job selling encyclopedias door to door (we had quotas).   But, I think it is fair to say that I am NOT a good manager -- I do not enjoy it and I tell everyone that.  But I want to work with good people and I want to walk to work most days and so I have to be the manager.

2a. I am not sure I understand the question. Set Theory predates OO by about 100 years.

2b. Set theory is successful because it is simple and indeed it is a way to talk about numbers (including transcendental numbers) and forms the basis for discrete and continuous math and also logic. The book "Gödel, Escher, Bach" makes this point quite clearly and is a good read if you have the time.

3. How can a developer be more effective and efficient? I think the simple answer is: think more, write less. If you are like me, you are lazy and just sit down and write the code. I really have to force myself to think. Then the phone rings or an email arrives or some other distraction comes up. So, thinking is both very hard and, at least for me, requires some quiet time -- a scarce commodity these days.

4. What's exciting? My problem is that almost everything interests me. The challenge is focusing on a FEW things and making a contribution there. Long term, I think we are on a path to make intelligent life and extend human life indefinitely, and completely change the human condition (Kurzweil's "The Singularity Is Near"). That's pretty exciting.

Jim
