Conversation with scientist, engineer and database legend Jim Gray


The Discussion

  • User profile image
    Someone changed the graphic associated with this video; I thought it was a Part 2.  Was the other pic not good enough?
  • User profile image
    Random change. There's no hidden meaning...
  • User profile image
    That is too cool. I am working on a pipeline server like this.  It is a nice way to go.  There are some potential issues, however.  Each "stage" has a thread or a thread pool with some max.  If one of the sync threads gets blocked for extra long (e.g. network delay, error, hack, etc.), more worker threads spin up for the stage. So far so good.  But on a busy server with a lot of connections, it is possible to max out the workers for a stage and block the whole server. Naturally, this could happen even if the stage was totally async as well. Eventually memory/resources would run out posting callbacks.  Still, I think I like this sync pipeline better.  Here are some interesting works on related design:

    Does MS (e.g. Jim) have any papers on same?  TIA
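
The staged design described in the comment above can be sketched roughly as follows. This is only an illustration, not the commenter's server: the names `start_stage`, `run_pipeline`, and `_STOP` are hypothetical, and error handling is left out. Each stage gets a fixed pool of worker threads, and the queues between stages are bounded, so a slow stage makes upstream stages wait rather than letting memory grow without limit.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the stream (hypothetical name)


def start_stage(func, inbox, outbox, workers):
    """Run `func` on every item from `inbox` using a fixed pool of worker threads."""

    def worker():
        while True:
            item = inbox.get()
            if item is _STOP:
                inbox.put(_STOP)  # pass the sentinel on to sibling workers
                return
            outbox.put(func(item))  # blocks while the next stage is backed up

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()

    def closer():
        # Signal the next stage only after every worker in this stage has finished.
        for t in threads:
            t.join()
        outbox.put(_STOP)

    threading.Thread(target=closer, daemon=True).start()


def run_pipeline(items, stages, workers=2, queue_size=4):
    """Push `items` through `stages`, a list of one-argument functions."""
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(stages) + 1)]
    for i, func in enumerate(stages):
        start_stage(func, queues[i], queues[i + 1], workers)

    def feed():
        for item in items:
            queues[0].put(item)  # blocks instead of queueing without bound
        queues[0].put(_STOP)

    threading.Thread(target=feed, daemon=True).start()

    # Drain the last queue on the calling thread until the sentinel arrives.
    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            return results
        results.append(item)
```

A call such as `run_pipeline(range(10), [lambda x: x + 1, lambda x: x * 2])` returns the ten transformed values, though not necessarily in input order, because several workers serve each stage. The pool size is fixed here; a stage that spins up extra workers when one blocks, as the comment describes, would still need some upper bound, which is where the "max out the workers" risk comes from.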
  • User profile image

    Interesting people.

    Nice interview.

  • User profile image

    I did not follow most of what he was talking about.


    But the last bit was about some bug in the multithreaded application model or something...


    I did not understand what he was getting at. Perhaps one of you can explain it to me, because I am confused.

  • User profile image
    The last bit was a data structure; he talked about concurrency bugs. Was that what confused you?

    Now I know what a Kilroy is, thanks.

    Loved the show, best C9 vid yet.
  • User profile image

    d/ling again just for the last bit - arrgh

  • User profile image
    Amazing video. I think that "Behind the Code" is one of the greatest shows ever (not only on Channel 9).

    Question: is there a good place somewhere on the web to learn about the "Free Pool" data structure?

  • User profile image
    Isn't it really disappointing and ironic that you have been running Terraserver for almost 10 years without any customer product coming out of it, and that if it were not for Google and its maps, it might have stayed that way? Where was MSN all these years?
  • User profile image
    So ... when can we expect SkyServer to be integrated into Local Live / Virtual Earth?

    BTW, I love the font (Franklin Gothic Book) used in the OP. It looks real crisp and clean. Channel9 should take care to use these kinds of nice default fonts in the next UI refresh.

    ...While I'm on the subject, that'd be a nice priority for all Microsoft's online properties. It's a bit baffling how Microsoft spent so much money to create fonts that render well on screen, such as Georgia, yet most sites seem to default to Arial. Many BlogSpot themes, in contrast, use Trebuchet MS. Innovation is great and all, but don't forget to leverage existing investments!
  • User profile image
    Is there another episode somewhere?  This episode's filename was Behind_the_Code3_MBR, and the Anders Hejlsberg show was Behind_The_Code_2_MBR. Where's Behind_The_Code_1_MBR?
  • User profile image

    nektar wrote:
    Isn't it really disappointing and ironic that you have been running Terraserver for almost 10 years without any customer product coming out of it, and that if it were not for Google and its maps, it might have stayed that way? Where was MSN all these years?

    These are more blue-sky projects; they are meant to feed technology into the product groups rather than being products in their own right.   If you tie something like this to a commercial offering, you lose the ability to make experimental breaking changes.

    To say that nothing has come from it is wrong; a lot of the scalability improvements in products like SQL Server have come from projects like this.

  • User profile image
    Jim Gray

    Hello, this is Jim Gray trying to respond to some of the questions.

    The stuff on pipelines has quite a literature: "Loading databases using dataflow parallelism" is 10 years old now:
    a more recent (6-year-old) effort is at but the real action is happening now with things like Google's Map-reduce, Sawzall, and such (you can search for them on Google). Those guys are working with thousands of machines and so are "really" doing it, rather than just talking about it. I think SQL Server 2005 Integration Services is a good way of thinking about dataflow, and  of course BizTalk is a dataflow system, but they are not doing the incredible partition parallelism scaleout (yet) that we need to deal with thousands of machines.

    The Kilroy thing is subtle -- which is the point. You are right to be lost.  Sorry. It's subtle. Concurrency is subtle. Avoid it if you can. OK, fair warning.  If you can't resist, there is a longer writeup of it in the book co-authored with Andreas Reuter ("Transaction Processing: Concepts and Techniques").  As Barbara Fox pointed out, it is massive and massively expensive (sorry -- no one is getting rich on it, it's just a very small market). Anyway, you can peek at it by going to Amazon and doing a "search inside the book" under Kilroy. Something like:

    You are right, Barbara Fox is indeed AMAZING.  Jennifer Sisti also deserves HUGE credit for all the research she and Barb put into this event. I had no idea when I got into it that they would make it such a production -- they did. I felt kind of embarrassed to be so off-hand about it when they were so professional.  But... they wanted spontaneity, and they got it.

    As for the Terraserver, it was part of Encarta, part of Home Advisor, part of MapPoint, and also a poster-child for web services. It was also a great laboratory for us to try out our scalability and availability ideas. We got a LOT of mileage out of it.  But... now it is part of (part of MSN).  Every research guy's dream, the product guys took our research toy away from us. Now we have to think up something new for them to "steal" in 10 years. My one regret is that we had all the AJAX stuff to make the maps very interactive back in 2000, but we did not deploy it because it was IE5+ only. We wanted "reach" to all platforms and so lost the high end.   Now all the other browsers have caught up, and we got leapfrogged.  It's a good lesson. But the Virtual Earth (aka folks are working hard to leapfrog the current leaders. It is fun to watch the innovation in this space. Competition is GREAT!

    That's it for now.  I will try to answer the next batch of questions in a few weeks.

  • User profile image

    I better be quick before someone beats me to it.

    Good heavens, never thought I'd be reminded by the man himself to clean the dust off that timeless piece... let me fix my face, the jaw has gone pretty low here.

    Dear Dr. Gray,

    I could never understand how deep all the transactional science can get (until I picked up that incredibly detailed work and lost myself pretty soon), all while it is so abstracted that we never see much of it in day-to-day work or in different, simplified models... which is greatness, no doubt.  So now I have given myself the task of getting that data structure implemented and utilised, as well as looking for some good testing of a few SQL Server batteries.

    While my lousy opinion is that AJAX is not going to fire away anywhere fast or be too successful (before it is replaced with another name and method at least;-), I believe I can see where the 'regret' hint is coming from, looking at how far ahead MS was back then (and how long it takes for things out of research to resurface in the commercial world).  The issue seems to be that VML and RDS, if that was the correct name, and far more were just too much for web pages back in those days. Broadband wasn't taken up as widely, machines were far slower and storage was still expensive. Anyway, I just believe that HTML is still slow for highly interactive apps, and rendering engines just don't seem to scale with the number of visible or out-of-viewport elements... enter Java hacks etc. I don't see devices coping with much of it either, but sure, things are getting better slowly.

    My favourite comment on the show was on the heat problem, as my own teacher always insisted it will have to be hit and pretty soon (his estimates were something like c. 2010 back in 1997). He always said that's exactly when the algorithm guys (and researchers, as he led that department) will finally 'take over' and see great satisfaction and demand for the work they did/do. Just thought of mentioning it in the context of something most of us mere mortals will never experience or see.

    To not bore anyone any more, I guess it is a common requirement today to process huge amounts of data (I guess there is a need for it to be compacted too, but that is another topic). I want to stick to SQL Server (if for nothing else than for the many things said and shown here and the awesome interview), and sure, for performance, tools and more. And now I'm hitting SQL as hard as I can; transaction logs grow in 1GB increments in the space of a minute, and I'm expecting the real-world scenario to push that far higher. Therefore anything bound to a single machine, a single point, is out of the question... easy to say, hard to implement, especially as low-latency query is a major requirement for the project. I avoid DTC as much as I can and almost always get away with it (I think;), sure, all specific to a problem etc. So OK, I now get that it has to scale out, it better do, and it has to be distributed because of the locking nature of loading large datasets efficiently.

    I was always thinking replication and versioning approaches were a way forward for such scenarios... the cache approach I don't have much desire for, IMDB was dumped before, addressable space will keep growing etc... hence looking for insight from, let's face it...

    Ok, gotta do this, someone point me where in the world could you publicly ask for advice from a Turing Award legend..

    Kudos, am not worth a reply, this site rocks, TM, LTD, etc

    All the best.

  • User profile image
    Mr. Gray

        I would first like to extend my gratitude to you for being such a forward thinker, you know, forging ahead and being really creative with the things that you do. I would also like to say thanks for making the interview an experience. Even through video, I could get the sense that you are great at what you do.

    All that aside, I have a few questions:

    1. How has problem solving made you a better manager of projects?

    2a. Is set theory an innovation to object oriented programming?

    2b. In your experience what makes the set theory so effective?

    3. How can a developer become more effective or efficient?

    4. What is one of the things that excite you about technology?

        It would be an honor to get your feedback, I could perhaps hope to kind of implement the framework that you have created by being such an innovator.

    Thank you sir!
  • User profile image
    Jim Gray

    Yes, the transactional stuff makes your head hurt, and we are still exploring that space and learning new things.  Vista comes with 3 TMs (Kernel, Light-Weight, and Distributed).  Making them play together and making it all transparent has been a REAL challenge.
    It’s a LONG story why there are 3 but each one has a good reason for existence.

    The AJAX regret is that the product guys invented it (for Outlook Web Access) and we research guys were the reactionaries.  The regret is that I was retro -- shame on me.   I had good excuses at the time (reach to all platforms) but I was wrong.

    Yes, Moore's wall (the heat barrier) is going to force us to go parallel. Frankly, we are all stumped how "normals" will program in parallel.  My best hope is something like dataflow (Excel Recalc, SQL parallel Query, Google MapReduce, ... ). But... at the moment they all seem like one-trick ponies.   The algorithms guys have been building us libraries, but we need environments, not libraries. The parallelism has to be in the outer loop, not the inner loop.

    As for the Turing thing and the Legend thing, I confess great embarrassment.   I know how little I know and struggle with Visual Studio and SQL and Win32 just like everyone else.  As you know, programming is really humbling.  I get reminded most every day how really stupid I am. So, I am glad to chat with a fellow programmer.
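
The remark above that "the parallelism has to be in the outer loop, not the inner loop" can be illustrated with a small map-reduce-style sketch. It is not taken from MapReduce, SQL Server, or any system named in the thread; `count_words`, `partition`, and `parallel_word_count` are hypothetical names, and a thread pool stands in for what would be many machines. The per-partition code is ordinary sequential code, and only the outer driver knows about parallelism.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """Sequential "inner loop": plain single-threaded code over one partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def partition(items, n):
    """Split `items` into n round-robin partitions."""
    return [items[i::n] for i in range(n)]


def parallel_word_count(lines, partitions=4):
    """Parallel "outer loop": map count_words over partitions, then merge."""
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(count_words, partition(lines, partitions)))
    total = Counter()
    for partial in partials:
        total.update(partial)  # reduce step: add up the per-partition counts
    return total
```

For example, `parallel_word_count(["a b a", "b c", "a"])` counts `a` three times, `b` twice, and `c` once. In CPython a thread pool gives little speedup for CPU-bound work like this; the point of the sketch is the shape of the program, where the programmer writes `count_words` without thinking about concurrency at all.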


  • User profile image
    Jim Gray


    1. Better manager?  Let the record show that my management plan is to hire over-achievers and then ask them to produce monthly reports.   My job is then to keep them from killing themselves next month.  Mostly by giving them pats on the back and telling them that they are accomplishing a LOT.   That's standard Management by Objectives -- and I got it early from my first job selling encyclopedias door to door (we had quotas).   But, I think it is fair to say that I am NOT a good manager -- I do not enjoy it and I tell everyone that.  But I want to work with good people and I want to walk to work most days and so I have to be the manager.

    2a.  I am not sure I understand the question.  Set Theory predates OO by about 100 years.

    2b. Set theory is successful because it is simple and indeed it is a way to talk about numbers (including transcendental numbers) and forms the basis for discrete and continuous math and also logic.  The book Gödel-Escher-Bach makes this point quite clearly and is a good read if you have the time.

    3.  How can a developer be more effective and efficient?  I think the simple answer is think more, write less.  If you are like me, you are lazy and just sit down and write the code.   I really have to force myself to think.  Then the phone rings or an email arrives or some other distraction comes up.  So, thinking is both very hard and, at least for me, requires some quiet time -- a scarce commodity these days.

    4. What's exciting? My problem is that almost everything interests me.  The challenge is focusing on a FEW things and making a contribution there.   Long term, I think we are on a path to make intelligent life and extend human life indefinitely, and completely change the human condition (Kurzweil's "The Singularity Is Near").  That's pretty exciting.

    Jim

  • User profile image
    Hi Jim,

    Thanks to you and Barbara for such earnestness in the video.

    Do you think Microsoft as a company is reaching a Digital Equipment Corporation moment in its history?  How would you say Microsoft now compares to DEC when you arrived there?

    I note warmly in your posts that you say "competition is great."  As a consumer of various software companies' products, I could not agree more.  Perhaps the competition will spur Microsoft to save itself.

    Finally, I cannot stop myself from asking -- though I know full well that you may not feel comfortable answering, and in that sense I am imposing on you just by voicing the question -- how do you feel about Steve Ballmer?  I think he may have literally gone over the edge and become insane.  Which both saddens and scares me, but I guess "it is what it is" as they say.
  • User profile image
    Jim Gray

    Hi J.

    MS == DEC? Microsoft is huge (65k people), and so it is different from the 10 person, 100 person, 1,000 person, and 10,000 person groups that it grew from.   Organizations that large have dysfunctional parts, just like people have dysfunctional parts.   It comes with complexity.   Is it like DEC (where I worked for 4 years in the early 90's) or IBM (where I worked in the 70's)?  No!  It is very different.   Part of the difference is that it is still growing fast; that covers a multitude of sins and engenders optimism.  Part of the difference is that upper management is still in touch with the technology and the business (Ken Olsen and John Akers were not). Microsoft has had many near-death experiences (OS2, NetWare, WordStar, Lotus, Mac, Netscape, AOL, Linux, Google, ...). It lives on paranoia -- most of the folks I work with know that if we do not innovate, we will not be working together in a few years.   Those are big differences.

    Ballmer crazy?  Steve Ballmer, is he insane? I think not.  First appreciate that Steve graduated from Harvard in mathematics.  So his IQ is probably higher than yours or mine (I am told he played poker all through school and won.)  I couldn't even get into Harvard.  I majored in math and did OK at Berkeley (but I had to study, no time for poker.) Harvard math is HARD.    OK, so Steve was smart once.    Next fact, Steve is very involved with his family -- his kids, his wife, and his friends.  He is a billionaire but he is very earthy and personable. This is not an act -- he genuinely cares about people.  You would love to have him as a next door neighbor or as a pal.  OK, so how can one possibly explain his strange behavior at Microsoft marketing events (e.g. the jumping monkey and such)?  Well, remember the part about playing poker?  Steve can bluff, Steve can act, and Steve LOVES to win -- he is a competitive animal ("our fair share of the OS business is 100% and it is up to our competitors to deny us our fair share and it is up to us to build products that merit that fair share.")  So, suppose you are going to a marketing event and you want to get your audience's attention -- you want to energize them.   How are you going to do it?   Doing the monkey dance is one way. Have you got a better idea?    On a related story, suppose you are Bill Gates and one of your senior techies comes to you with a not very well thought out idea.   If that person is a master-of-the-universe arrogant testosterone Microsoft techie who is a big wheel in his organization and takes no guff from his underlings and peers, how are you going to get his attention?    Sad to say, polite comments will not penetrate -- unfortunately you have to be incredibly loud, rude, and blunt just to get the message through.   I bet you have heard such stories.  But, with “normals” Bill is a real gentleman.

  • User profile image

    Will be a nonsense post from me as usual but perhaps something useful for my record (I keep this list close-by) of 'Top 10, pointless, time-wasting failures' while developing software..

    It took a while, can only imagine how busy everyone over there is, but reading all the Qs and replies made my day and presents an interesting flow of reasoning perhaps.  And at least I can show off now, point to this site as evidence when some 'silly' argument develops in the pointillistic-culture-friendly company all my colleagues work in.

    Thanks very much for your time Dr. Gray and for those little hints that keep the mind hungry and, how to put it, just sweet enough to question everything, try something different all the time and hopefully invoke change and more.

    On parallelism, it seems a lot of new and old MSFT VS guys are occupied enough with it for us to anticipate that is where the next edge will be for quite some time now (apart from other VM work I see on this site but have no time to view videos of or read about). It might seem a no-brainer to many (they say ignorance is bliss, but competitiveness on millisecond timescales isn't), and to write code (or to have an intuitive-enough platform) for such an environment is surely a couple of orders of magnitude harder than just looking to avoid deadlocks or selecting a locking granularity that might be optimal for some application. It's like, err, having a terrifyingly humble computing legend around; not many people about can be and remain that way. Thus quite likely not many people will program parallel (including myself) for a long time either.

    So what's left.. personally, I don't believe much in generalisations or sticking to a single environment for that matter. Reason probably being our ideas change all the time or that everyone is psd off with everyone else, it is only natural to be moody and try 'your own thing' TM. What strikes me in this day and age, or wisening process I wish/hope, is to make an advance no matter what the method, utilise it and make the money before someone else does the same. This in turn ideally invokes the change for greater good, like health, like eliminating poverty and more (enter notes on Ballmer and Bill, which were all enlightening and a beautiful read at the very least).

    We perhaps don't need an environment initially, just constructs to show the results and the rest (any generalisation if there is a need for them) should appear from it. The good old 'responsibility pushed to the user'. What I am aiming at (after some layman-type  rumblings with myself) is a thought that in order for parallelism to really work is to have benefit from it, timely and more accurate data; all the software or platforms etc in the chain, or all interaction, are ideally designed with it in mind. Thus I find it is no surprise to see very little commercial bits available, esp where money is milliseconds, and even when some are found (like was it cambridge STL bits I believe), they are useless because there is an integration with a system unable to benefit from it.

    Enter queues. But also enter the fact that no generalisation can be found for multi-writer, multi-reader (thread is the wrong term, parallel suits better I guess) scenarios. That + logic to assemble it all in such fashion that the 'serial' workhorse (historical-term:Viper extension) might get its logic (our apps) work done on just the dataset that is relevant, ie. the latest 'snapshot' is nowhere around.

    I like to draw parallels to real-life (ie. no machines exist) all the time; information that is old is no longer useful for a good number of new ideologies or applications, about time someone deals with it yet it is almost a revelation in 2006, say Windows Mobile 2005 devices picking up email. As an example, at this time I am watching Intel's stock break out after a terrifying blow for almost 2 years now. What happened there is anyone's guess but my snapshot is only interested in whatever the arbitration logic decides is relevant at a point in time it chews the input, not necessarily the past (ie. almost like queue filtering with some context data to help keep the search/sort operations from parallel input down or 'temporalised'). Sure the workhorse (inner loop) can utilise further 'tool or software helped' parallel processing if it is built that way.

    The rest in my mind has little benefit from parallel execution, whether we like it or not, and sure, algorithms and software tools can identify such things, but will it work all the time, ie. are these bits inner loops, and what headache will it give to the software developer? Your comment on one-trick ponies seems to be on similar lines (unless I am terribly confused, which isn't an issue here).

    Much the same the hardware guys like DEC kicked off and those bits sold to was it Compaq then Intel etc, ie. Alpha architecture or at least ideas from it. Thus, I believe all we need is some good hardware abstractions in forms of MWMR queues, extension points for our state/temporal/filtering logic, and (hate the term btw) 'pushed' and 'high-res versioned' data to help us build those. Sounds easy in this quick nonsense write up of mind, but probably hard enough to even begin with for most. Form-type, history and other 'audit-friendly' apps can wait, they had priority for far too long and they caused no revolution apart from WWW, blogs and Google, which is not small but no Holy Grail either.

    In any case, I gave up on big data storage (and compression of such) idea as reliability bits came into play (off goes the Morse&Isaac,Snodgrass etc out of my short lifespan:). Life gives such incredible reasons to not be obsessed with detail or clutter, yet all programmers fall for and fall in love with it.

    Many thanks for your reply and time.

    'Optimisation is the root of all good' convert

  • User profile image
    Jim Gray is cool.
