<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" media="screen" href="/App_Themes/default/rss.xslt"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:evnet="http://www.mscommunities.com/rssmodule/"><channel><title>jonudell</title><atom:link rel="self" type="application/rss+xml" href="http://channel9.msdn.com/posts/jonudell/rss/default.aspx" /><image><url>http://mschnlnine.vo.llnwd.net/d1/Dev/App_Themes/C9/images/feedimage.png</url><title>jonudell</title><link>http://channel9.msdn.com/posts/JonUdell/</link></image><description>Channel 9 Blog for JonUdell</description><link>http://channel9.msdn.com/posts/JonUdell/</link><language>en-us</language><pubDate>Mon, 13 Oct 2008 16:29:06 GMT</pubDate><lastBuildDate>Mon, 13 Oct 2008 16:29:06 GMT</lastBuildDate><generator>EvNet (EvNet, Version=1.0.3608.3122, Culture=neutral, PublicKeyToken=null)</generator><item><title>Derik Stenerson on the past, present, and future of the iCalendar specification</title><description>&lt;p&gt;
Derik Stenerson first came to Microsoft on an internship as a Test Engineer on Microsoft Mail. After graduating, he joined Microsoft full time in the email group and worked in various roles on email and scheduling products, including Schedule+ and Exchange. His passion for calendaring and scheduling led to work on the iCalendar standard (IETF RFC 2445) and later on a hosted self-service scheduling solution for small businesses. For the past few years Stenerson has been exercising his other passion, user-centered design, while building features for Microsoft Dynamics CRM.
&lt;/p&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;
            &lt;img src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/stenerson/stenerson.jpg" /&gt;
            &lt;div&gt;
            &lt;strong&gt;Derik Stenerson&lt;/strong&gt;
            &lt;/div&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;p&gt;
Next month marks the tenth anniversary of RFC 2445. To celebrate the occasion, Derik joins Jon Udell on &lt;a href="http://itc.conversationsnetwork.org/shows/detail3860.html"&gt;Interviews with Innovators&lt;/a&gt; to discuss the past, present, and future of the venerable iCalendar specification.
&lt;/p&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Derik-Stenerson-on-the-iCalendar-specification/</comments><link>http://channel9.msdn.com/posts/JonUdell/Derik-Stenerson-on-the-iCalendar-specification/</link><pubDate>Mon, 13 Oct 2008 13:30:00 GMT</pubDate><guid isPermaLink="false">http://channel9.msdn.com/posts/JonUdell/Derik-Stenerson-on-the-iCalendar-specification/</guid><evnet:views>673</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489765/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;p&gt;
Next month marks the tenth anniversary of RFC 2445. To celebrate the occasion, Derik Stenerson -- one of the original authors -- joins Jon Udell on &lt;a href="http://itc.conversationsnetwork.org/shows/detail3860.html"&gt;Interviews with Innovators&lt;/a&gt; to discuss the past, present, and future of the venerable iCalendar specification.
&lt;/p&gt;</evnet:previewtext><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Derik-Stenerson-on-the-iCalendar-specification/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489765/Trackback.aspx</trackback:ping><category>icalendar</category><category>Standards</category></item><item><title>Scott Prevost explains Powerset's hybrid approach to semantic search</title><description>&lt;p&gt;
Scott Prevost is General Manager and Director of Product for Powerset, the company whose semantic search engine was recently acquired by Microsoft. In this interview he describes the history of Powerset's natural language engine, and explains how it works as part of a hybrid approach to indexing, retrieval, and ranking.
&lt;/p&gt;
&lt;p&gt;
Scott will expand on these topics in his keynote address at &lt;a href="http://www.web3event.com/"&gt;Web 3.0&lt;/a&gt; in October.
&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;img src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/prevost/prevost.gif"&gt;
&lt;div&gt;
&lt;b&gt;Scott Prevost&lt;/b&gt;
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: The notion of search enhanced by natural language understanding has a long history. I was just reading Danny Sullivan's rant about how he's been hearing about this for years, but it's never amounted to anything.&lt;/p&gt;

&lt;p&gt;Of course, people are all over the map on this topic, but nonetheless you guys are doing certain demonstrable things, and working on other things. So I'd like to find out more about how the technology -- which was acquired from Xerox, where it had been worked on for a long time -- actually works: what you mean by natural language understanding, how you're applying the technology, and where this is going.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Well, there are a lot of questions tucked in there, but maybe we can start with what we licensed from PARC, what was formerly Xerox PARC. They had been working for 30 years in a linguistic framework called LFG -- &lt;a href="http://www.powerset.com/explore/semhtml/Lexical_functional_grammar"&gt;lexical functional grammar&lt;/a&gt; -- and they built a very robust parser. It's probably parsed more sentences than any other parser in the world. &lt;/p&gt;

&lt;p&gt;What it allows us to do is take apart every document that we index, sentence by sentence, uncover its linguistic structure, and then translate that into a semantic representation we can encode in our index.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Can you confirm or deny something that Danny Sullivan reported, which is that it takes on the order of two months to index Wikipedia one time using this method?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: [laughs] That's a very, very old number. It all depends on the number of machines, but we do it now on the order of a couple of days.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: And it scales linearly?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Yes. And in fact we're working really hard to bring those numbers down. We have a very small data center right now. We're looking at what it takes to stand up a 2-billion-document index, and it's absolutely attainable.&lt;/p&gt;

&lt;p&gt;I think Danny Sullivan realized, when he wrote another article on the day we launched, that we're doing something different. He called us an understanding engine. It's not the case that we're just applying linguistic technology at runtime, by parsing the query and then trying to use the same old kind of keyword index for retrieval. We're actually doing the heavy lifting at index time.&lt;/p&gt;

&lt;p&gt;We're actually reading each sentence in the corpus, pulling out semantic representations, indexing those semantic representations, and then at query time we try to match the meaning of the query to the meaning of what's in the document. That allows us to both increase precision and improve recall.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: When you say semantic representation, what it means -- or anyway what's evident in the current version -- is subject/verb/object triples, basically. That seems to be how things are organized.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: That's one small part of what the engine does. It's the part we've exposed in the user interface in a very direct way. But actually those are only three of several dozen semantic roles that we uncover at index time, and all of those roles go into selecting documents, and snippets of documents, when we present the organic results. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Really? So even though the patterns aren't exposed in the advanced query interface, they're still used?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: That's right, they're still used.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: What would be an example of one of those other patterns, and how it's applied?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: So, you ask a question like: "When did Hurricane Katrina strike?" The 'when' is a certain kind of semantic role that we've indexed, separately from the subject, verb, and object. There are a number of other roles like that: location, time, other types of relationships.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: I saw a private demo, about a year ago, in which one of the most striking examples was something like: "Companies acquired by IBM between 1996 and 2003". At that point, I think the light bulb goes on in people's heads about what this could really be.&lt;/p&gt;

&lt;p&gt;That class of query isn't exposed yet, but it's an example of what's possible, right?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Absolutely. That's exactly the direction we're moving in. Initially most of the work we've done has been on the index side. Now we're starting to catch up on the query side, which allows us to complete the loop and do queries like that.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: The other piece that's visible on the website, in addition to the Wikipedia stuff, is the Freebase material that you've recently integrated. That's an interesting case because there you can pull semantics directly from Freebase. So this becomes more of a query-time interface to something which is already structured and understandable.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Yeah, that's right. Freebase is kind of like Wikipedia, except it's all structured data. Unlike with our core technology, which turns unstructured data into structured data, with Freebase we just go directly to the structured data. But it uses the same linguistic technology on the front end to parse the query, which then gets mapped into a Freebase database call.&lt;/p&gt;

&lt;p&gt;But by using linguistic technology to parse the query, we're able to match very flexible ways of saying things. We don't have to imagine every possible way someone might ask for a particular piece of information. The linguistic engine takes care of a lot of that for us.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: That's why I can type in something like "Barack Obama's book" and get back the answer &lt;i&gt;Dreams From My Father&lt;/i&gt; directly from Freebase.&lt;/p&gt;

&lt;p&gt;So, what was the intent of including Freebase along with Wikipedia? What are you trying to show there?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: That the linguistic technology can be used with both structured and unstructured data. Freebase just has a lot of really great information.&lt;/p&gt;

&lt;p&gt;One of the things about a natural language front end is that it encourages people to ask questions and expect answers. With the Freebase database, it's pretty easy to provide direct answers right at the top of the search results page, which users find to be a nice experience.&lt;/p&gt;

&lt;p&gt;Of course you have to be very high-precision, so we've tuned the Freebase stuff for precision rather than recall.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Tell me about the natural language landscape: the variety of approaches that exist, the style that you're using, how that compares to others, how all this fits into the history of the technology.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: The technology goes back a long way, three decades or so. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Longer, actually.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Yeah, really since the beginnings of AI people have been trying to use computers to understand and generate language. There have been a number of different approaches: purely symbolic approaches, statistical approaches. We really use a hybrid. &lt;/p&gt;

&lt;p&gt;The Xerox technology uses a particular grammatical formalism, and we do use symbolic approaches to our semantic rules. But we also then put these semantic features into our index, and use machine learning and statistical approaches to retrieve and rank results. &lt;/p&gt;

&lt;p&gt;It really is a combination. We try not to be religious about these things, but just use best of breed, and choose the right tools for the jobs we're doing. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: One of the things that Peter Norvig at Google is always saying is that the real secret to their success is vast quantities of data, and that in the end you don't really need AI, you just need lots and lots of data and the ability to crunch through it.&lt;/p&gt;

&lt;p&gt;I assume you would argue that the natural language techniques are also helpful, and that as the quantity of data in your possession grows, the power that it brings to the table will also grow.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Yeah. One thing we try to do with the natural language technology is give a leg up to the statistical and machine learning approaches. If you look at a search engine that just uses keywords, the information you have about the page is pretty slim. &lt;/p&gt;

&lt;p&gt;We're trying to capture more information about each page that we index, which enhances our ability to retrieve and rank. For example, it allows us to retrieve documents where there are no keyword matches, but there's a good meaning match.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: For example?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: So, consider a query like: "What politicians were killed by disease?" Powerset will retrieve documents that don't include the words 'disease' or 'politician' or 'kill', but that are about particular politicians who died from particular diseases.  &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Is the process of mapping generic terms to specific instances a hybrid of human editorial effort and statistical techniques?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Yeah. We use things like &lt;a href="http://wordnet.princeton.edu/"&gt;WordNet&lt;/a&gt;, which is a giant dictionary or thesaurus of the English language that shows how various word senses relate to each other. We use that with some editing of our own. We also use machine learning techniques to figure out some word relationships, and which are most helpful in retrieval and ranking. So it really is a combination. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: When did you start this work?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: The company was founded three years ago, and I joined two years ago. But of course the work at PARC goes back 30 years.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: You obviously have an academic background in this field.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Yeah, I have a Ph.D. in computational linguistics, as do probably about twenty other people at Powerset. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: What's your take on how this engine will start to surface through the various Microsoft online properties?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: The two areas where we can make a big impact are, first of all, improving core relevance, which is an absolute must for every search engine. And then also the user experience. Some of the technology -- and you start to see it in the Wikipedia search engine that we put out -- really allows us to do different things in the presentation of these results: things that can save the user time by getting the answer right on the search results page. &lt;/p&gt;

&lt;p&gt;Our goal is to continue to work on improving relevance, and we've shown that by using these semantic features we can drive large relevance improvements, but there's still a lot of work to be done there.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: In that case, the improvements would be under the covers; the person using Live Search wouldn't know that you were contributing to the relevance of the result.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: That's correct. Another way it can happen is by creating a different quality of snippet or caption, one that highlights the parts that match the query instead of just bolding the keywords. We can actually highlight the answer right there on the search results page, so you don't have to click through to determine if it's the right page.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: There's a related area called entity extraction, and there's been a lot of action there. For example there's a company called ClearForest, recently acquired by Reuters, which has put a lot of work into entity extraction. What's the story on that front?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: A lot of companies are working on this. We have our own in-house effort for name recognition and entity recognition, and this is of course really helpful as a kind of light semantic layer. But for us, it becomes deeper because we can start to relate all kinds of entities to one another, based on where we've seen them, and also with the help of things like Freebase. &lt;/p&gt;

&lt;p&gt;To follow up on how you'll see the impact in things like Live Search: beyond the improvement in relevance and in the quality of snippets, I think you'll see features like related searches, and other ways of presenting information similar to the Factz that are shown in our Wikipedia product. I think you'll also see a lot more work on instant answers, with a database that extends beyond Freebase.&lt;/p&gt;

&lt;p&gt;Without committing to particular deliverables, these are the kinds of things I think you can expect to see. And you'll also continue to see growth on powerset.com, where we can be a bit more daring in terms of ways of presenting search results.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Well, thanks for your time. This has been interesting, and I'll be fascinated to see how things unfold over the next few years. I've got a feeling you'll have access to a pile of resources to work with...&lt;/p&gt;

&lt;p&gt;&lt;b&gt;SP&lt;/b&gt;: Yeah, we're really excited about it. As a startup, it's hard to build a full-scale web search engine. Having the resources available, and the really smart people at Live Search, is just a tremendous boost to us. &lt;/p&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Scott-Prevost-explains-Powersets-hybrid-approach-to-semantic-search/</comments><link>http://channel9.msdn.com/posts/JonUdell/Scott-Prevost-explains-Powersets-hybrid-approach-to-semantic-search/</link><pubDate>Fri, 26 Sep 2008 09:12:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/prevost/prevost.wma</guid><evnet:views>1035</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489764/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;p&gt;
Scott Prevost is General Manager and Director of Product for Powerset, the company whose semantic search engine was recently acquired by Microsoft. In this interview he describes the history of Powerset's natural language engine, and explains how it works as part of a hybrid approach to indexing, retrieval, and ranking.
&lt;/p&gt;</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/prevost/prevost.mp3" expression="full" duration="930" fileSize="7638144" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/prevost/prevost.wma" expression="full" duration="930" fileSize="7727573" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/prevost/prevost.wma" length="7727573" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Scott-Prevost-explains-Powersets-hybrid-approach-to-semantic-search/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489764/Trackback.aspx</trackback:ping><category>podcasts</category><category>powerset</category><category>Search</category><category>semantic</category></item><item><title>Kristin Tolle on biomedical initiatives at Microsoft Research</title><description>&lt;p&gt;
&lt;a href="http://research.microsoft.com/~ktolle/"&gt;Kristin Tolle&lt;/a&gt; is the Senior Research Program Manager for Biomedical Computing for External Research in Microsoft Research. Projects run the gamut, she says, from "bench to bedside". In this interview she discusses two major biomedical initiatives: &lt;a href="http://research.microsoft.com/ur/us/fundingopps/RFPs/CellPhoneAsPlatformForHealthcare_RFP.aspx"&gt;Cell Phone as a Platform for Health Care&lt;/a&gt;, and &lt;a href="http://research.microsoft.com/ur/us/fundingopps/rfps/GWAS_RFP_Awards.aspx?0sr=a"&gt;Computational Challenges of Genome Wide Association Studies&lt;/a&gt;.
&lt;/p&gt;
&lt;br /&gt;&lt;br /&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;img src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/tolle/tolle.jpg"&gt;
&lt;div&gt;
&lt;b&gt;Kristin Tolle&lt;/b&gt;
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Give us a sense of the kinds of biomedical projects you're working on internally, as well as those you're working on with external partners. I spoke with &lt;a href="http://perspectives.on10.net/blogs/jonudell/Making-sense-of-electronic-health-records/"&gt;George Hripcsak&lt;/a&gt;, one of the researchers awarded a grant under the Computational Challenges of Genome Wide Association Studies (GWAS) program, and I know there are others involved there and in other programs as well. I'm interested in what Microsoft brings to the table in terms of helping these folks out with their computational and data management challenges, and also what kinds of things Microsoft learns from these engagements.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: The different programs inside of External Research run the gamut from the devices and mobility space, for home health care and elder care, all the way to genome-wide association studies. So, we fund projects all the way from bench to bedside. &lt;/p&gt;

&lt;p&gt;Because we're a software company, we'll focus on the IT parts, and there's a reason for that. These are often the parts that don't get funded elsewhere, or only get funded sparsely. Our purpose for going into medical funding was to fill those gaps.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: And why do you think those gaps exist?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: I think it's a misperception, by a lot of the funding agencies, that either something doesn't fall into their area, or that it's not as important as the actual research being done. &lt;/p&gt;

&lt;p&gt;The problem is -- and this is why we're funding this area -- you cannot do medical research without computing. You just can't.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Of course not.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: Areas that we've funded...well, the biggest RFP we ran this year was &lt;a href="http://research.microsoft.com/ur/us/fundingopps/RFPs/CellPhoneAsPlatformForHealthcare_RFP.aspx"&gt;Cell Phone as a Platform for Healthcare&lt;/a&gt;, and that was 1.4 million dollars toward trying to reach rural and underserved communities with retro technologies like cellphones and televisions, because those are ubiquitous.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Oh, absolutely. I've spoken to &lt;a href="http://blog.jonudell.net/2008/02/18/a-conversation-with-joel-selanikio-about-cellphones-and-sms-in-developing-countries/"&gt;Joel Selanikio&lt;/a&gt;, who was recently awarded a MacArthur Grant to use handheld devices for field data collection in the third world. It's a huge opportunity, though as you say it's the sort of retro technology that doesn't make people's eyes light up in Silicon Valley; they just don't see the opportunity the same way.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: It's true that they don't. But interestingly we've got a lot of researchers in-house, whether we're talking about that situation or about genomics, who have a keen interest in working in these areas. So for example, we gave &lt;a href="http://www.physorg.com/news98525702.html"&gt;Fone+&lt;/a&gt; devices to a couple of the people who were winners of that award. The Fone+, which was developed by Microsoft Research Asia, is a phone that sits in a cradle with RGB output to a television set and USB input ports for a mouse, keyboard, and so on. So basically it enables your phone to work like a PC. &lt;/p&gt;

&lt;p&gt;Now the beauty of this is, if you hook that up to a microscopy device that can do instant visualization of blood cells, determine whether or not somebody has malaria, and display that on a television screen, you've now just set up a lab for doing microscopy anywhere in the world there's a TV and a cellphone. &lt;/p&gt;

&lt;p&gt;Another example is something we did with Washington University in St. Louis. They're developing low-cost ultrasound probes. Same thing. They have USB output and are designed to work with laptops, but now with the Fone+ you can plug one into this little cradle and you've got an ultrasound anywhere in the world where there's power, a TV set, and a cellphone. You can even control the ultrasound device from the phone itself. It's just an amazing technology.&lt;/p&gt;

&lt;p&gt;So that's an example where Microsoft Research has developed a technology that facilitates providing health care to rural communities, even though it wasn't initially designed for that; it was designed for education. But I took it and sort of twisted it...&lt;/p&gt;

&lt;p&gt;[laughs]&lt;/p&gt;

&lt;p&gt;...and said, hey, that'd be really good for the cellphone as a platform for healthcare project. I got them to give me a bunch of phones and cradles, and started sending them out to the researchers who had won awards for the RFP I ran this year. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: What kinds of things have you heard back?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: We only funded them six months ago, so we probably won't see results until sometime next year. But I've actually seen a demo with the Fone+ and the ultrasound unit already working, so that was impressive. Washington U. is ahead of the game, I'd say. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Are the folks you funded to do these things expected to bring technical chops to the table, in order to extend these devices? Are you working with them to provide support?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: Yes, we expect them to bring something to the table, and the ones who win the awards have superior technology. We had 145 people submit proposals to the Cell Phone as a Platform for Healthcare RFP. We'd originally planned to fund a million dollars, which is about 10 projects, but we had to extend it to 1.4 million because we wanted to get to a 10% acceptance rate. Even that is low for us; usually our acceptance rates are much higher. But we were just bombarded by people trying to come up with solutions for this space.&lt;/p&gt;

&lt;p&gt;It was disappointing to only be able to fund 14 proposals because when I looked back through them, I'd say 85 were fundable and on the bubble. Isn't that terrible? You wish you could do more.&lt;/p&gt;

&lt;p&gt;So, I know you've talked to George about the genome-wide association study, but I'd like to head in that direction in terms of some other things we bring to the table. &lt;/p&gt;

&lt;p&gt;When I went looking inside MSR for collaborators, what I learned was that there's a plethora of them. It's kind of surprising we hadn't been funding this area before, and it's no surprise to me now that it's become a strong pillar of funding for our organization. In fact we've trimmed a lot of other programs and will be focusing a lot on the healthcare space this time around. &lt;/p&gt;

&lt;p&gt;When I went hunting for collaborators I had no trouble finding them, even though I was new to the team, and that was because people consider healthcare the killer application for what they're working on. &lt;/p&gt;

&lt;p&gt;But we also had a rich group -- you know, we have a couple of MD/PhDs working here, Eric Horvitz and David Heckerman -- and David does a lot of work in the development of vaccines for HIV and malaria. But he's branching out now into this GWAS area. So he's been looking at Lou Gehrig's disease...&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: We should define the term GWAS, for people who aren't familiar.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: Sure. Genome-wide association studies look across the genome to find if there are particular genes implicated in disease. That's one side of it. Another side is looking across the genome to check for reactions to different pharmaceutical agents. &lt;/p&gt;

&lt;p&gt;In simplest terms, these studies are what will deliver on being able to provide personalized medicine for all of us in the future.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Exactly, because it's a scan of an individual's complete genome, looking for markers and correlations.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: Absolutely right, that's correct. And I believe it will really deliver on personalized medicine for the masses. &lt;/p&gt;

&lt;p&gt;And the thing is, it's happening already. We've got &lt;a href="https://www.23andme.com/"&gt;23andMe&lt;/a&gt; and &lt;a href="http://www.navigenics.com/"&gt;Navigenics&lt;/a&gt; popping up, and people are going to start using their genomic information to make informed decisions about the type of healthcare they receive. They'll be taking that to their doctors and assuming they'll be able to work with it. &lt;/p&gt;

&lt;p&gt;So we need to push the IT component of this down so that doctors have access to the information and know how to utilize it. Right now, that's the clinical gap between the research that's taking place in this area and the doctors who are performing the services needed.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So in this case you funded about a half dozen individuals to look into different aspects of this GWAS research...&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: ... yeah, very different...&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Right. So what do you hope will result from it?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: This was a new area for us, for Microsoft Research. We'd been dabbling in genomics for a while, but here we wanted to cast a wide net, find out what was going on out there, and find out if there were potential collaborations we could take from there. &lt;/p&gt;

&lt;p&gt;When you find people whose work you can help facilitate, you form strategic collaborations with them to take that research to the next level. &lt;/p&gt;

&lt;p&gt;Of course we bring a lot of resources to bear on this space. For example, the &lt;a href="http://www.codeplex.com/MSCompBio"&gt;Microsoft Computational Biology Tools&lt;/a&gt; that we've published on CodePlex as open source. &lt;/p&gt;

&lt;p&gt;The other thing we bring to bear is a deep knowledge of machine learning and knowledge representation. And a number of researchers who've been working in general fields, but are now turning their attention to genomics. &lt;/p&gt;

&lt;p&gt;A couple of new examples: &lt;a href="http://johnwinn.org/"&gt;John Winn&lt;/a&gt;, and also &lt;a href="http://research.microsoft.com/~cmbishop/"&gt;Christopher Bishop&lt;/a&gt;, who literally wrote the book on machine learning and pattern recognition. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: This is a pattern I'm seeing often in these external partnerships. In all areas of science, as you say, scientists are necessarily becoming computational in the work they do, it's just the nature of the beast. But they don't necessarily have deep domain expertise in either algorithms or data manipulation and analysis. There are lots of folks at MSR who are deep in those areas, and who can effectively partner with these folks to move things forward. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: And it's not just that we have these underlying analysis and infrastructure technologies; we also have the human-computer interaction technologies to make that stuff usable for clinicians, or even for the public. So we've got people doing interesting work on questions like: How do you make something more understandable? How do you do machine translation across sex, age, status, education?&lt;/p&gt;

&lt;p&gt;It's the same type of machine learning problem that you have when going across languages. You have to translate between languages, but you also have to translate within a language, between different cultures and different demographics.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: What does Microsoft learn as a result of these collaborations, and what is Microsoft able to do with that?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: Well, the overall goal for External Research is to facilitate time to discovery, and to do so in a way that extends the arm of Microsoft Research. What we learn is which directions to move in. You know, we have publishing and tenure track promotion in Microsoft Research just as in academia. So if we can make our researchers more effective in reaching their goals to publish papers in Science and Nature, that's a fabulous thing. We've facilitated them and extended their reach. &lt;/p&gt;

&lt;p&gt;There's corporate responsibility here as well -- Microsoft, as a company, investing in areas that are important for the future. It's also important for us to keep abreast of the times, and the things taking place now. And finally, we learn things that we may incorporate into our products through tech transfer. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: It may be early days to talk about tech transfer from your biomedical projects, but I'd imagine one obvious outcome will be related to the kinds of devices that will be part of the HealthVault program, as sensors start to exist in people's homes, monitoring their vital signs, and transmitting them to the cloud.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: Absolutely, there's no doubt about it. The more that we invest in applications, and in sensors that can feed HealthVault, the richer their offering becomes. &lt;/p&gt;

&lt;p&gt;The other thing is that we feel we're helping the public become more knowledgeable about their own healthcare. I think that's a common goal we share with the Health Solutions Group, &lt;a href="http://channel9.msdn.com/shows/Microsoft+Conversations+with+J/A-conversation-with-Peter-Neupert-about-HealthVault/"&gt;Peter Neupert's&lt;/a&gt; organization.&lt;/p&gt;

&lt;p&gt;They have other goals as well. So for instance, they have &lt;a href="http://www.microsoft.com/amalga/"&gt;Amalga&lt;/a&gt; on the clinical side, and also a project targeted at researchers, trying to take people through the literature search for drug discovery. We're working in conjunction with that. One of the projects we funded under GWAS was a system to predict possible adverse drug reactions based on genome wide association studies. &lt;/p&gt;

&lt;p&gt;Then the Columbia project -- George Hripcsak, whom you spoke with -- he's creating tools for researchers to integrate clinical information into the genetic analysis. Well, George's project is being built on top of Amalga. So there's a lot of synergy with the Health Solutions Group. And that's not unplanned. When I was starting out I met with Peter Neupert, back when he had eight people in his organization, and I interviewed him to find out what areas we should be investing in for healthcare. I'd also visited various schools and talked with people in their biomedical programs to find out what they were investing in as well. Then I tried to identify areas that would be relevant both to Microsoft Research and the Health Solutions Group. So, it's not just serendipity.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So your own background is in bioinformatics?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: Biomedical computing. My school, the University of Arizona, didn't have a program for this, though it does now. So I had to form a multidisciplinary committee to get a PhD focused on machine learning for healthcare, with computational linguistics thrown in. It was tough, but it was worth it. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Natural language processing was part of your focus as a student?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: Absolutely. Although the systems I developed were much more broadly utilized by different organizations. In fact, Homeland Security has some of the code I developed, which they use to scan for terrorist activities. It was initially developed for the National Library of Medicine to scan through unstructured text and identify keywords for indexers, and also to create small indices so that you could search faster and more accurately for publications in PubMed and CancerLit and other digital libraries. But you could see there were other implications. In fact it's also been used by the Department of Justice to make correlations among police reports.&lt;/p&gt;

&lt;p&gt;So it's a generic technology, but my piece of it was targeted toward healthcare, and that's where my background and interests have always been. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Are there other areas you'd like to discuss?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: I think we've covered the two major ones. I see us really investing long term in the area of devices, sensors, body sensor networks, ubiquitous and pervasive computing. That'll be a fundamental theme going forward, because it's been one of the more successful areas that we've made investments in. &lt;/p&gt;

&lt;p&gt;But I also see us keeping a strong eye on the "omics" -- proteomics, genomics, metabolomics, you name it. &lt;/p&gt;

&lt;p&gt;A third important area, and I don't know whether it will be short term or long term, addresses the other thing we talked about: machine translation that helps people understand health care documents. I don't have an RFP in this area. &lt;/p&gt;

&lt;p&gt;The average person cannot go out on Medline and read the literature on their disorder. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: It is amazing, though, how much context people can assemble for themselves under pressure of intense need.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: No doubt about it. But it would be better if we could create facilitating interfaces that would enable people to more readily understand and interpret that information. There's a lot of it out there, it's information overload really, and if we could make it a little easier for them, that would be very valuable. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: I wonder how much of that will be done by machine translation, and how much by crowdsourcing various experts at various levels. I think probably both will happen.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: Yeah.&lt;/p&gt;

&lt;p&gt;You know, another important area -- and I was a bit disappointed when we ran our GWAS RFP that we didn't get anything concrete in this area -- was data visualization for genome wide association studies. &lt;/p&gt;

&lt;p&gt;I think that's because it's such a hard problem. These studies are computationally challenging as it is, and there's a lot of data that gets generated. Then to visualize it, you're adding another level of computational complexity. Just looking at the data already isn't realtime, so how do you take it to that next level of visualization? That's going to be an important emerging area going forward. &lt;/p&gt;

&lt;p&gt;So for instance, we've been talking with the folks at Oxford about getting a Surface there for collaborative visualization of cancer pathology. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Not just in this area, but in general, we are so underserved by our ability to make sense of large complex data.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: Yeah, and we have these cool technologies. I think the &lt;a href="http://blog.jonudell.net/2008/06/23/the-story-of-the-worldwide-telescope/"&gt;WorldWide&lt;/a&gt; &lt;a href="http://blog.jonudell.net/2008/07/14/how-the-worldwide-telescope-works/"&gt;Telescope&lt;/a&gt; could be redeployed in many environments, and I think healthcare is one of those killer applications. We were talking with the National Cancer Institute, and one of the things they'd like to do is take a slice out of the liver while the patient is still on the table and be able to zoom in and zoom out -- it's the same technology.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: It's a similar kind of thing. To the extent that we can, in different fields, standardize on data formats and define multidimensional data spaces, we can indeed have browsers and viewers for those spaces. What the Telescope does in its domain is create a browser for a web of astronomy data. So yes, we need to have browsers for webs of genome data, and all kinds of scientific data.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: We had a recent paper by &lt;a href="http://www.cs.umd.edu/~bongshin/"&gt;Bongshin Lee&lt;/a&gt;, she's done a distance encoding tree -- she calls it Detective -- and it's a scalable visualization tool for mapping multiple traits onto evolutionary trees. So we're trying to tackle it inside Microsoft Research, but I was hoping to see more people outside MSR show interest so we could start forming interesting collaborations in that area.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Well, this has been a lot of fun. I hope to follow up on some of those Fone+ applications, that sounds really inspiring.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: Yeah, that's the reason I've gone into this area. It is inspiring. There's not only corporate responsibility, there's personal responsibility for me as well, and that's why I like working in this particular space. It's genuinely gratifying to be able to make a difference in an area that, no question about it, is beneficial to society. &lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Well you've landed in the perfect spot to do that, and it sounds like you're having a blast.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;KT&lt;/b&gt;: Yes, I am. Well, thanks very much.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Thanks Kristin.&lt;/p&gt;&lt;img src="http://channel9.msdn.com/489763/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Kristin-Tolle-on-biomedical-initiatives-at-Microsoft-Research/</comments><link>http://channel9.msdn.com/posts/JonUdell/Kristin-Tolle-on-biomedical-initiatives-at-Microsoft-Research/</link><pubDate>Thu, 18 Sep 2008 15:37:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/tolle/tolle.wma</guid><evnet:views>883</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489763/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;p&gt;
&lt;a href="http://research.microsoft.com/~ktolle/"&gt;Kristin Tolle&lt;/a&gt; is the Senior Research Program Manager for Biomedical Computing for External Research in Microsoft Research. Projects run the gamut, she says, from "bench to bedside". In this interview she discusses two major biomedical initiatives: &lt;a href="http://research.microsoft.com/ur/us/fundingopps/RFPs/CellPhoneAsPlatformForHealthcare_RFP.aspx"&gt;Cell Phone as a Platform for Health Care&lt;/a&gt;, and &lt;a href="http://research.microsoft.com/ur/us/fundingopps/rfps/GWAS_RFP_Awards.aspx?0sr=a"&gt;Computational Challenges of Genome Wide Association Studies&lt;/a&gt;.
&lt;/p&gt;</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/tolle/tolle.mp3" expression="full" duration="1710" fileSize="13724736" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/tolle/tolle.wma" expression="full" duration="1710" fileSize="13892535" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/tolle/tolle.wma" length="13892535" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Kristin-Tolle-on-biomedical-initiatives-at-Microsoft-Research/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489763/Trackback.aspx</trackback:ping><category>biomedical</category><category>Microsoft Research</category></item><item><title>Roger Barga on Trident, a workbench for scientific workflow</title><description>&lt;p&gt;
Roger Barga, a principal architect with Microsoft's Technical Computing Initiative, is leading the development of Trident, a "workflow workbench" for science. In its first incarnation, the tool will enable oceanographers to automate the management and analysis of vast quantities of data produced by the &lt;a href="http://en.wikipedia.org/wiki/NEPTUNE"&gt;Neptune sensor array&lt;/a&gt;. But as Roger explains in this interview, it's not just about oceanography. Every science is becoming data-intensive. Trident's graphical workflow authoring, reusable data transforms, and support for provenance -- the ability to reliably track and reproduce all the analytic steps leading to a scientific result -- are being used by astronomers too, and are expected to find their way into many other disciplines as well.
&lt;/p&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;
            &lt;img src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/barga/barga.jpg" /&gt;
            &lt;div&gt;
            &lt;strong&gt;Roger Barga&lt;/strong&gt;
            &lt;/div&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; We're here to talk about &lt;a href="http://www.microsoft.com/mscorp/tc/trident.mspx"&gt;Trident&lt;/a&gt;, the scientific workflow workbench for oceanography. Give us the 50,000-foot overview, then we'll zoom in.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Scientists are increasingly dealing with large volumes of data coming from disparate sources. The process used to be manageable. You'd get post-docs to convert the raw data from the instruments into readable formats, and there was a manual workflow to process the data into useful data products. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Those were the good old days. Or maybe not so good.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Right. Because the time to get from raw data to those useful products was often measured in weeks or months. But now our ability to capture data has outpaced our ability to process and visualize it. And it's rising exponentially with the rapid deployment of cheap sensors.&lt;/p&gt;
&lt;p&gt;The oceanographic project we're working on, Neptune, is just one example of this. Astronomy, and all other sciences, are experiencing the same trend.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Neptune is a University of Washington oceanographic project ...&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; ... it's actually an NSF project. The proper name is &lt;a href="http://www.joiscience.org/ocean_observing/initiative"&gt;Ocean Observatories Initiative&lt;/a&gt;, and it's being funded for several hundred million dollars. The University of Washington is one of the partners. The Monterey Bay Aquarium Research Institute and a number of coastal observatories are also involved.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; So fiberoptic cables are being laid, and lots of oceanographic data will be pouring in.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Exactly. It's transforming oceanography from a data-poor discipline into a data-rich one. They're going to be able to monitor the oceans 24x7 over long periods of time, so they can study kinds of processes that were never within reach before. Until now they could collect data only when there was an episodic event, or when they could get funding. Now they'll be collecting permanently.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; What's the scope of the sensor network?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; They're laying the trench in Monterey to test and deploy the sensors. NSF is reviewing the larger program, and getting ready to fund the Neptune array, which will be off the coast of Washington and Oregon. The Canadian version of the Neptune array is up and running and collecting data, but the software infrastructure is still being built as we speak.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; What quantities of data is the Canadian array producing?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Gigabytes per day. It can easily handle a couple of high-def video streams coming from the ocean floor.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Really?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Yes. And also in-situ devices that can sequence organisms. It really is like taking not only Internet and power out to the ocean, but also a USB bus that instruments can be plugged into.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; What are some of the experiments that become possible with this setup?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; For example, being able to understand sediment flows across the ocean floor, how temperature and salinity change, how fresh water flows in from rivers, what kind of life exists at those margins. And understanding that interesting narrow band where life thrives in the ocean. Too high up and the tides affect it, too low and there's not enough light. But really, there are a myriad of things like that. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; So an experiment, in this data-intensive new world, involves formulating a hypothesis, looking for patterns in previously-collected data, and then seeing whether data collected in the future supports the hypothesis. &lt;/p&gt;
&lt;p&gt;That means you not only need to run an analysis on data, but that you have to be able to repeat that analysis on an evolving body of data. Hence the need for the workflow automation that you're providing in the workbench.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Yes. Another aspect is the need to calibrate and tune the models. If they can do that based on long-term monitoring, it'll remove a lot of the uncertainty in our understanding of the oceans. Versus now, where the data are so sparse that it's hard to validate the model.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; I guess also that as your understanding of the data and the models evolves, you might want to rethink what data you're capturing and how you're interpreting it. So, what is it that you've built with Trident, and how does it help you do those things?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Jim Gray was the first person who had the vision of an oceanographer's workbench. His insight was that scientists really want to interact with visualizations of the ocean, but there was a huge gap between the raw data and those visualizations. &lt;/p&gt;
&lt;p&gt;Managing information and managing data is one of Microsoft's core strengths. In &lt;a href="http://research.microsoft.com/erp/"&gt;External Research&lt;/a&gt;, we look for partnership opportunities where we can bring our technology, learn from applying it to data-intensive stress tests that involve even more data than our commercial products currently handle, and figure out how to use or extend our technology to provide a solution.&lt;/p&gt;
&lt;p&gt;Jim pointed out that workflow was one of the key missing ingredients. We looked at the in-house tools, and Windows Workflow was the engine of choice...&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; ...although it didn't exist at the time Jim floated this idea, right?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Well, yes, it was around in alpha and beta form internally. Jim knew I was doing some of my research using Windows Workflow. Of course he left the solution up to us, but he accurately identified workflow as being a way that the scientist could not only manage the data transformations that were needed, but also create a library of solutions that could be shared and reused.&lt;/p&gt;
&lt;p&gt;If you look at how Microsoft works as a company, we build platforms and then we expect ISVs to come in and bridge the gap between the platforms and the user communities. That's the role our group has played. We're looking at the requirements of the scientists, we're looking at the platform Microsoft provides, and we're building on that platform to provide a custom solution to the scientists that will not only accelerate their work, but change how they do science -- enable them to ask and answer questions they couldn't before.&lt;/p&gt;
&lt;p&gt;We partnered initially with the University of Washington and Monterey Bay Aquarium Research Institute, or MBARI. They're already gathering data from sensors, so they could describe the spectrum of data we'd have to ingest into our workflows. The University of Washington has a visualization tool called &lt;a href="http://www.cs.washington.edu/homes/keithg/oceans.html"&gt;COVE&lt;/a&gt;, which scientists are adopting as the preferred way to look at the ocean floor. You can think of it as Virtual Earth for the ocean. If there's bathymetry data, you can pull it in and see the ocean floor. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; What kinds of data transformations are needed to get from the sensor outputs to COVE's inputs?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; There are probably about two dozen kinds of data sources we need to be able to ingest, based on the instruments and the types of data they put out. Typically it's streaming data in &lt;a href="http://www.unidata.ucar.edu/software/netcdf/"&gt;NetCDF format&lt;/a&gt;, or some other common format. So the first step is to recognize what kind of data format an instrument or model is kicking out, and transform it into an internal structure that our tool can use.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; But the workflow engine is abstracted from the instrumentation data formats and from the visualization tools, right? It's a mechanism for reproducibly running transformations, and managing that pipeline.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Right. But let's start with how we interacted with the scientists. Jim Gray would ask scientists: "What are the top 20 questions you want to ask, and queries you want to run?" From that, he'd get an understanding of how they viewed the data, and what kind of processing was required.&lt;/p&gt;
&lt;p&gt;We took the same approach, and asked the scientists for the top 20 workflows they perform and the top 20 visualizations they like to see. Then we went through them from top to bottom, talking about the transforms and data integration that were required. We wound up with a set of two dozen transformations that were common across all of these workflows. That became the library of activities -- reusable chunks of code -- that the scientists could call upon to author not only these 20 workflows, but the next 20.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Can you give a couple of examples?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Sure. Regridding. You have two data sets, one's from a model and the other's from a set of deployed sensors out in the ocean. They're on different grid coordinate systems and you need to be able to bring those two together. That may require some interpolation, you might need to drop or add data points, transform coordinates, join data sets.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; There might be a temporal variant of the spatial gridding as well, to align different time scales? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Right. Some instruments are getting things every second, some are getting them every 15 minutes. You can ask the user: "Do you want interpolation to take place? Do you want the system to match up the points?" Based on these inputs, the correct workflow gets configured and they see the resulting visualization for the region of ocean they're interested in.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; It sounds like some of these primitives will wind up being fairly general, not just specific to oceanography.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Indeed they are. We're producing a version of Trident for oceanography, but many of these activities could be useful for other sciences as well. People in earth sciences, for example, are also using NetCDF and many of the same operations.&lt;/p&gt;
&lt;p&gt;By building a tool that is extensible, and agnostic in terms of the science it supports, we expect it could be used, for example, to understand the interaction between oceans and warm air currents. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; What does the Trident user see and do?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; We realized that the authoring experience for scientific workflow is very different from, say, business workflow. In business, you'd have your accountant write your expense report workflow. They'd lock it down, they'd deploy it, everybody would use it from then on, and nobody would touch it until it came back for bug fixes or enhancements. &lt;/p&gt;
&lt;p&gt;What we found with scientists is that they want to borrow somebody's workflow that does what they want, or close to it, load that workflow, and then start authoring from that point on. &lt;/p&gt;
&lt;p&gt;So we implemented that in Trident. You can search for workflows by purpose, or by the inputs they process. You click on one and load it into a visual browser, because while the oceanographers understand the workflows, they don't want to see C# or Java. They want to see something visual: boxes that represent the transformations they want to apply. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; We've mentioned the Windows Workflow Foundation. For folks who aren't familiar with that system, how would you characterize it? How is it like and unlike a script execution engine?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; What's unique about workflow, versus scripting, is that with workflow you tease apart the schedule, which is the sequence of actions you'd like to have performed, from the code that carries out each action. If you were to look inside each of those steps, you'd see code similar to what you'd find in a script. But on top of the sequence of steps you have an orchestration engine. When you pass this workflow -- this sequence of steps -- over to the orchestration engine, it runs the code inside each of the boxes, but as each one completes, control passes back to the orchestration engine. &lt;/p&gt;
&lt;p&gt;So we have an abstraction layer, and we've opened up the opportunity for reuse: the steps, or activities, become building blocks. In addition, the orchestration engine can monitor the execution of the workflow, or change the way it executes -- for example, by running blocks in parallel on a multicore machine. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; What struck me about the Workflow Foundation was the way in which workflows can be very big or very small. As small as the sequence of interactions with a form on a web page, in which case the orchestration engine can be embedded entirely in the code that's behind that web page. &lt;/p&gt;
&lt;p&gt;Or it can be a very big thing. But in any case, since it's part of the .NET Framework, it can exist in a variety of places. It can run locally on a laptop, it can run on a server in the cloud. There's an interesting amount of flexibility in terms of how workflows can be deployed. An application could embed Trident, or Trident could be used as a service.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; That's right. That's the magic of it. Yes, it could be hosted in an environment that the scientist is already familiar with. Or for a big institution, you could post it up as a service. Anybody could access it from a browser. And that's part of our mantra here. If we provide this to the scientists, we have to make sure it works with the tools they're comfortable using. You should be able to point your Linux box running Firefox at this tool.&lt;/p&gt;
&lt;p&gt;But to your other point, we're experimenting here with workflows that are resource-seeking. You could launch one, perhaps even on your cellphone, and that scheduling engine's going to look for systems that have resources for that workflow, tap into them, and give the user on the cellphone the impression it's running locally. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; You've mentioned that the workflow style encourages a level of modularity that you might not otherwise get. It also provides a level of monitoring, control, and auditing. The reason that's important goes back to the idea of reproducibility. &lt;/p&gt;
&lt;p&gt;A friend of mine is an HPC expert, and one of his pet peeves is that when people look at HPC they tend to focus on how much raw horsepower can be thrown at a problem. His question is: "Who's worrying about reproducibility and correctness?" It's a really important question. &lt;/p&gt;
&lt;p&gt;In your environment, as I understand it, one of the things that you get is the ability to capture and replay and analyze what happened in a workflow, and the ability to faithfully reproduce a sequence of steps. You talked about enabling things that scientists couldn't do before. It's not only that they couldn't analyze large quantities of data, but also that they couldn't automate their own methods, and be able to reflect on them in an automated way.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Right. Even if we couldn't run a workflow faster, and even if we weren't processing a lot more data, one of our key features is support for provenance. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Explain what you mean by provenance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Think about it in terms of art. For a given piece of art, we're able to establish through authorities that it's original, where it came from, and who's had their hands on it through its lifetime. Provenance for a workflow result is the same thing. Minimally we want to be able to establish trust in a result. If you think about how that happens, it often starts by considering who wrote the workflow. So with Trident you can click on a result and interrogate the history of the workflow: who wrote it, who reviewed it, who revised it, when it first entered the system.&lt;/p&gt;
&lt;p&gt;We do versioning as well, so you can look at an old result and know that it was created by an old version of the workflow. And then have the ability to run the new version on the old dataset to see if it makes a difference. &lt;/p&gt;
&lt;p&gt;We capture execution provenance so you know exactly how your result was created. We capture provenance on the workflows themselves so you know who created them, and who's touched them. &lt;/p&gt;
&lt;p&gt;You might be thinking about creating a community, where you click on a workflow and can say: "OK, I trust that post-doc."&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; I've been reflecting on what Microsoft brings to the world of science, in yours and in other collaborations that I've been talking to MSR folks about. One is clearly the special competence and expertise in data management and processing. Even for computationally-oriented scientists, that data expertise isn't necessarily a core competence. &lt;/p&gt;
&lt;p&gt;Another is the software tradition of version control. Again, that hasn't been a traditional strength of scientists. So this looks like a fruitful partnership on both fronts. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Agreed. It would be nice to get &lt;a href="http://perspectives.on10.net/blogs/jonudell/Making-sense-of-C02-data/"&gt;Catharine van Ingen&lt;/a&gt;, or perhaps Alex Szalay, to chime in on how this is being used for astronomy, because we're giving drops of this code to our e-science researchers for use in other areas. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; I'd love to talk with Alex. I had a couple of in-depth conversations about the WorldWide Telescope, one with &lt;a href="http://blog.jonudell.net/2008/06/23/the-story-of-the-worldwide-telescope/"&gt;Curtis Wong&lt;/a&gt; and the other with &lt;a href="http://blog.jonudell.net/2008/07/14/how-the-worldwide-telescope-works/"&gt;Jonathan Fay&lt;/a&gt;, and we touched on the work Alex has done. He's using your stuff as well?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; Not him personally, but his project -- &lt;a href="http://pan-starrs.ifa.hawaii.edu/public/"&gt;Pan-STARRS&lt;/a&gt; -- is. Catharine van Ingen and Yogesh Simmhan are co-architects of that system along with Alex. And they're bringing workflow to the table. It's becoming the way scientists upload their data into Pan-STARRS and get it back out, and Trident is the workflow engine for that.&lt;/p&gt;
&lt;p&gt;You've probably also heard about other activities here in External Research. Perhaps the scholarly communications aspect?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Yep. I've talked to &lt;a href="http://perspectives.on10.net/blogs/jonudell/Word-for-scientific-publishing/"&gt;Pablo Fernicola&lt;/a&gt; about the Word add-in for authoring scientific papers in the National Library of Medicine XML format. And recently I got the &lt;a href="http://blog.jonudell.net/2008/07/31/a-conversation-with-tony-hey-about-microsoft-external-research-and-the-new-breed-of-e-scientists/"&gt;overview of External Research&lt;/a&gt; from Tony Hey.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RB:&lt;/strong&gt; When you think about Trident in the context of scholarly communication -- and to your point about the importance of provenance, we see eye to eye on that -- not only can we use these tools for e-science data management, but we're focusing on reproducible research. When Trident has finished running a workflow, we'll create an XML structure that describes how to call back into Trident to recreate the result. We're really keen on the idea that not only is it easier to do the science, and publish the science, but actually reproduce it. And that XML description should be able to be embedded in the published work.&lt;/p&gt;
&lt;p&gt;That's really exciting. It's been talked about in the computational sciences, but never addressed end to end with a tool that's instrumented, that produces an XML standard the community can own which describes how the science was done, and that gets carried along with the publication, either physically or by reference, and we store this execution script in a database somewhere. &lt;/p&gt;
&lt;strong&gt;JU:&lt;/strong&gt; It's a really big idea.
&lt;p&gt;&lt;/p&gt;
&lt;strong&gt;RB:&lt;/strong&gt; It is. I think it could be transformational.
&lt;p&gt;&lt;/p&gt;
&lt;strong&gt;JU:&lt;/strong&gt; I do too.
&lt;p&gt;&lt;/p&gt;
&lt;strong&gt;RB:&lt;/strong&gt; Right now, reproducibility means that if you happen to know the person who did the experiment, or you happened to capture enough in your lab notebook or on your whiteboard, then you have a chance of being able to do it again. But imagine being able to click any result, and automatically and transparently reproduce that result.
&lt;p&gt;&lt;/p&gt;
&lt;strong&gt;JU:&lt;/strong&gt; In reality it won't necessarily be the case that you can punch a button and have everything replayed exactly. But having the documentation, at that level of detail, and in that form, would be an incredible asset.
&lt;p&gt;&lt;/p&gt;
&lt;strong&gt;RB:&lt;/strong&gt; Agreed. The hope is that here in External Research, because we're building these tools not just in the context of one science project, but many, you can have community tools that bridge communities. We're talking to people in the earth sciences doing atmospheric studies, and their workflows and analyses are so similar to what the oceanographers are doing. But right now, since those two communities aren't talking or sharing tools, it's very difficult for one community to interact with the other.
&lt;p&gt;&lt;/p&gt;
&lt;strong&gt;JU:&lt;/strong&gt; That's a really nice point. Well, thanks Roger!
&lt;p&gt;&lt;/p&gt;
&lt;strong&gt;RB:&lt;/strong&gt; See you later.&lt;img src="http://channel9.msdn.com/489762/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Roger-Barga-on-Trident-a-workbench-for-scientific-workflow/</comments><link>http://channel9.msdn.com/posts/JonUdell/Roger-Barga-on-Trident-a-workbench-for-scientific-workflow/</link><pubDate>Thu, 28 Aug 2008 17:41:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/barga/barga.wma</guid><evnet:views>1289</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489762/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;p&gt;
Roger Barga, a principal architect with Microsoft's Technical Computing Initiative, is leading the development of Trident, a "workflow workbench" for science. In its first incarnation, the tool will enable oceanographers to automate the management and analysis of vast quantities of data produced by the &lt;a href="http://en.wikipedia.org/wiki/NEPTUNE"&gt;Neptune sensor array&lt;/a&gt;. But as Roger explains in this interview, it's not just about oceanography. Every science is becoming data-intensive. Trident's graphical workflow authoring, reusable data transforms, and support for provenance -- the ability to reliably track and reproduce all the analytic steps leading to a scientific result -- are being used by astronomers too, and are expected to find their way into many other disciplines as well.
&lt;/p&gt;</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/barga/barga.mp3" expression="full" duration="1890" fileSize="15136512" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/barga/barga.wma" expression="full" duration="1890" fileSize="15312203" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/barga/barga.wma" length="15312203" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Roger-Barga-on-Trident-a-workbench-for-scientific-workflow/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489762/Trackback.aspx</trackback:ping><category>e-science</category><category>oceanography</category><category>podcasts</category><category>Workflow</category></item><item><title>Lewis Shepherd discusses the Institute for Advanced Technology in Governments</title><description>&lt;p&gt;
Before joining Microsoft's Institute for Advanced Technology in Governments, &lt;a href="http://shepherdspi.com"&gt;Lewis Shepherd&lt;/a&gt; spent four years at the Defense Intelligence Agency where he helped usher in a &lt;a href="http://itc.conversationsnetwork.org/shows/detail1891.html"&gt;new era of collaboration&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;
In this interview, he discusses how the Institute's small team of seven is exploring the nooks and crannies of Microsoft's research efforts and technology portfolios, looking for ways to help governments meet the diverse set of enterprise challenges they face.
&lt;/p&gt;
&lt;br /&gt;&lt;br /&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;img src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/shepherd/shepherd.jpg"&gt;
&lt;div&gt;
&lt;b&gt;Lewis Shepherd&lt;/b&gt;
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: Microsoft's Institute for Advanced Technology in Government is a mysterious new organization that hasn't been heard from much. Readers of magazines like Government Computer News may have seen some notices about it, and may have noted that former CIA Assistant Director Jim Simon is the founder, and that it's attracted some other folks who formerly worked in government roles --  Aris Pappas from CIA, you from the Defense Intelligence Agency. 
&lt;/p&gt;
&lt;p&gt;
But not much else is known. So, what's this all about?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: Well, I'd say a better word than mysterious would be quiet. And that's because we're new and small. The Institute was set up by Bill Gates and Craig Mundie in 2004. They decided that Microsoft should play a more strategic role in the eyes of government. 
&lt;/p&gt;
&lt;p&gt;
Actually, in our title, there's a final letter, S. It's the Institute for Advanced Technology in Governments, plural. We're not strictly focusing on the U.S. federal government, which the backgrounds of the people involved would imply. It's actually governments at all levels. 
&lt;/p&gt;
&lt;p&gt;
We've worked a bit with state and local governments recently. In the past year we've increased our headcount to seven, and the seventh was an interesting addition. Bob Hayes is a British citizen, he lives and works in Cambridge UK, and he has experience at all levels of UK government. He began as a beat cop -- a bobby -- and has worked in and around the national security community in the UK for his entire career.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: You folks have close ties to Microsoft Research, but don't consider yourselves to be formally a research unit. Or do you?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: Not formally, but we do work closely with MSR, along with product groups. Jim Simon reports directly to Craig Mundie, so we have visibility into the entirety of strategic and future-oriented work that Microsoft is doing. Not just strictly MSR, but also incubation, Live Labs, Office Labs, forward-thinking people in various product groups.
&lt;/p&gt;
&lt;p&gt;
A lot of it is personal. We're just seven people, I joined just seven months ago. It's been a wonderful way to see inside this tiny little 90,000-employee company.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: Governments are specialized kinds of large enterprises, so there are all sorts of potential applications for Microsoft's enterprise-oriented technologies. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: That's a big part of it. The core mission is to assist our federal government, state governments, and eventually we hope local governments and NGOs, to focus on their enterprise-wide problems. Of which there are many. As bureaucratic organizations they're a lot like commercial organizations, but they have particular unique challenges that you probably don't really understand unless you've had the pleasure and frustration of working inside a large federal government organization. If you have, as we have, you really understand the pain, particularly within our national security community. The intelligence community, the Department of Defense, these are massive bureaucratic constellations of organizations. 
&lt;/p&gt;
&lt;p&gt;
Five out of the seven in our group have some background in that national security community. I didn't have a career in it. But coming from a different kind of public sector background, and then a Silicon Valley background, I spent four years at the Defense Intelligence Agency where, post-9/11, I tried to bring some new thinking to the intelligence community. Along with a lot of other people, we were able to do some of that. 
&lt;/p&gt;
&lt;p&gt;
Along the way, as I looked out at the different strategic partners that government has in the technology world, we certainly viewed Microsoft as important, just because we -- like most others -- were on a Windows and Office platform, and were using a lot of other Microsoft products, but to be honest, it was limited to that. We thought Microsoft was a product vendor.
&lt;/p&gt;
&lt;p&gt;
One thing that began to change my mind was, as a government executive, I used to visit Microsoft annually in Redmond. The account team that supported our agency began to hear from me that we'd noticed Microsoft spending six or seven billion dollars a year on R&amp;D. I started to wonder: Where's that money going? And how much of it was focused on assisting with government problems? The answer to that was, at least consciously, in the minds of MSR, none of it. 
&lt;/p&gt;
&lt;p&gt;
Yet here I was in the intelligence community, working with the DoD fighting this long war on terror, surrounded by some of the keenest early adopters in the world who were looking to push the limits of technology. So I began to talk to Microsoft and found they had indeed set up this quiet group in 2004 to consult with government, both inside the intelligence community and elsewhere, on these kinds of enterprise problems, and to bring to bear some of the more interesting and promising fruits of Microsoft Research.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: There are a bunch of Microsoft technology initiatives that intersect with the interests of governments as large IT-supported enterprises: identity management, data management, systems management, service-oriented architecture, application development. That's all playing out in governments as in other enterprises, but I suspect that's not what you mean by advanced technologies in governments?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: Correct. Although many of those are very much of interest, and frankly, federal and state governments aren't always aware of the leading edge in commercial software technology. So one of our roles is to make them aware of the leading edge, and of best practices. We do that in a way that doesn't come across like a sales pitch, because they don't need to hear that. And if it makes sense to advise government leaders to innovate in ways that don't necessarily require Microsoft products, that's a plus for the Institute and for Microsoft's role in assisting governments. So we've done that several times.
&lt;/p&gt;
&lt;p&gt;
Our sales guys understandably focus on what they can sell today, and in the next quarter. But often government needs to know how to make better use of what it already has, or how to use something that isn't a Microsoft product.
&lt;/p&gt;
&lt;p&gt;
But here's a case study that's more along the lines of what we mainly focus on: Microsoft Surface. It's gotten a lot of buzz as you know, and is now being commercially rolled out in the entertainment space.
&lt;/p&gt;
&lt;p&gt;
When Surface was still a research project, Jim Simon -- who loves to poke into the nooks and crannies of MSR and incubation projects -- saw it, and talked with the team, and realized they were mainly focused on it as a gaming platform. 
&lt;/p&gt;
&lt;p&gt;
He thought about that for a while, and said there were two additional markets, and he knew people in each.
&lt;/p&gt;
&lt;p&gt;
One is the big-G gaming world of casinos. In venues like Vegas and Atlantic City, the entertainment experience involves a holistic view of customers, from the moment they show up at the hotel, day and night on the casino floor, at the shows, at restaurants. The touch-enabled UI really supports that scenario.
&lt;/p&gt;
&lt;p&gt;
The other, of course, is the national security world, particularly DoD. It so happened that when I came on board and first learned about Surface, I had previously, at the DIA, had experience with touch tables, a different kind of technology -- a touchscreen on a pool-table-sized device -- that we were one of the first customers for. It was sold to us by a large defense contractor, and it did a great job for us in 2004 and 2005. But each device was $250,000. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: And what were you able to do with it?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: Defense planners typically stand around a big sheet of paper, or a map, trying to collaboratively plan out a day's or month's or year's campaign. Doing it that way, or on a sandbox, is the traditional way, and there hadn't really been any innovation.
&lt;/p&gt;
&lt;p&gt;
With the touch table device, you could show a map on this horizontal surface, and data layers. Think about Virtual Earth or Google Earth, the ability to do that on a 6-foot by 9-foot table becomes very appealing.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: Was this commercial software on a custom device, or was it all custom?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: It was all custom. $250,000 a pop. We bought two of them, and put them on two different floors, for two different teams to use. After a while we realized it'd be nice if the teams could collaborate, but they weren't networked, so we had to pay our contractor another $100,000 to connect them.
&lt;/p&gt;
&lt;p&gt;
When I first saw Microsoft Surface, and realized the entire thing ran on essentially a state-of-the-art PC, and that the APIs were going to be open enough for developers to put any kind of Windows software onto it, and that it would all be networked...it really opened up the possibilities.
&lt;/p&gt;
&lt;p&gt;
And then when you realized that, because of the scale Microsoft operates on, the price would be $10,000 instead of $250,000, it just blew my mind.
&lt;/p&gt;
&lt;p&gt;
So now that same defense contractor has become one of the first to develop on the Surface platform. They know their customer, they know the scenarios for defense, intelligence, homeland security, state and local police. And I think the Institute played a small but important role in opening the eyes of a lot of people to the kind of difference a Microsoft platform could bring to that environment.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: What are some other kinds of connections like that that you're making, or want to make?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: Well, I've been extremely interested in robotics, and the emerging large-scale appeal of the Microsoft Robotics Studio. You and I have chatted about this. The appeal isn't so much robotics, per se, but rather the back-end architecture that takes advantage of advances in concurrency and high-performance computing and distributed services.
&lt;/p&gt;
&lt;p&gt;
You mentioned service-oriented architecture, and SOA has been a buzzword in government and other kinds of enterprise circles for a while now. Well there really are multiple services being developed and deployed in lots of different environments. The ability to orchestrate enormous numbers of those services is something you can do natively with the Robotics Studio, whether or not you intend to develop a robot.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: It's a fascinating outgrowth of that project, and the implications are only beginning to sink in. The software infrastructure is extraordinarily general-purpose.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: Right. Among government early adopters, the people we've seen take a good deal of interest have been in DARPA, and also in a new organization in the intelligence community called &lt;a href="http://www.iarpa.gov/"&gt;IARPA&lt;/a&gt;. Once you get the right people looking at this, they understand what's really behind it, and the power of it.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: You've also been known as a proponent of Web 2.0 methods, and were responsible for bringing Intellipedia to life.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: I was one of the people who did. It was a great team.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: What are the opportunities in that realm?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: I've watched with great interest the rise of Enterprise 2.0. I think most people credit Andrew McAfee with coining that term. He and I have spoken on a number of panels, and we've talked about how government organizations can nurture bottom-up development of advanced capabilities using things like blogs and wikis, and take advantage of the emerging power of social media, without the kinds of constraints you inevitably get in a government bureaucracy. 
&lt;/p&gt;
&lt;p&gt;
Not only do you have all the usual bureaucratic problems of large organizations, but there's also a hypersensitivity to security, and also -- within the civil service -- the disincentive to innovation that happens when people are career civil servants. 
&lt;/p&gt;
&lt;p&gt;
So how do you nurture grassroots adoption of these technologies? It's very personal, you have to find the right people. I was lucky to have an inside role in the intelligence community.
&lt;/p&gt;
&lt;p&gt;
When you say Intellipedia, people may or may not know about it, but it's been a phenomenal success story in the intelligence community as a community. That word, community, was openly mocked for decades because the sixteen different agencies -- and particularly the big ones everybody knows about, CIA, NSA, DIA, NRO, the alphabet soup of them all -- really didn't collaborate that well, if at all.
&lt;/p&gt;
&lt;p&gt;
The 9/11 and WMD commissions went into great detail about this. What I and others were able to do was to begin working in small ways on identifiable chunks of value that we could create for community-wide use on shared networks.
&lt;/p&gt;
&lt;p&gt;
This work began in 2004. I don't think there was any great flash of inspiration in deciding to basically plagiarize Wikipedia on a secure network, as a large-scale network for socially-authored and socially-maintained intelligence that had been kept in stovepiped databases.
&lt;/p&gt;
&lt;p&gt;
The first pilot was in 2004, and it opened as an enterprise system for the whole intelligence community in 2006. Intellipedia has been a big success. There was an initial period of hockey-stick growth. That's leveled off some now, and the challenge -- as in any enterprise -- will be to continue to evangelize the business practices of social networking, and the value they bring within a large diverse set of organizations.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: Of course this wasn't an example of advanced technology. Wikis and blogs are just the kudzu of the Internet. It was more an exercise in social engineering than a deployment of any new or advanced technology, and appropriately so.
&lt;/p&gt;
&lt;p&gt;
From a Microsoft perspective, then, is it about applying advanced technologies from MSR in this environment? Is it about bringing some of that grassroots sensibility into the Microsoft platform?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: I think it's both. In the federal space, particularly intelligence where you have knowledge workers on steroids, we have an interesting mindset within the account teams. They're not only focusing on what can be done with SharePoint 2007 and with Office 2007's XML capabilities. They're also seeking out bits of code being worked on in Live Labs, in Office Labs, and elsewhere. Popfly, for example, is being heartily evangelized by Microsoft teams within the federal government, and it's been enormously well received.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: I've talked about this with John Montgomery, the Popfly lead.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: In fact there's so much interest, he's almost to the point of being overwhelmed. 
&lt;/p&gt;
&lt;p&gt;
What I see is a changing mindset about Microsoft, and the role it can play in government. It's not just about are we on a Windows platform. It's about what can I use, on my computer or mobile device, that'll enable me to do things I couldn't do before. If those are Microsoft things with a Windows label, that's great. If they're not, if they're cool, funky, web-centric things like Popfly, that's great too.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: Tell me if this fits into your charter. A big aspect of what I think of as Government 2.0 is the emerging availability of various sources of government data. There's a growing consensus that data will be made available, and that's happening, but in a way that reminds me of how things were, and mostly still are, on the scientific web. Yeah, there's the data, go grab the gzipped tarball and have fun with it. 
&lt;/p&gt;
&lt;p&gt;
As opposed to offering a service layer interposed between the data and both applications and human beings.
&lt;/p&gt;
&lt;p&gt;
I see an interesting possible role for Microsoft, and I see it as extension of something that's happening in the relationship between MSR and the scientific community. I've recently been talking to a lot of people in &lt;a href="http://perspectives.on10.net/blogs/jonudell/How-Microsofts-External-Research-Division-works-with-a-new-breed-of-e-scientists/"&gt;Tony Hey's area&lt;/a&gt;. These folks are what I'd call informaticians, and they're working closely with scientists in various fields.
&lt;/p&gt;
&lt;p&gt;
In every branch of science, now, the work revolves around the collection and analysis of previously unimaginable quantities of data. One of the things I'm seeing Microsoft consistently doing in its partnerships with scientists is to provide both infrastructure and consulting expertise, to help people wrap their arms around large datasets and make them useful in ways they wouldn't otherwise be.
&lt;/p&gt;
&lt;p&gt;
I'm wondering if there isn't scope for something analogous in the government space, as these datasets begin to be made available, but not necessarily in ways that enable citizens to ask and answer meaningful questions, or relate the raw information to policy.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: You've hit on something that's really important, and yes, it's an interest of ours. It's very hard to do, but if you do it, the value is tremendous.
&lt;/p&gt;
&lt;p&gt;
I'll give some examples of things that we're thinking about, and one that we're working on.
&lt;/p&gt;
&lt;p&gt;
One thing we're thinking about, as a model, comes from a member of our group I want to mention, George Spix, because he's such a great guy; a lot of people around Microsoft know George. He's the only one in our group who's been with the company for a long time, and before that he worked with Seymour Cray.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: Here's something about George you may not know. The Microsoft Conference Center recently hosted the annual &lt;a href="http://perspectives.on10.net/blogs/jonudell/Ted-Semon-reflects-on-the-2008-Space-Elevator-Conference/"&gt;space elevator conference&lt;/a&gt;, and George was the guy who gave the go-ahead for that.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: That's closely related to the example I was going to mention, which is the WorldWide Telescope. George also did some work on that, and if you think about it, it exemplifies what you were talking about.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: Absolutely. The &lt;a href="http://perspectives.on10.net/blogs/jonudell/The-story-of-the-WorldWide-Telescope/"&gt;WorldWide&lt;/a&gt; &lt;a href="http://perspectives.on10.net/blogs/jonudell/How-the-WorldWide-Telescope-works/"&gt;Telescope&lt;/a&gt; is the paradigmatic example of a service layer that's been interposed between a previously available but practically inaccessible dataset and a set of interoperable applications, on the one hand, and ordinary people, on the other. You're right. It's the perfect prototype.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: It is. And as such, it also serves as the perfect educational device for people who are in a position of authority over other large stovepiped datasets. So we've been using WorldWide Telescope as a teaching element: "Here's the future of what your world could be." It's not only an extremely appealing app -- people fall in love with it, Robert Scoble was famously moved to tears by it -- but there's also, as you said, the paradigmatic simplicity, and obviousness, and utility, of opening up data. So we've been using that in a number of ways to stretch the mental boundaries that government officials have about their data, about the accessibility they currently offer, about what new technologies and web-scale computing could bring to their data, and about what that would do for them, and their intent to serve their customers, their users, their citizens.
&lt;/p&gt;
&lt;p&gt;
It really is a mind-blowing way to get them thinking creatively about what could be done.
&lt;/p&gt;
&lt;p&gt;
Another example: machine translation, and some of the hybrid translation approaches that Microsoft Research is pushing the boundaries on. Here we have real examples, already offered within the Windows Live constellation, that people don't really know about. 
&lt;/p&gt;
&lt;p&gt;
There's the translator bot that is a Live Messenger client, you can have simultaneous translation among a dozen languages in your instant messaging.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: I hadn't seen that myself.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: Well, there you go. And there's the &lt;a href="http://gallery.live.com/liveItemDetail.aspx?li=9ca66480-2d87-4341-87f6-86875d9a0908"&gt;live translator plugin&lt;/a&gt; for Internet Explorer, I just &lt;a href="http://lewisshepherd.wordpress.com/2008/08/08/using-web-20-to-track-a-political-crisis/"&gt;blogged about&lt;/a&gt; that last week. It enables you to surf foreign language websites, with simultaneous translation. It's really changed things for me. I have a lot of interest in Russia, so being able to surf Russian-language sites, with good-enough machine translation appearing right in the browser, it's phenomenal. 
&lt;/p&gt;
&lt;p&gt;
And as we show that as a service to be exploited within service architectures, that's something governments find really intriguing. It helps them think about how they could provide better access for diverse populations. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: I'm glad to hear that. I've been pushing for a while on this theme, and have recently concluded that we're kind of stuck on the question of access to the data. But that's only the first step. It's great that we're getting to the point where that first step will be taken, but there's so much more that can be shown and done.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: I'll give you another example. I've had that same lingering feeling of frustration about the web, and the billions of pages and documents I can theoretically access. But there's no sense-making. 
&lt;/p&gt;
&lt;p&gt;
One of the most exciting things I saw last year, even before joining Microsoft, was Photosynth and Seadragon. We're working on making these and related technologies, like Deep Zoom, available to government organizations that have access to very large archives of images which are just sitting there. Yes, theoretically you have access to them, but something like Photosynth enables you to make sense out of them.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: What kind of images are we talking about?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: Well, some I can talk about and some I can't. But we have been talking to some state governments about their access to the world of Flickr and online collections like that, from the standpoint of homeland security, and the ability of first responders to make sense of the visual environment of today's world, in an up-to-the-minute way, just based on the open source information that's available.
&lt;/p&gt;
&lt;p&gt;
This is something Microsoft has helped a lot of public sector groups, like the Los Angeles fire and police departments, to be real leaders on.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: So a lot of documentation of planet Earth is being done in a grassroots, ad-hoc way, for example in the form of photos on Flickr that are tagged and even geolocated. And there might be a government interest in those collections as the most up-to-date record of what exists. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: That's right, and it eventually works in a circular way on the provision of government services back to people. Think about a government bureaucrat, like a building inspector, who goes to a site, takes a photo or two, and certifies that the site is being worked on in a way that conforms to local or county regulations.
&lt;/p&gt;
&lt;p&gt;
Well, the ability to do all that in realtime, with a camera-equipped cellphone, and do it in a secure way, with timestamping and geocoding...
&lt;/p&gt;
&lt;p&gt;
Or think about that capability deployed in child welfare scenarios where there certainly aren't enough government personnel to visit all the domiciles where trouble is reported. 
&lt;/p&gt;
&lt;p&gt;
When you think about large volumes of data being transmitted in both directions -- from citizens to governments, and from governments to citizens -- it really opens up the world. We haven't figured out all the ways, but it's fascinating to think about the diverse set of enterprise challenges that governments face, and about the technologies we have in the nooks and crannies of Microsoft that might be able to help.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: It sounds like you're having fun snooping around finding them.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;LS&lt;/b&gt;: I'm having a blast!
&lt;/p&gt;&lt;img src="http://channel9.msdn.com/489761/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Lewis-Shepherd-discusses-the-Institute-for-Advanced-Technology-in-Governments/</comments><link>http://channel9.msdn.com/posts/JonUdell/Lewis-Shepherd-discusses-the-Institute-for-Advanced-Technology-in-Governments/</link><pubDate>Thu, 14 Aug 2008 17:01:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/shepherd/shepherd.wma</guid><evnet:views>1055</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489761/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;p&gt;Before joining Microsoft's Institute for Advanced Technology in Governments, &lt;a href="http://shepherdspi.com"&gt;Lewis Shepherd&lt;/a&gt; spent four years at the Defense Intelligence Agency where he helped usher in a &lt;a href="http://itc.conversationsnetwork.org/shows/detail1891.html"&gt;new era of collaboration&lt;/a&gt;. &lt;/p&gt;
&lt;p&gt;In this interview, he discusses how the Institute's small team of seven is exploring the nooks and crannies of Microsoft's research efforts and technology portfolios, looking for ways to help governments meet the diverse set of enterprise challenges they face. &lt;/p&gt;</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/shepherd/shepherd.mp3" expression="full" duration="2778" fileSize="22231680" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/shepherd/shepherd.wma" expression="full" duration="2778" fileSize="22498939" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/shepherd/shepherd.wma" length="22498939" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Lewis-Shepherd-discusses-the-Institute-for-Advanced-Technology-in-Governments/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489761/Trackback.aspx</trackback:ping><category>government</category></item><item><title>Maurice Franklin reflects on the 2008 Space Elevator Conference</title><description>&lt;p&gt;
Maurice Franklin is a 12-year Microsoft veteran whose career has focused on performance engineering and server scalability. He's also passionate about the concept of a space elevator, and recently organized and hosted a conference on that topic at the Microsoft Conference Center.
&lt;/p&gt;
&lt;p&gt;
In this interview he discusses reasons to build a space elevator, and describes how the concept, popularized by Arthur C. Clarke, is evolving toward a practical implementation.
&lt;/p&gt;
&lt;p&gt;
The transcript for this interview appears below. Audio is available at &lt;a href="http://itc.conversationsnetwork.org/shows/detail3780.html"&gt;ITConversations&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;
In a &lt;a href="http://perspectives.on10.net/blogs/jonudell/Ted-Semon-reflects-on-the-2008-Space-Elevator-Conference/"&gt;related interview&lt;/a&gt; Ted Semon, author of the &lt;a href="http://www.spaceelevatorblog.com/"&gt;Space Elevator Blog&lt;/a&gt;, reflects on the conference and on the goals and status of the effort.
&lt;/p&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;
            &lt;img src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/franklin/franklin.jpg" /&gt;
            &lt;div&gt;
            &lt;strong&gt;Maurice Franklin&lt;/strong&gt;
            &lt;/div&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: Maurice, most folks don't even know that there's a serious plan to build a space elevator, and I'm sure even fewer know that those most closely involved gathered this past week for a conference hosted at Microsoft. How did that happen?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Well, I'd qualify the word "plan" ...
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: Maybe we should call it an intention.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: That's a good word. Or aspiration. So, I got involved in the space elevator world in 2002 when I discovered, quite by accident, that there was a conference in Seattle. It was hosted by an entrepreneur, Michael Laine, and a scientist, Bradley Edwards.
&lt;/p&gt;
&lt;p&gt;
Dr. Edwards is the father of the 21st-century concept of the space elevator. He'd heard it was impossible, and didn't believe that, so he got a NASA grant, and came up with something that made everybody say: "Well, that's not how we were thinking about it at all. That's a decades plan instead of a centuries plan."
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: What was different?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: I'd read a 1997 NASA study. It was big science and big engineering, and it relied upon a spacefaring technology to get to a space elevator. Those of us who are fans of the space elevator see it as a bootstrapping mechanism to get to that spacefaring technology.
&lt;/p&gt;
&lt;p&gt;
So, for example, one of the prerequisites assumed in the 1997 study was the ability to move an asteroid into earth orbit. And then to put a manned carbon-nanotube-manufacturing station in orbit. Well, you can't do any of that unless you have a space elevator.
&lt;/p&gt;
&lt;p&gt;
As much as anybody has, Dr. Edwards cracked the chicken-and-egg problem. His proposal requires on the order of four or five heavy launches. After that, it self-bootstraps -- if the materials come along, and a lot of other ifs, but that's a game-changing proposal.
&lt;/p&gt;
&lt;p&gt;
The NASA study also required a massive elevator, because there were going to be maglev trains running at high speeds, and that just adds more and more weight.
&lt;/p&gt;
&lt;p&gt;
His plan is much more modest. It only goes 200 kilometers per hour, which is a bit slow, but it's practical: you could reasonably get stuff up pretty quickly.
&lt;/p&gt;
&lt;p&gt;
And he added in remote power beaming, so you don't have to carry fuel for those days of climbing up towards geosynchronous orbit.
&lt;/p&gt;
&lt;p&gt;
In all respects, it's much more practical. If we could get past the technical hurdles, it's something in our lifetime, or at least those of us who are older hope so.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: So your role at Microsoft was then, in 2002, and is now...
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: I'm a performance engineer. No connection with space-related activities at all, just a personal interest. Although that 2002 conference was invitation-only, I showed up. I'd read the NASA study, I'd read Brad's study, I had intelligent questions, and I was accepted, which was very cool.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: So how did Microsoft come to be the sponsor and host of the 2008 conference?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: One of the collaborators on this project is Dr. Bryan Laubscher. He's an astrophysicist, most recently with Los Alamos National Laboratory. About a year ago he was invited to the visiting speaker program at Microsoft Research. I went to hear him, gave him my card, and invited him to contact me if there was any way I could help.
&lt;/p&gt;
&lt;p&gt;
About a month later he got in touch and said, "I'd like to make Seattle and Microsoft the center of the space elevator universe, so let's get a conference."
&lt;/p&gt;
&lt;p&gt;
Well, Microsoft employees themselves can't usually sign up the Microsoft conference center, so it became my job to find a sponsor. Through various happenstances I got connected with George Spix, now with the Microsoft Institute for Advanced Technology in Governments, whom I actually knew from some work we'd done together in the past, and he said "Sure."
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: It's interesting how these things play out.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Yes, it is.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: So on the face of it, this is a project that has a lot to do with power engineering, and...
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: ...materials technology...
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: ...right, and civil engineering. But there are computational spinoffs and synergies too.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Right. One of the subjects that came up at the conference is that even though this ribbon, as we refer to it, will be under more stress and tension than anything ever built by man, it will also -- by virtue of being so darn long -- be very dynamic. It will flutter in the wind over miles rather than feet.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: So there will be a need to simulate the resonances.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Yes. It's going to seem to be a simple object, but because of its length and its multiple environments -- gravitational, atmospheric -- it'll be very complex.
&lt;/p&gt;
&lt;p&gt;
Also, it has to avoid what's already up there. If you put something on the equator and then span zero to 100,000 kilometers, you will intercept the orbit of everything, eventually. Repeat those words: Everything, eventually.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: Did you say 100,000 kilometers?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Yes, it's very long.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: Wow.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Yeah. It's a significant proportion of the way to the moon.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: That's way more than I realized.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: To touch the earth and stay in the same spot, it has to orbit, more or less, at geosynchronous altitude. Once you establish that, you have to put enough mass above geosynchronous, call it a counterweight, which the earth is attempting to eject from orbit, to offset the mass, or in this case the weight, below geosynchronous that the earth is trying to pull down.
&lt;/p&gt;
&lt;p&gt;
So you fix that geosynchronous point at 22,300 miles, I think it is. You then get to decide how big that counterweight is versus how far out it goes.
&lt;/p&gt;
&lt;p&gt;
If you put up an asteroid of appropriate size, it can be just the other side of geosynchronous.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: Ah. So that's what I'm remembering from the original proposals.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Yes, in the NASA proposal, and also in The Fountains of Paradise, the Arthur C. Clarke book, it was a large counterweight very close to geosynchronous.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: So that's "big engineering" as you've said, and the modern concept is to go smaller.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Right.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: In terms of computational challenges, we have this carbon nanotube ribbon which is 100,000 kilometers long, and it's on a collision course with every piece of space junk.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Right. So Dr. Edwards' proposal is to make the base movable, by putting it out to sea on an oil-rig-like platform, and then to move it when the computers say it has to move. You induce movement at the base, and that movement propagates up and up the ribbon so that it misses the space station.
&lt;/p&gt;
&lt;p&gt;
So I think I just described a very complex thing. You've got to move this thing hours or days ahead of a collision that you have to avoid by, say, 5 kilometers to be safe.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: That's the safety buffer?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Right. Part of the Air Force monitors every object up there.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: Even screwdrivers?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: I think they get down to the centimeter, but don't quote me, I'd have to look it up. Anyway they track all this, and call up the space shuttle and say, you need to fly a little bit that way.
&lt;/p&gt;
&lt;p&gt;
But now you have this object that you can't just move on a whim. And when you move one part of it, you move all of it. The space shuttle has a buffer of a kilometer, or 10 kilometers, I don't know, but say it's a cubic kilometer they have to keep clear.
&lt;/p&gt;
&lt;p&gt;
Well, if the space elevator's danger area is a square kilometer, it's not a cube.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: It's a column.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: A column that's 100,000 cubic kilometers, projecting through every orbit known to man. And in order to move it, you have to move it all.
&lt;/p&gt;
&lt;p&gt;
There was a NASA retiree of considerable note at the conference, Ivan Bekey, and he came in and said, "Have you guys really figured this out yet? I don't think you have."
&lt;/p&gt;
&lt;p&gt;
It was a great keynote: "Potentially Fatal Elevator Flaws That Must Be Addressed".
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: So the strategy you just discussed has been on the table.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Yeah, but not enough to stand up to the scrutiny of somebody like Ivan Bekey, who's saying, no, no, you have to actually figure this out. And people are like, yeah, yeah, you're right, we have to figure it out.
&lt;/p&gt;
&lt;p&gt;
Then there's the question of what kind of communications network runs on the space elevator. There will be radio waves, but you'll want to bring up your browser too, and the latencies will be very different than for a transatlantic cable, so there are interesting computational challenges there.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: Fascinating. So as an observer of this scene for some years, what was notable about this year's conference?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Well, for one thing, it was good to see people working through the ideas that Dr. Edwards came up with, for example the collision management problem. For another, there was a presentation on how, having shot the arrow, you get the rest of the material across. Dr. Edwards had proposed a deployment scheme. But an engineer showed up, somebody working purely on his own time, and he has software to simulate the dynamics, and he said, no, that's probably not going to work, but this might. His idea was to keep the middle and the ends at geosynchronous, then spool out two in-between parts, then let go of the ends.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: So the most basic deployment strategy is still very much being discussed and debated.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Yeah. It's such a complex object. This isn't a satellite. It's a thing that sticks through a lot of gravity gradients, and in its first 10 kilometers it's beaten up by the atmosphere.
&lt;/p&gt;
&lt;p&gt;
So people are digging in, and coming up with different ideas that thematically fit in and move things forward.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: People have a general sense of what it would mean to get a reduction of several orders of magnitude in the cost of moving stuff into space, and of what applications could flow from that, and of the benefits of those applications. What's your take?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Cost is a big deal. We ship things across the country because it costs 15 dollars, but we don't ship things to orbit because it costs 15 million. The space elevator changes the market dynamics completely. Because it acts more like a railroad, with high upfront investment and then low operating cost, rather than like rockets, with continuously high operating costs, it opens up new market opportunities.
&lt;/p&gt;
&lt;p&gt;
But as one gentleman pointed out very forcefully, in addition to cost, there's capacity. You can only build and launch rockets so fast. The space elevator not only gets you low price, it's always there, waiting to launch payloads every day. The baseline elevator has, say, a 10-ton capacity every day, versus 15 tons every 3 or 4 months with a fleet of four space shuttles.
&lt;/p&gt;
&lt;p&gt;
He was pointing out that all these big post-Apollo dreams -- going to the moon and Mars, tugging asteroids into orbit to mine them...
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: ... large-scale orbital solar power collectors and beamers...
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: ... right, which is one of my favorite applications. So for these things it's not enough to have low cost, you have to lift a lot of stuff up there.
&lt;/p&gt;
&lt;p&gt;
Solar power satellites sound wonderful. Of course there's a whole set of other engineering and environmental problems. But in order to even make a dent, you have to move a lot into orbit. That guy was right. Capacity, capacity, capacity.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: What other uses are people talking about?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Space tourism. People go up the elevator for a few hours, see the rim of the earth, that might be a gangbuster business.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: How long does it take to get up?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: 200 kilometers per hour is kind of slow; I think geosynchronous works out to 6 days. You don't have to go to geosynchronous orbit, though, unless you want to step off and have nothing happen.
&lt;/p&gt;
&lt;p&gt;
A critique that made it to the New York Times was: "If you go to low earth orbit and step off, you fall." True. Of course the comeback is, go past low earth orbit by a bit, then step off and you fall towards the atmosphere, but you don't actually hit it. You bring along a small rocket, fire it, and you're in low earth orbit around the earth.
&lt;/p&gt;
&lt;p&gt;
So you don't have this big fiery explosive launch that gets you to orbit in 20 minutes, but you don't have the dangers that go along with such an enormous expenditure of energy in a short time.
&lt;/p&gt;
&lt;p&gt;
Instead you go up in something that's slower than an Indy racecar but faster than most people drive. It might be several hours to an interesting spot, several days to a more interesting spot.
&lt;/p&gt;
&lt;p&gt;
There are people going up now to look down on the earth at $20 million a pop. If the cost came down to $200K, $20K...
&lt;/p&gt;
&lt;p&gt;
...and there's a sociological argument that it would be a great thing for people to observe the earth from space, because there are no maps.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: Yes. It's historically had a powerful effect on the privileged few who have been able to go up and see that.
&lt;/p&gt;
&lt;p&gt;
So there's the solar satellite concept, there's space tourism, what else?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Everything else in between. Things we could do in zero gravity. Today there are experiments with special drugs, special materials processing, but they're experiments. They have to fit in the bay of the space shuttle, it only goes up every 3 or 4 months, you're not going to build an industry that way. But if you could ship your goods up and down every day, to an orbiting manufacturing plant that you just carried up on one of the larger space elevators, that's not necessarily a dream, that could be a business plan.
&lt;/p&gt;
&lt;p&gt;
Two related things I meant to mention. First, it's highly scalable.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: Meaning?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: If we can build a 20-ton elevator, we can build a 40-ton one. And by the way, the first thing you build with a space elevator is ... a space elevator. The very first payload will be the second space elevator.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: To be built in another location?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Right.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: Which addresses the single point of failure vulnerability.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Yes. It is vulnerable, for all sorts of unfortunate reasons. But while it costs a lot to get the first one up, the second and third are a lot cheaper.
&lt;/p&gt;
&lt;p&gt;
So they scale out. I'm a performance engineer, that's one of our favorite terms.
&lt;/p&gt;
&lt;p&gt;
And they scale up. There's no reason you couldn't scale up to a million-pound elevator. And the thing is, there's no particular limit on the size and shape of the thing you carry.
&lt;/p&gt;
&lt;p&gt;
Today things have to fit in nosecones, basically. The shuttle is a long bay, but cylindrical and 15 feet wide. Here you're limited mostly by how much turbulence it can take during the first 10 kilometers of the climb.
&lt;/p&gt;
&lt;p&gt;
After that it's: "Oh, you want to build the space station." Just launch the space station tomorrow, and it'll be there in six days. That's a totally different way of looking at it.
&lt;/p&gt;
&lt;p&gt;
It's like container ships. They've totally changed global trade.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: I was going to mention that. There's a book called &lt;a href="http://www.worldcat.org/oclc/62161116"&gt;The Box&lt;/a&gt; about the innovation of the standard shipping container, which is the physical equivalent to packets in a packet-switching network.
&lt;/p&gt;
&lt;p&gt;
As a result, although it seems silly that some of the things we buy have criss-crossed the world...
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: ... interesting meta-discussion there...
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: ... yeah, but because of that technology, it really is economical for a lot of stuff to move around.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: I've seen bottled water from New Zealand. It astounds me. But it probably cost more to make the plastic container, which I consider to be heavy and low-value, than to ship water across the ocean.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: So the question remains: is this a leap of faith, like with the early space program, where we don't know the benefits but we intuit that there will be all kinds of spinoffs?
&lt;/p&gt;
&lt;p&gt;
This feels like that, so the more operations you can identify that benefit from arbitraging the difference between earth and orbit, the better. But it's not crystal clear to me that there's a long list of those.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: No, I agree. A critic of the space tourism idea pointed out that, yes, airlines and before that steamships were all built on the desire of people to go somewhere interesting: Disneyland, or a business deal, or an exotic location. Currently space only qualifies on the last point. So the chicken/egg problem won't be cracked by space tourism.
&lt;/p&gt;
&lt;p&gt;
There have to be business reasons. We mentioned solar power. Another possibility is mining asteroids for iron and nickel. Those two items -- a lot of energy, and all this material -- there's a whole chapter in Dr. Edwards' book that starts by figuring out your individual share of that. It's a lot.
&lt;/p&gt;
&lt;p&gt;
The way I look at it is that a quarter of the world lives at or near the US standard, there's another quarter trying to achieve that in the next generation, another quarter just starting, and then a quarter behind the curve.
&lt;/p&gt;
&lt;p&gt;
Energy and materials are resources that space is full of.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: Clean energy in particular.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: It's getting a lot of attention lately, yes.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: Of course a lot of folks will point out that there are lots of earth-based solutions for clean energy, so why go to space for that. But I guess the answer is that it's not necessarily either/or.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Agreed. But consider this. Scientific American's &lt;a href="http://www.sciam.com/article.cfm?id=a-solar-grand-plan"&gt;January cover story&lt;/a&gt; was a proposal to do big solar by 2050. Part of it is to use 1/5 of the American desert. The environmentalists have to swallow that, there's an albedo change in the earth, and so on.
&lt;/p&gt;
&lt;p&gt;
Plus, it turns out that having a desert close to your civilization is fairly unique in the world. Europe doesn't have one.
&lt;/p&gt;
&lt;p&gt;
Now factor in population growth.
&lt;/p&gt;
&lt;p&gt;
Meanwhile, satellite advocates say that if you had a band of solar cells a kilometer wide -- and that's a lot of solar cells, no doubt about it -- they'd produce, annually, energy equal to all known remaining oil reserves.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: It brings to mind one of the Long Now talks, I think by &lt;a href="http://www.kurzweilai.net/meme/frame.html?main=/articles/art0696.html"&gt;Vernor Vinge&lt;/a&gt;, in which he plots human civilization with population on one axis, and the amount of energy available to be used by an individual in the population on the other axis...
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: ... something similar was done at the conference, by the way...
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: ... OK, so you get a stepwise progression where each major advance in the level of civilization is tied to the amount of energy that could be mobilized by an individual.
&lt;/p&gt;
&lt;p&gt;
It's tricky to talk about that right now, in an era when it's critical that we conserve.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: But you only get to conserve once. Replace all the fluorescent bulbs with LEDs, and do everything else you can, and you bought yourself ten, twenty, maybe thirty percent. If you want to increase your standard of living you don't conserve, you use up more energy.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: Right. So, how cool is it to be a performance engineer at Microsoft, take an interest in this topic, and wind up bringing the conference to the Microsoft conference center?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Yeah. By the way, about 10 percent of the attendees were employees.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: I was wondering about that. The conference was fairly small, right?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Around 50 people.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: Most of whom are practitioners, engaged in the R&amp;amp;D.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Yes.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: But you got to bring it to other folks at Microsoft who, like you the first time you went to one of these, will make all kinds of connections, and start thinking about the computational aspects.
&lt;/p&gt;
&lt;p&gt;
Well done! And what a treat for you to get the chance to do it.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Thanks. One nice thing was that it got posted on the internal calendar, and a guy I know found it, and he sent his two sons, a junior in high school and a junior in college. One's interested in mechanical engineering, and the other in civil engineering. I think this counts as a fairly large mechanical and civil engineering project!
&lt;/p&gt;
&lt;p&gt;
So I meant to mention one of the interesting side effects of the length of the elevator, which is: What happens when you let go? Remember, everything above geosynchronous is trying to tear the ribbon apart, it's trying to leave the earth. If you let go at the right time of year, and the right time of day, you'll be on your way to Mars really quickly.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: [laughs]
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: I'm serious.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: If you climb way above geosynchronous and let go, you slingshot to Mars?
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: That is the appropriate word. Slingshot. I'm talking about a couple of months. Faster than the fastest stuff we've sent with big heavy boosters for these little bitty probes. You just take the biggest thing you want to send, and just let go, it'll be there in two or three months, for free.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: [laughs]
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: And as a result, guess what's one of the first things you send to Mars using the earth space elevator.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: A Mars space elevator.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Yes. There's a book on that, by the way. Science fiction. It's called &lt;a href="http://www.worldcat.org/oclc/26054317"&gt;Red Mars&lt;/a&gt;. It has an unfortunate ending involving terrorism. And of course the first space elevator conference, where everybody had read that book, was early 2002. Dr. Edwards said: "We've all read Red Mars, and yes, we have to take this into consideration."
&lt;/p&gt;
&lt;p&gt;
There's been a study of how to defend the space elevator.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: I wasn't going to mention this, but that is maybe the worst vulnerability. You're not going to move it out of the way of a 747.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: The author's conclusion was that a consortium of private companies would own the space elevator, and the US government would trade defense for access to it.
&lt;/p&gt;
&lt;p&gt;
You'd spend a billion dollars a year parking American assets around the elevator's airspace, and it'd be like, do not fly here, just don't, you will be shot down with no questions asked.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: And unlike the problem of defending the ground, this is a relatively well-defined region of the sky that needs to be defended.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: Right, it's not the whole continent. But everybody agrees it'll have to be defended. And we have the infrastructure, for better or worse, to do that.
&lt;/p&gt;
&lt;p&gt;
But anyway, the possibility of moving around the solar system using space elevators is a whole other thing. Is that because there's interesting stuff out there? Because we're going to colonize Mars? Because we need more material sent down the elevator?
&lt;/p&gt;
&lt;p&gt;
You might say that's visionary, or you might say it's just being practical.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;JU&lt;/strong&gt;: Well thanks again, this has been fascinating and a lot of fun.
&lt;/p&gt;
&lt;p&gt;
&lt;strong&gt;MF&lt;/strong&gt;: You're welcome, Jon!
&lt;/p&gt;&lt;img src="http://channel9.msdn.com/489760/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Maurice-Franklin-reflects-on-the-2008-Space-Elevator-Conference/</comments><link>http://channel9.msdn.com/posts/JonUdell/Maurice-Franklin-reflects-on-the-2008-Space-Elevator-Conference/</link><pubDate>Fri, 08 Aug 2008 14:24:00 GMT</pubDate><guid isPermaLink="false">http://channel9.msdn.com/posts/JonUdell/Maurice-Franklin-reflects-on-the-2008-Space-Elevator-Conference/</guid><evnet:views>765</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489760/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;p&gt;
Maurice Franklin is a 12-year Microsoft veteran whose career has focused on performance engineering and server scalability. He's also passionate about the concept of a space elevator, and recently organized and hosted a conference on that topic at the Microsoft Conference Center.
&lt;/p&gt;
&lt;p&gt;
In this interview he discusses reasons to build a space elevator, and describes how the concept, popularized by Arthur C. Clarke, is evolving toward a practical implementation.
&lt;/p&gt;</evnet:previewtext><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Maurice-Franklin-reflects-on-the-2008-Space-Elevator-Conference/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489760/Trackback.aspx</trackback:ping><category>space elevator</category></item><item><title>Ted Semon reflects on the 2008 Space Elevator Conference</title><description>&lt;p&gt;Ted Semon, a retired software engineer, chronicles the efforts to develop a space elevator on the &lt;a href="http://www.spaceelevatorblog.com/"&gt;Space Elevator Blog&lt;/a&gt;, and volunteers for &lt;a href="http://www.spaceward.org/"&gt;The Spaceward Foundation&lt;/a&gt; which administers &lt;a href="http://www.spaceward.org/elevator2010"&gt;competitions&lt;/a&gt; to develop several of the core technologies that will be needed to build the elevator. &lt;/p&gt;
&lt;p&gt;Ted attended and spoke at the &lt;a href="http://www.spaceelevatorconference.org/"&gt;2008 Space Elevator Conference&lt;/a&gt; held at the Microsoft Conference Center in Redmond. In this interview he discusses the concept of the space elevator, and the status of current efforts to bring it to life. &lt;/p&gt;
&lt;p&gt;In a &lt;a href="http://perspectives.on10.net/blogs/jonudell/Maurice-Franklin-reflects-on-the-2008-Space-Elevator-Conference/"&gt;related interview&lt;/a&gt;, Maurice Franklin, the Microsoft employee who brought the conference to Redmond this year, reflects on the conference and on the goals and status of the project. &lt;/p&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;&lt;img alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/semon/semon.jpg" /&gt;
            &lt;div&gt;&lt;strong&gt;Ted Semon&lt;/strong&gt; &lt;/div&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: How did you become interested in the space elevator? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: I've always been a science fiction fan, and I read Arthur C. Clarke's &lt;a href="http://www.worldcat.org/oclc/4606759"&gt;The Fountains of Paradise&lt;/a&gt; many years ago. The idea of the space elevator seemed so obviously the right way to get up out of Earth's gravity well. &lt;/p&gt;
&lt;p&gt;When I retired from the software world a few years ago, I decided to learn what was happening with the concept. There were blogs and websites, but nothing coherent, so I decided to pull the information together myself on the &lt;a href="http://www.spaceelevatorblog.com/"&gt;space elevator blog&lt;/a&gt;. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: At that point, were we already into the modern era of the concept's development? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Yes, this was in 2006. I'd read the &lt;a href="http://www.worldcat.org/oclc/52067341"&gt;Brad Edwards book&lt;/a&gt;, but it was hard to find out what was currently going on, so I started the blog. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: The concept as described by Clarke is quite different from the modern one that's emerging, right? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: In some ways yes, in some ways no. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Can you spell out the differences? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: OK, there are several. He had located his port on the island of Sri Lanka. Current thinking is that it won't be a land port, it'll be an ocean-going port, so you can move the space elevator if you need to, and get it out of the way of satellites and other things in orbit. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: I gather that's a "when", not an "if". &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Exactly. There's stuff up there, it's going to intersect the elevator, you've got to deal with that. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And the object being moved, just to be clear, is a 100,000 kilometer strand of material. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Right. It's a carbon nanotube tether, or rope, or ribbon, whatever you want to call it. One end is anchored to an Earth port, something like an ocean-going oil platform, and the counterweight ends at 100,000 kilometers up. &lt;/p&gt;
&lt;p&gt;By moving the ocean-going platform you can induce a wave that travels up the ribbon. You know which objects in space to worry about, at least the big ones, because you track them. And you know what's going on with the ribbon because you have sensors embedded in it, and climbers going up and down that signal their location. So you should be able to always move the ribbon out of the way of a collision. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So one difference from Clarke's original vision is that the platform is mobile and sea-based. What are some other differences? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Cost. He had imagined the cost would be something like the Earth's combined gross national products for a year, or some enormous number like that. The number now that looks more realistic is on the order of 10 billion dollars. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And what accounts for that lower estimate? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: More knowledge now about how it's going to be built. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Maurice Franklin and I discussed this, and his take was that the Clarke scenario assumed a huge mass parked in geosynchronous orbit, and that mass would be very expensive to lift. That ties into another evolution of the concept: it's no longer anchored by a large mass at 22,000 miles, but extends far beyond that. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Right. Something like 100,000 kilometers. The counterweight in the Edwards plan is about 600 metric tons, quite a bit smaller than the Clarke scenario. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: The reason that's possible is? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Because it's farther out in orbit. Well, it's hard to say an object anchored to Earth is in orbit, but it's 100,000 kilometers out. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: At its endpoint. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Yes. &lt;/p&gt;
&lt;p&gt;Let's see. He had also talked about the material being a carbon or diamond monofilament. I guess that's similar to carbon nanotubes, so we should probably say he was right on that score. &lt;/p&gt;
&lt;p&gt;He hadn't talked about powering the climbers, though. His climbers used batteries. In the Edwards concept, the climbers are laser-powered. Lasers will be aimed at photovoltaic cells on the bottom of the climbers. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So the climber is the robot that's attached to the tether, and ascends and descends? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Right. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: I've got some sense of what a carbon nanotube is. A sheet of carbon atoms folded into a cylinder. But I'm not at all clear how that translates into a 100,000 kilometer cable. What's the architecture of that cable? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: It's composed of fibers. When you buy a 50-foot rope at a store, there's no fiber in there 50 feet long. They're all woven together, and that's what'll happen with carbon nanotubes too. &lt;/p&gt;
&lt;p&gt;Right now the longest ones I know of, and have actually seen, are 5, 10, 15 millimeters long. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: The individual fibers? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Right. So one challenge is to grow a longer fiber. MIT, for example, is working with a company called NanoComp. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: There's obviously a limit to how far you can go in that dimension, so then it's a question of how to compose a ribbon out of these strands, probably at several levels of hierarchy. Just like the way the Golden Gate Bridge cables are multistranded at several levels of hierarchy. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Yes. The textile mills are very good at this stuff. If you give them fibers, they will weave you cables. The issue is going to be giving them carbon nanotube fibers of sufficient length and strength. That's where the bottleneck is now. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: What kind of diameter of cable are we talking about? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: If you're looking at the Edwards scenario, it's going to be a ribbon that's roughly 20 inches wide. And it is a ribbon. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Why? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: When you're in space, you want as wide a surface as possible, so that a micrometeorite strike won't sever the ribbon; it'll only poke a hole in it. And if you have it woven correctly, the strain is taken up by nearby fibers. &lt;/p&gt;
&lt;p&gt;However, the ribbon can be problematic in the atmosphere because of wind effects. So it may be a cable in the atmosphere, widening out to a ribbon above. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: There needs to be a procedure for maintenance and repair. What's being discussed there? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Some people are talking about making the tether into a big loop that's constantly rotated down to Earth where you do the maintenance. Nice in theory, but you've doubled the length. And what do you do about having a cable in the atmosphere and a ribbon above? &lt;/p&gt;
&lt;p&gt;Another scenario is that the tether is made of segments. People worry that if the ribbon were cut, the two ends would fly apart. Not so. They'll sit there for some time, then gradually pull apart, but not like a snapping cable. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Not catastrophic? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: No, as long as you get to it in time. So you should be able to disconnect and reconnect segments. &lt;/p&gt;
&lt;p&gt;Another possibility: The climbers continuously reweave the ribbon as they go up and down. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: The carbon nanotube fiber and laser power beaming technologies seem to be two key ingredients in development. And those are what the &lt;a href="http://www.spaceward.org/games07.html"&gt;space elevator games&lt;/a&gt; test, right? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Yes. On the carbon nanotube front, there's a lot of work being done by industry and universities, and not only for the space elevator. Most people don't know or care about that, they just see a market for things much lighter and stronger than steel. &lt;/p&gt;
&lt;p&gt;And with carbon nanotubes being measured at 2, 4, 6, maybe even 8 gigapascals -- and these are big jumps over a few years ago -- there's a real sense that we're getting close to being able to make a ribbon strong enough to support an elevator. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: How strong is that? What are the forces acting on the ribbon? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: The original Edwards scenario called for 130 gigapascals. Since then there's been some rethinking. Some good aerospace engineers think it can be dropped to 60 or even 40 gigapascals. That doesn't mean 130 is outside the realm of possibility, but we'll get from 8 to 40 and 60 a lot sooner. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: But the existing results are for radically shorter lengths. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Yes, but you just need to make something long enough to be woven. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Ropes stretch, though, and we don't have any examples of 100,000-kilometer ropes or cables. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Well, the total amount of cable in the San Francisco Bay Bridge would exceed the length of the space elevator. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Really? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: It's different of course because it winds back and forth and around things, but there is some experience there. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: In terms of the laser power beaming, is this also a case where development is occurring for all sorts of other reasons? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Exactly. NASA sponsors the space elevator games, but they mainly care about very strong materials, i.e., carbon nanotube tethers, and they care about power beaming, because they see applications for these things. &lt;/p&gt;
&lt;p&gt;Boeing has just come out with a solid state laser in the 25 kilowatt range, and they say they can go to the 100 kilowatt range. If you can get 20 of those, that's enough to power your climbers. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: What are the current applications of those? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Beaming power to a moon buggy so it doesn't have to carry batteries. Airships that stay up for weeks at a time. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Are any of these concepts real yet? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: No, not yet, but the needs exist and they're trying to develop the technology to satisfy those needs. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: I gather one potential showstopper is the threat of natural or manmade attack. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Actually, the latter wasn't discussed much at the conference, mostly the former. Micrometeorites, space junk. &lt;/p&gt;
&lt;p&gt;That's being addressed in two ways. For small things, make the tethers wide enough, and engineer a replacement lifecycle. For large things, move the elevator out of the way. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: That computational grand challenge dovetails with Microsoft's strengths and interests, so that might be one interesting outgrowth of having had Microsoft sponsor and host the conference. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: That'd be great! &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Give us a sense of who was at the conference, what was discussed, and what emerged. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: There are two answers, I guess. First, there were a lot of old pros, people who've been working on this for years, and have come up with the initial concepts and solutions. &lt;/p&gt;
&lt;p&gt;Then there are some new people in the last year. Some were invited, some just showed up. &lt;/p&gt;
&lt;p&gt;There's an effort to make this into an international campaign. We've adopted the "four pillar" concept. It's something you need for any huge infrastructure project. The pillars are: technical capabilities, a business plan, a legal and insurance framework, and public support. &lt;/p&gt;
&lt;p&gt;That hadn't come together in the past, but this year we think we've gotten the enthusiasm, and especially the international support, to sustain that four-pillar approach going forward. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: We mentioned the Golden Gate Bridge. I recently learned that it wasn't a federal project, it was a municipal project. Likewise, the space elevator would perhaps ideally not be a big federal project. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: I don't think it has to be. I gave a talk this year on who I thought would build the first one. Ten billion dollars is a large sum, but not out of the reach of non-governmental entities. Money's an issue, but it's not going to be the showstopper. &lt;/p&gt;
&lt;p&gt;I do think you'll need a government involved for defense, and for insurability, because international treaties will have to be written, and I think a government will be able to do that more easily than a business consortium. &lt;/p&gt;
&lt;p&gt;I could see a group of US businesses getting together and saying to the US government, we'll take the financial and technical risk, in return please defend our elevator and help us deal with the insurability. If you do, we'll make you a deal: free launches, or discount launches. I'm sure there's a deal that can be made. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: In terms of why to do it, obviously everyone close to the concept takes it on faith that it's a good thing to do for all sorts of compelling reasons. To me, the solar power satellite concept may be the most compelling. Is that the application advocates tend to lead with? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Many do. I don't personally. I'm skeptical. We use so much energy, and to put enough stuff into space to create space-based solar power that would make a significant dent, well, the amount of material is huge. &lt;/p&gt;
&lt;p&gt;And we're not going to have a space elevator for 20 or 30 years. Meanwhile our problems will keep getting worse. We may have pilot projects, but nothing that'll power your refrigerator... &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: ...or have any significant effect on greenhouse gases. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Correct. Having said that, the concept is outstanding. And while I'm skeptical, I'm in the minority. Most advocates see it as a huge reason to build the space elevator. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: What are the other reasons that come up? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Look at what you get: Enormous capacity, low cost, safer launches, and low environmental impact. Any industry that needs those benefits will want the space elevator. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Such as? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Well, what's making money right now is communications satellites. That's a big and growing industry. It'll be much easier to build satellites that don't have to ride a rocket to orbit, and much cheaper to send them up. &lt;/p&gt;
&lt;p&gt;Another will be orbital tourism. Being able to go up 100 miles, spend the afternoon, and come down -- we think that'll be a big moneymaker. &lt;/p&gt;
&lt;p&gt;Then there are industries that don't exist today, except in labs, that need a space environment. To get them up today, you're talking about thousands of dollars a pound in rockets, and not a whole lot of pounds. With a space elevator it's hundreds of dollars a pound or less, and a lot of capacity. &lt;/p&gt;
&lt;p&gt;I think once it's there it'll make a ton of money for somebody, maybe lots of somebodies, because you want more than one space elevator. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Right. The first one is the bootstrap that gets you to others. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Yes, and once you've got two, now you're in business. One of your failsafe scenarios is that you can leverage one to fix the other. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So at the conference, the discussion was more about how to get it done than why to do it. What were the conclusions? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: That we're closer than ever. Arthur C. Clarke said that a space elevator would be built 50 years after people quit laughing about it. Well, people quit laughing some time ago. I think his prediction is a bit pessimistic. I think we're looking at 2020 to 2030 to actually be able to put one up. There's a general feeling that this is a real possibility, that it could happen in our lifetimes. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: If I were to attend the space elevator games, what would I see? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: You'll see a helicopter lifting a steel cable 1 kilometer up, and you'll see teams attaching climbers to the cable, and beaming power to photovoltaic cells on the climbers. We're hoping to have several competitors this year with a real shot at winning the prize. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And the prize is? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: There are two. If you can get up the kilometer cable at two meters per second, and you're the only one who does, there's a million dollar prize. If you can do it at five meters per second, and you're the only one who does, there's a two million dollar prize. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Really? So a million bucks for a laser-powered climber to go two meters per second, and nobody's claimed that yet? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: That's right. Last year it was 100 meters, and before that 50. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So you've raised the bar? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: And the prize money, yes. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And what's the other prize? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: For the strongest tether. Also a one million and two million dollar prize. You have to beat the house tether. Yours can be two grams, the house tether can be three, and it's made from commercially available material. So if you bring something new, like a carbon nanotube tether, and it can beat the house tether which is heavier, you can win the prize. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And the lengths are? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Two meters I think. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Oh, OK, so nothing like the kilometer climb. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: No, it's a two-meter loop. Yours and the house tether are placed onto a special machine designed for this event, it stresses them equally, whichever breaks first loses. &lt;/p&gt;
&lt;p&gt;Nobody's come close to winning that one yet. But last year we had our first carbon nanotube tether. MIT brought it, working with NanoComp. But it had been done so close to the competition that they weren't able to weave it into a loop. So they actually tied a knot. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: They tied a knot!? [laughs] &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Exactly. But this year they'll have more time to prepare, and we know of at least one other team bringing a carbon nanotube tether. &lt;/p&gt;
&lt;p&gt;So our ideal scenario for this year is that we have a carbon nanotube tether that blows away the house tether, and a 5-meter-per-second climber. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And it'll happen where? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Not definite yet, but we're hoping for Meteor Crater in Arizona. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: It's inspiring to think about this stuff! &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: It's very inspiring to be on the inside. I got involved just because I was interested, but now I'm a huge fan and I'll do everything I can to help make it happen. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: The project seems to be attracting a variety of folks, from all walks of life, who are showing up, and wanting to participate, and finding ways to participate. &lt;/p&gt;
&lt;p&gt;Maurice Franklin, for example, a Microsoft employee, has now made a real contribution by organizing this year's conference. But he also talks about some other folks who showed up, uninvited, with relevant engineering credentials, and made real contributions. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: That's right. People like Maurice will be the lifeblood of this project. And when you get involved, and start to see that this isn't some science fiction idea that's never going to happen... &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: ... and that there are serious people, with serious engineering credentials, working the problem in a pragmatic way. It might not happen, but it could. Thanks, Ted! &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TS&lt;/strong&gt;: Thank you, Jon. &lt;/p&gt;&lt;img src="http://channel9.msdn.com/489759/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Ted-Semon-reflects-on-the-2008-Space-Elevator-Conference/</comments><link>http://channel9.msdn.com/posts/JonUdell/Ted-Semon-reflects-on-the-2008-Space-Elevator-Conference/</link><pubDate>Fri, 08 Aug 2008 14:21:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/semon/semon.wma</guid><evnet:views>1010</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489759/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;p&gt;Ted Semon, a retired software engineer, chronicles the efforts to develop a space elevator on the &lt;a href="http://www.spaceelevatorblog.com/"&gt;Space Elevator Blog&lt;/a&gt;, and volunteers for &lt;a href="http://www.spaceward.org/"&gt;The Spaceward Foundation&lt;/a&gt; which administers &lt;a href="http://www.spaceward.org/elevator2010"&gt;competitions&lt;/a&gt; to develop several of the core technologies that will be needed to build the elevator. &lt;/p&gt;
&lt;p&gt;Ted attended and spoke at the &lt;a href="http://www.spaceelevatorconference.org/"&gt;2008 Space Elevator Conference&lt;/a&gt; held at the Microsoft Conference Center in Redmond. In this interview he discusses the concept of the space elevator, and the status of current efforts to bring it to life. &lt;/p&gt;</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/semon/semon.mp3" expression="full" duration="36" fileSize="17451840" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/semon/semon.wma" expression="full" duration="36" fileSize="17659877" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/semon/semon.wma" length="17659877" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Ted-Semon-reflects-on-the-2008-Space-Elevator-Conference/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489759/Trackback.aspx</trackback:ping><category>podcasts</category><category>space elevator</category></item><item><title>How Microsoft's External Research Division works with a new breed of e-scientists</title><description>&lt;p&gt;Tony Hey, VP for the External Research Division within Microsoft Research, leads the company's efforts to build external partnerships in key areas of scientific research, education, and computing. He's been a physicist, a computer scientist, and dean of engineering, and for five years ran the UK's e-Science program. These experiences have given him a broad view of the ways in which all the sciences are becoming both computational and data-intensive. Microsoft tools and services, he says, will support and sustain the new breed of scientists riding this new wave. &lt;/p&gt;
&lt;p&gt;
Audio: &lt;a href="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hey/hey.wma"&gt;WMA&lt;/a&gt;, &lt;a href="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hey/hey.mp3"&gt;MP3&lt;/a&gt;
&lt;/p&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;&lt;img alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hey/hey.jpg" /&gt;
            &lt;div&gt;&lt;b&gt;Tony Hey&lt;/b&gt; &lt;/div&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: For this series of interviews I've spoken to a number of Microsoft folks who are working with external academic partners on projects that fall under your purview. The list includes Pablo Fernicola's &lt;a href="http://perspectives.on10.net/blogs/jonudell/Word-for-scientific-publishing/"&gt;Word add-in for scientific publishing&lt;/a&gt;, Catharine van Ingen's collaboration with Dennis Baldocchi at Berkeley on the &lt;a href="http://perspectives.on10.net/blogs/jonudell/Making-sense-of-C02-data/"&gt;analysis of CO2 data&lt;/a&gt;, and Kyril Faenov's HPC++ project to bring &lt;a href="http://perspectives.on10.net/blogs/jonudell/Cluster-computing-for-the-classroom/"&gt;cluster computing to the classroom&lt;/a&gt;. These are all pieces of your puzzle, right?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Absolutely.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: By way of background, you've been a physicist, then a computer scientist, and then for a time led the UK's e-science program.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Which would be called cyberinfrastructure in the US, yes. I'm on the NSF's advisory committee for cyberinfrastructure; it's a very similar goal.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: And then you surprised a lot of people by joining Microsoft. Take us through your initial role leading the TCI [technical computing initiative] and on to your current expanded role leading MSR's external research efforts.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Right. So having been a physicist, and then a computer scientist working on parallel computing for years, and then chair of my computer science department and then dean of engineering, I think I understand the community we're trying to work with pretty well.&lt;/p&gt;
&lt;p&gt;Also, as you mentioned, I worked for 5 years running the UK e-science program. That was about huge amounts of distributed data, and collaborative multi-disciplinary research in a variety of fields. The environment, bioinformatics, almost every field of science now has some element of distributed and networked collaboration.&lt;/p&gt;
&lt;p&gt;The science agenda was for the tools and technologies to make that collaboration trivial, just as with Web 2.0 your grandmother can do a mashup.&lt;/p&gt;
&lt;p&gt;I don't think the UK e-science program achieved that, but I do believe that Microsoft can help make tools and technologies available that will help scientists and researchers do their work.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: In your parallel computing phase, you helped write the MPI [message passing interface] specification, correct?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Yes. I've been in this for 30 years, on and off. I have very good friends in the high-performance and parallel computing communities here in the US, and I was involved in European projects. There was a danger that the Europeans would go one way, and the US another, so it was time to see if we could get the community to put together a community standard. &lt;/p&gt;
&lt;p&gt;It isn't an ISO standard, there wasn't a big standards body, it was a group of experts who got together with the academics and with the industry players. Rather a small set, and we used to meet every 6 weeks in Dallas airport, so you really had to be dedicated to go there.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: [laughs]&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: But what came out of it was a standard which has stood the test of time. I co-authored and initiated the first draft. It's been much changed since then, and I don't take credit for the final thing, but I did try, with Jack Dongarra, to initiate the standards process, and I think I remember buying the beer at the first session.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: What's interesting to me is that despite that, you've been a vocal skeptic regarding raw grid capability. And you've been very careful to stress that in your view, the real challenges have to do with data -- the ability to combine large quantities of data from multiple sources, and enable people to make sense of it.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Yes. I used to work in high-end supercomputing and parallel computing, but what distinguishes this decade is that we'll collect more scientific data than we have collected in the whole of human history. Instead of struggling with the problem of too little data, scientists will be struggling with the problem of huge amounts that they can't process or analyze. And it may be stored in different places, on different continents, so how do you put it together? How do you federate?&lt;/p&gt;
&lt;p&gt;That's the real challenge. Very few people want to use petaflop computers. Most of the biologists, chemists, and engineers only need lesser capabilities that can be provided by just a simple cluster. And then you put the cluster where the data is, because that's what's difficult to move around. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Yes, Kyril Faenov made this same point in my interview with him. There are only a handful of intergalactic cloud infrastructures of the sort that a Google or Amazon or Microsoft can support; they're one-of-a-kind beasts, and you can't always bring your data to them. So he's interested in enabling organizations to stand up their own more modest clusters at the sites where the data lives. &lt;/p&gt;
&lt;p&gt;So, let's discuss the opportunity that you see. In another interview you said: &lt;/p&gt;
&lt;blockquote&gt;Rather than wasting the enthusiasm and talents of science graduate students by assigning them the task of building systems capable of handling, analyzing and mining literally petabytes of data, scientists should look to computer scientists and the IT companies to raise the level of abstraction and to provide them with the components of a reliable and functional cyberinfrastructure. &lt;/blockquote&gt;
&lt;p&gt;That's the most concise mission statement I've found for what you're doing.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Exactly right. Part of my reason for joining Microsoft was having had a great friendship, and many discussions and arguments, with Jim Gray, from 2001 onwards. &lt;/p&gt;
&lt;p&gt;We argued and disagreed on many things, but we also agreed on things, and what we agreed on in particular is that a different paradigm is emerging. So for example there's experimental physics, there's theoretical physics, and now the third paradigm, it's clear, is computational physics based on simulation. &lt;/p&gt;
&lt;p&gt;What we're looking at here is data-centric science, where you'll do collections-based research -- like you do in mashups, but now with scientific datasets. And increasingly, you'll use semantics to get from data to information to real knowledge. &lt;/p&gt;
&lt;p&gt;So I came to Microsoft partly because of Jim Gray, but partly because I think companies can help. I struggled mightily with just open source tools. I used to produce open source tools myself, as an academic. MPI has a wonderful open source implementation, and that was one of the key things that we did.&lt;/p&gt;
&lt;p&gt;But I also know that open source, particularly when produced by academics like myself, well, it works on my machine, but if you want it to work on your machine, that's your problem. &lt;/p&gt;
&lt;p&gt;So one of the things I set up in the UK was, in fact, a software engineering center called the &lt;a href="http://www.omii.ac.uk/"&gt;Open Middleware Infrastructure Institute&lt;/a&gt;, where I put a lot of money in to get these open source codes tested and documented and made more reliable and sharable.&lt;/p&gt;
&lt;p&gt;That's why I think that a judicious mix of open source with commercial -- it could be from IBM, from Oracle, from Microsoft -- is the way to provide a more reliable infrastructure.&lt;/p&gt;
&lt;p&gt;That's part of the motivation for the tools we're producing around the technologies that scientists use to do their publication, their data mining, and so on. I think Microsoft can really take a lead here, and that's why I joined.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Elsewhere you've said: &lt;/p&gt;
&lt;blockquote&gt;Essentially I match up Microsoft researchers with major scientific problems that computer science technology can help to solve. &lt;/blockquote&gt;
&lt;p&gt;What are those major problems?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: So, I came with a purely scientific mission with TCI. But now I've moved into Microsoft Research, and we have a bigger agenda. In terms of external research, we focus on four themes. &lt;/p&gt;
&lt;p&gt;One is health and wellness. That's bioinformatics, medical solutions, and so on. Really exciting, we've got some good projects in that area.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: I've talked to Kris Tolle and have done an &lt;a href="http://perspectives.on10.net/blogs/jonudell/Making-sense-of-electronic-health-records/"&gt;interview with George Hripcsak&lt;/a&gt; who's one of the recipients of funding in the &lt;a href="http://www.microsoft.com/presspass/press/2008/apr08/04-17GWASPR.mspx"&gt;genome-wide association studies program&lt;/a&gt;. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Kris is great, she and Simon Mercer are looking at the biomedical area, and they've got a wonderful set of projects ranging from high-tech stuff involving RNA and HIV/AIDS down to the last mile of preventative health care, looking at ways in Latin America to take a smartphone and connect it to a low-cost diagnostic tool, like a blood-pressure monitor, and therefore do health care in these remote places.&lt;/p&gt;
&lt;p&gt;The next major area is what we call E3 -- earth, energy, and the environment. That includes the astronomy work that Jim Gray started, which we now have followed up with the WorldWide Telescope, which is a wonderful tool.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: It's a brilliant thing. I've actually done two in-depth conversations about it for this series. One with &lt;a href="http://perspectives.on10.net/blogs/jonudell/The-story-of-the-WorldWide-Telescope/"&gt;Curtis Wong&lt;/a&gt;, and the other with &lt;a href="http://perspectives.on10.net/blogs/jonudell/How-the-WorldWide-Telescope-works/"&gt;Jonathan Fay&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: It does exactly the things we were talking about: it takes lots of distributed data sets, and allows you to search and visualize and do wonderful things. &lt;/p&gt;
&lt;p&gt;So that's one example of an E3 project. Catharine van Ingen's project is another, and there are others. There's a project called the &lt;a href="http://www.swiss-experiment.ch/index.php/Category:About"&gt;Swiss Experiment&lt;/a&gt; that's putting sensors all through the Swiss Alps to measure environmental changes.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Before we discuss the other two areas, let me just ask: What is a project? I gather sometimes Microsoft Research puts out an RFP, and somebody like George Hripcsak at Columbia is awarded money to pursue his research. In other cases, though, there isn't necessarily funding, it's more of a collaboration, as with Catharine van Ingen and Dennis Baldocchi.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Yes. In all cases, I want us to focus on genuine partnership with the academics. It has to be win/win on all sides. There are all sorts of ways. RFPs are one. Targeted funding, like we used to do in TCI, maybe sponsoring post-docs. But other things too, like delivering tools, data sets, services. &lt;/p&gt;
&lt;p&gt;What can we do for the computer science community? That's another of our themes.&lt;/p&gt;
&lt;p&gt;I used to teach in a computer science department, and I assure you my department was not atypical. We taught Linux, Apache, MySQL, PHP, Java, and they used a variety of scripting languages -- Perl, Python, and now Ruby on Rails. &lt;/p&gt;
&lt;p&gt;To teach computer science principles it's quite clear you don't necessarily need any Microsoft technology. So the question is, how do we engage with academics in the computer science disciplines?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: And what are your thoughts?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: We have an opportunity. We need to look at what services, what data, what resources we can give them, so we can partner in a way that they feel is beneficial, so they can do research in the way they want to, and we can find out what services they need, and how we can make our tools more valuable.&lt;/p&gt;
&lt;p&gt;Microsoft does now have the beginnings of some exciting service offerings. There's Live Mesh, and we have .NET online services coming along...I liked our internal name, CloudDB, better than SQL Server Data Services, SSDS, but...&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: ...that's how it always goes.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: That's the way of it, yes. So that's in beta at the moment, and I hope by the time of the PDC in October we'll have a lot more concrete things to show. What I need to do is see what we can offer the academic community in terms of resources. Can we help them to explore multi-core? Can we get them data sets at scale that we've anonymized, so they can do research they'd otherwise not be able to do?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: And &lt;a href="http://research.microsoft.com/research/sv/Dryad/"&gt;Dryad&lt;/a&gt;?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Yes. We now have within Microsoft Research some internal resources -- cores -- and I want to make some of that available externally, and put some services around it, such as Dryad or DryadLINQ.&lt;/p&gt;
&lt;p&gt;At the &lt;a href="http://research.microsoft.com/workshops/fs2008/"&gt;Faculty Summit&lt;/a&gt; I want to ask the community -- and after all, I came from that community -- how can we partner with you so that we can give you things that you value, and get your feedback?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: What is the Faculty Summit, who's been invited, and what do you aim to accomplish there?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: It's an annual event in the U.S., three or four hundred academics come, mainly computer scientists from the U.S. but there's a sprinkling from around the world -- India, China, Latin America. Really it's an opportunity for us to connect. &lt;/p&gt;
&lt;p&gt;I've talked about health and wellness, earth/energy/environment, and computer science. Another area of focus is education and scholarly communication. We'll be unveiling plugins for our tools that make them more useful for scientists to do what they want to do.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: The &lt;a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=09C55527-0759-4D6D-AE02-51E90131997E&amp;amp;displaylang=en"&gt;NLM add-in for Word&lt;/a&gt; is an obvious example. Are there others?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Yes, we'll announce a Creative Commons plug-in. Many people use Word, PowerPoint, and Excel, and are happy to share their documents. We'd like to give them a plug-in that will help them attach Creative Commons licenses to those documents.&lt;/p&gt;
&lt;p&gt;We'll also have a research repository. At the university, I was supposed to monitor the output of my faculty -- 200 academics and 500 post-docs and grad students. What we did was insist on keeping a digital copy of not only publications, but also presentations at conferences, research reports, videos, data...&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: ...especially data. That's a huge new area.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: It is in my view, yes. My undergraduates and engineering faculty never went into the library for traditional library purposes. They went there for a cup of coffee, a chat with their friends, a warm place to work, but not as a library.&lt;/p&gt;
&lt;p&gt;So what is the role of the library? My view is very much the MIT DSpace view that's been promoted. The role of a research library in a university is to be the guardian of the intellectual output of the university. And that needn't just be research, it can be teaching materials.&lt;/p&gt;
&lt;p&gt;So we've used SQL Server, and the Entity Framework -- a bit like the RDF model of Tim Berners-Lee and friends -- to capture some semantic knowledge. So it tells you this is a presentation, Tony Hey gave it, the local organizers were so and so, it was done on this date, and so on. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: There's also the general notion of wrapping services around raw data sets. I've &lt;a href="http://blog.jonudell.net/2007/07/06/a-conversation-with-timo-hannay-about-the-scientific-web/"&gt;talked with Timo Hannay&lt;/a&gt; at Nature about how often, nowadays, somebody winds up publishing a paper as a "fig leaf of analysis" to cover what's really the publication of some data set. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Timo and I absolutely agree on this. Research repositories which contain text and also data are going to be increasingly important.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Although you're not wild about the term "data services", it's actually useful. I was talking with Jonathan Fay about his discovery of all the astronomical data that's online. On the one hand, it was astonishing to find that it was available at all. On the other hand, in the grand tradition of academia, these were gzipped tarballs that you could only use if you had an extreme amount of specialized knowledge and capability.&lt;/p&gt;
&lt;p&gt;What you get, with WorldWide Telescope, is a service layer wrapped around all that raw data that makes it available to a vastly wider audience.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Absolutely. Same with Catharine van Ingen's project. This stuff was locked away in files, and nobody knew what was there. By making it available and exposing it in new ways...you're right, these data services are very important.&lt;/p&gt;
&lt;p&gt;And they're the basis of some of our other projects. So for example, Valerie Daggett at the University of Washington does protein folding, but she also does protein unfolding. She regards protein folding ab initio, right from the beginning with just the structure, as too difficult. So she takes the folded structure and unfolds it, and then looks at the possible foldings you can get. She calls this &lt;a href="http://peds.oxfordjournals.org/cgi/content/abstract/21/6/353"&gt;dynameomics&lt;/a&gt;. It involves storing detailed simulations, and we've made a database to help her do that. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: How would you characterize the nature of the collaboration between Microsoft Research and Valerie Daggett? &lt;/p&gt;
&lt;p&gt;So, for example, with Catharine van Ingen and Dennis Baldocchi, it was a really interesting mesh of interests and capabilities. Dennis is a climate scientist who's plugged into a worldwide network of sensors, but he's not an informatician, he's not someone with deep training in how to probe and reshape a body of data. But that's what Catharine brings to the table.&lt;/p&gt;
&lt;p&gt;So in this protein-folding collaboration, what's the partnership really about? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: It's on two levels. Valerie really is a computational scientist. She does these computationally-intensive calculations, and she uses national supercomputers. &lt;/p&gt;
&lt;p&gt;One of the things we've done is give them experimental Windows HPC clusters, so instead of doing it remotely they can actually get a lot of calculations done on local machines. &lt;/p&gt;
&lt;p&gt;The other part is that they don't have particular expertise in databases. So &lt;a href="http://research.microsoft.com/~stuarto/"&gt;Stuart Ozer&lt;/a&gt;, who used to be in Jim Gray's group and now is back with SQL Server, collaborated with them to set up a data cube.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: It seems like the transfer of database expertise is a common thread in a lot of these collaborations. Although many of these folks may be computationally-oriented scientists, and may know how to work with algorithms and with code, the data management is another kind of discipline, and not one that necessarily comes naturally.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: That's right. By the way, we're also active in computational education for scientists. When I did that in the 80s and 90s it was about algorithms and parallelism and things like that. But you're quite right, it's now, in addition to those things, about knowing how to deal with data. &lt;/p&gt;
&lt;p&gt;We have projects with two Nobel Prize winners, &lt;a href="http://www.mit.edu/~biology/facultyareas/facresearch/sharp.html"&gt;Phil Sharp&lt;/a&gt; at MIT, and &lt;a href="http://www.scientificblogging.com/cwieman"&gt;Carl Wieman&lt;/a&gt; at Vancouver, looking at what you teach biologists and physicists about new skills, in order to produce a new generation of computational scientists who understand the data as well as the computation.&lt;/p&gt;
&lt;p&gt;And I'd be remiss if I didn't mention &lt;a href="http://research.microsoft.com/aboutmsr/presskit/semmott/"&gt;Stephen Emmott&lt;/a&gt;. I emphasize the data, but he'd say that the complexity of the modeling that you have to do with this data is as important. And therefore, some of the abstractions from computer science can really help the modeling side of science.&lt;/p&gt;
&lt;p&gt;One of our engagements is a joint bioinformatics modeling institute in Trento, and that's an initiative of Stephen Emmott and his team. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: I guess that most people know Microsoft has a massive research arm, and there's been a lot said and written about internal technology transfer -- something gets invented in MSR, then it's thrown over the wall into a product group. People have heard that story, but this other story about external collaboration isn't so well known.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: That's true, though it does link to our research within MSR. In terms of the computer science and education communities, we have wonderful tools here that actually don't end up in products. One of the things I hope to do is make more of these available. &lt;/p&gt;
&lt;p&gt;We now, at Microsoft, have two OSI-approved open source licenses, &lt;a href="http://www.microsoft.com/resources/sharedsource/licensingbasics/publiclicense.mspx"&gt;Ms-PL&lt;/a&gt; and &lt;a href="http://www.microsoft.com/resources/sharedsource/licensingbasics/reciprocallicense.mspx"&gt;Ms-RL&lt;/a&gt;. I'd like to make some of our tools, which aren't going into products, available so that we can build communities and show what great tools there are. Tools that really do things the computer science community and science community want.&lt;/p&gt;
&lt;p&gt;So, I talked about our four themes -- health and wellness, earth/energy/environment, computer science, education and scholarly communication. In addition we have what we call ARTS: Advanced Research Tools and Services. There we're trying to develop tools and services that academics and computer scientists will find valuable.&lt;/p&gt;
&lt;p&gt;And there are many others. We just did a count, and in total, with RFPs and small projects and big projects, we had, over the whole of Microsoft Research, something like 400 projects with external partners in universities. &lt;/p&gt;
&lt;p&gt;My challenge is to focus that a bit more, and make sure we capture and build on the ones that are successful.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Well, very good, Tony. Thanks a lot!&lt;/p&gt;
&lt;p&gt;&lt;b&gt;TH&lt;/b&gt;: Thanks very much, Jon.&lt;/p&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/How-Microsofts-External-Research-Division-works-with-a-new-breed-of-e-scientists/</comments><link>http://channel9.msdn.com/posts/JonUdell/How-Microsofts-External-Research-Division-works-with-a-new-breed-of-e-scientists/</link><pubDate>Thu, 31 Jul 2008 16:49:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hey/hey.wma</guid><evnet:views>1353</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489758/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>Tony Hey, VP for the External Research Division within Microsoft Research, leads the company's efforts to build external partnerships in key areas of scientific research, education, and computing. He's been a physicist, a computer scientist, and dean of engineering, and for five years ran the UK's e-Science program. These experiences have given him a broad view of the ways in which all the sciences are becoming both computational and data-intensive. 
Microsoft tools and services, he says, will support and sustain the new breed of scientists riding this new wave.</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hey/hey.mp3" expression="full" duration="1800" fileSize="14223360" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hey/hey.wma" expression="full" duration="1800" fileSize="14389717" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hey/hey.wma" length="14389717" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/How-Microsofts-External-Research-Division-works-with-a-new-breed-of-e-scientists/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489758/Trackback.aspx</trackback:ping><category>e-science</category><category>Microsoft Research</category><category>podcasts</category><category>tony hey</category></item><item><title>How the WorldWide Telescope works</title><description>&lt;p&gt;Jonathan Fay is principal developer of the WorldWide Telescope. In this interview he explains how the project has yielded not only a breakthrough software product, but also a reference model for the acquisition, transformation, and visualization of astronomical data. You'll learn not only how the WorldWide Telescope works, but also why it exists: To fulfill the education mission discussed in a related &lt;a href="http://perspectives.on10.net/blogs/jonudell/The-story-of-the-WorldWide-Telescope/"&gt;interview with Curtis Wong and Roy Gould&lt;/a&gt;. &lt;/p&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;&lt;img alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fay/fay.jpg" /&gt;
            &lt;div&gt;&lt;b&gt;Jonathan Fay&lt;/b&gt; &lt;/div&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: As long as I've been doing computers, going back to the early 1980s on the TRS-80, graphics and the visualization of data, the earth, and space have been interests of mine. I'd gotten a department-store telescope one year for Christmas, and loved looking at stuff through the light-polluted LA skies.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So you were in the same boat as Curtis Wong?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: Yeah, you could really only see planets and the moon in any detail. But I was passionate about computers and astronomy. Every time computers got more powerful, I'd look into visualizing the Mandelbrot set and the stars as a litmus test.&lt;/p&gt;
&lt;p&gt;In 2001 I was development manager for HomeAdvisor, and we were assimilating a research project called TerraServer. Tom Barclay, a researcher who was working with Jim Gray, said, "Hey, USGS has this DEM -- digital elevation model -- data that they'd like me to load into TerraServer. I wonder if you have ideas about what we could do with it."&lt;/p&gt;
&lt;p&gt;I'd been very much into 3D visualization. I have this program called LightWave, which goes back a long time but is now used for things like Serenity and Battlestar Galactica, so I started taking TerraServer images and USGS data and creating hills with texture-mapped images.&lt;/p&gt;
&lt;p&gt;Then Tom Barclay told me how NASA was using satellite weather data, watching over many days, and getting rid of the clouds so you could see the surface of the earth. They called it the Blue Marble project. I found and downloaded that data, and also some global digital elevation data, and started creating a hierarchical 3D view of the earth so you could zoom in and browse. Then I worked to bring that into TerraServer, because we had resolution down to a couple of meters.&lt;/p&gt;
&lt;p&gt;But this was just a side project, and there wasn't interest in developing it, so I decided to look into visualizing other astronomy data.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: This was around the time in 2002 when Jim Gray and Alex Szalay published their paper entitled "The World-Wide Telescope"?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: Right. Jim talked about TerraServer "pointing up" as the next thing. He was already getting himself embedded with astronomers. I didn't see much of that. Tom was babysitting TerraServer while Jim went off into the astronomy end of things, and I was still doing geo, so we weren't collaborating. &lt;/p&gt;
&lt;p&gt;After having made some demos, a lot of people thought it was cool, but that was all. So I kept that on the back burner, and moved into some other groups. At the same time I was building my &lt;a href="http://www.bearcreekobservatory.com/"&gt;observatory&lt;/a&gt;. In Seattle, you take pictures when you can. If you can't push a button and have your observatory open up and take images when you get clear skies, by the time you set up you'll be clouded in. I wanted to automate the whole process, including image processing. That introduced me to the whole pipeline of data collection, processing, and subsequent research.&lt;/p&gt;
&lt;p&gt;Although I'm an amateur, I had to drill into the world of data and image processing that professional astronomers had to deal with. I was using the same resources.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: I'd like to hear more about that. A lot of us are aware that those data and image resources exist, but it's really unclear how to make use of them. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: You know, there is a lot available, but most amateur astronomers had no idea it existed, it was very hard to get to, and even the scientists had a hard time getting access to it. Essentially it was locked up in silos.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: If you know where to find the gzipped tarball, and then if you can unzip it and figure out how to use it, without any documentation about metadata and formats...&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: Right. So, I'd heard about this very large database of stellar objects, the US Naval Observatory's USNO-B catalog. It was 100 gigabytes. At that time, there were barely consumer hard drives that could hold that. Forget transferring it over the network; it's 120 CDs. The only way to transfer the data was to ship hard drives around the country.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Yeah, I remember Jim talking about doing that.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: I'm just an amateur, but I feel like I need the data, so I found out that this guy named Dave Monet, in Flagstaff, would let me ship him a hard drive and he'd put the data into a Linux-formatted partition and send it back.&lt;/p&gt;
&lt;p&gt;On the one hand, I was shocked to see how easy it was for me to get access to the same data that the professional astronomers were using. And by easy, I mean it was possible.&lt;/p&gt;
&lt;p&gt;But on the other hand, I realized you had to be really committed, and know exactly what you're doing.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Right. There were no services wrapped around the data to make it usable by anybody other than a 100% focused and dedicated researcher.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: As I started doing more with imaging, I had the concept that I should flip my earth inside out and render the sky. One of my friends, Doug George, created a full-sky survey, in gorgeous color, but the software that went around with it would take 10 or 15 seconds every time you moved your view. Nothing resembling interactive or realtime. &lt;/p&gt;
&lt;p&gt;And here I had this application that dealt with the same quality and quantity of data instantaneously. So I say hey, I can build an engine to go with your data. &lt;/p&gt;
&lt;p&gt;And I told him about a company, called Starry Night Pro, that was using some 3D effects but not actual image data from the sky. He wound up licensing his data to them, but the result they got was closed and self-contained.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: What kind of imagery was in it?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: What we'd now consider a low-to-medium resolution full-sky survey of the northern and southern hemispheres. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: When you say low-to-medium resolution, what could you see if you zoomed in on a galaxy?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: If you zoom into M51 in WorldWide Telescope, using the Hubble imagery, it'll be about 4000 pixels tall. And in their survey, it's about 4 pixels tall. You can barely make out that it's a spiral galaxy. &lt;/p&gt;
&lt;p&gt;We have the entire sky at one arc-second per pixel, and for objects like M51, thousands of pixels tall. And of course every time you go twice the resolution, it's four times the data.&lt;/p&gt;
&lt;p&gt;They wanted to fit everything on a CD-ROM. For us, we're talking about terabytes, it's not something you distribute. I thought you should install a small application, and the data comes over the network.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: And that's how WorldWide Telescope does it?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: Right. Everything except the thumbnails comes over the Net. We use the thumbnails to get the wordwheel functionality with search.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: The data file's about 3 megabytes?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: There's about 12 megabytes of thumbnails, but yes, the catalog is about 3 megabytes.&lt;/p&gt;
&lt;p&gt;So, I had this vision for a product, but the economics were wrong to do it as commercial software in the astronomy market. Plus, they'd want to do something aimed entirely at high-end amateurs, not at professional astronomers, or at the general public who are the outreach targets for professional astronomers.&lt;/p&gt;
&lt;p&gt;And then Curtis and I got together. I envied his position in research, being able to explore new things that hadn't been done before.&lt;/p&gt;
&lt;p&gt;It turned out that Curtis had been exploring how to create an educational environment with rich tools for exploring space, and he'd been collaborating with Jim Gray on TerraServer, and now he was looking for the technology to make it possible. &lt;/p&gt;
&lt;p&gt;Here I had this technology, and was looking for somebody who was enthusiastic about having a purpose for it. So it was the peanut butter and chocolate moment. Curtis passionate from the education side, me from the technology side, happening to be in the right company at the right time.&lt;/p&gt;
&lt;p&gt;So I made a demo using the Sloan Digital Sky Survey data, and Jim went crazy over it. This was the visualization aspect he'd been looking for: the front end that makes the data consumable.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Tell us about the WWT's back end, and how it relates to what Jim's team built.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: To get the data out of the silos, Jim was involved in the National Virtual Observatory and the International Virtual Observatory Alliance. If you know how to talk these VO standards, you can exchange data, and you can do queries against other people's data.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So on the one hand, these standards enable you to combine data sets that you fully assimilate. But on the other hand, they enable federated query.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: Right. A lot of the astronomers were dealing with data extracted from catalogs. You took image data, and then you got the numerical analysis out of it, and stuck that in the database. The transfer of images wasn't really their domain for this round, they wanted to do the stuff you could put into SQL Server. &lt;/p&gt;
&lt;p&gt;So while TerraServer put earth image data into SQL Server, the sky image data was lagging behind. But you could query a source on the Internet and then join it to data coming from another source. Sometimes the data had to be marshaled from one machine to another for efficiency, but essentially it meant you didn't have to translate everything into your database.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: But I assume that federated query isn't happening in WorldWide Telescope. We're not waiting for requests to go across the network, you've combined the datasets for your purposes.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: There are common sets of data that you'll need all the time. It's a relatively small amount, and we download that to your client. The thumbnails, the catalog.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: And what's in the catalog?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: The Messier objects, the NGC objects, the list of solar system objects...&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: And coordinates for them...&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: Yes, and magnitudes, and classifications. For the 10,000 brightest stars. Probably 30,000 objects in all. We'll make that live on your machine so you can zip around in the sky, look at stuff, and say, hey, what's that?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Which is what every planetarium program does, right?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: Yes, but that's generally where they stop. They go a bit beyond, by having a bigger download. We do it in 20 megabytes, they may have 250, or a gigabyte, but that's all you'll ever get.&lt;/p&gt;
&lt;p&gt;In our case, when you start up and your client contacts the WorldWide Telescope, we give you metadata saying what sources are available: the Hubble collection, the Spitzer collection. The metadata tells you where to go get the imagery. Some of it we'll host in Microsoft's data center, for scale reasons, and to ensure that it's available. But this data can be anywhere: Space Telescope, JPL...&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So I'm looking at the list. Which of these many sources are you hosting?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: We're hosting a lot of the data we launched with, partly because we don't yet have a &lt;a href="http://technology.arc.nasa.gov/partnering/spaceact.cfm"&gt;Space Act Agreement&lt;/a&gt; with NASA. Even though we've collaborated with a lot of people who are NASA-funded, they're not allowed to acknowledge that collaboration or put anything into a legal document until we have that agreement done. There are some people we could simply have pointed to as data sources, but that would violate internal NASA policies. So we're hosting more than was strictly necessary for the initial release.&lt;/p&gt;
&lt;p&gt;But the concept is that you can plug in other sources that we're not even aware of. You just load metadata references into your client, by going to a website for that community or organization, and then you have access to terabytes of their data.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: The standards talk about how to represent objects and their metadata. Do they also talk about how you query a source, since they're all going to be huge? What's the query protocol?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: At WorldWide Telescope we understand what's called &lt;a href="http://www.ivoa.net/Documents/latest/VOT.html"&gt;VOTables&lt;/a&gt;. There are standard ways to create queries, and standard ways to get results. &lt;/p&gt;
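&lt;p&gt;One of the IVOA's standard query forms is Simple Image Access (SIA), which Fay mentions next. In SIA version 1.0 a client asks for images covering a region with an HTTP GET carrying a POS parameter (right ascension and declination in decimal degrees) and a SIZE parameter (the angular size of the region in degrees), and the service replies with a VOTable listing matching images. The minimal sketch below only builds such a request URL; the service address is a hypothetical placeholder, the helper name is illustrative, and the M51 coordinates are approximate.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from urllib.parse import urlencode


def sia_query_url(service_url: str, ra_deg: float, dec_deg: float, size_deg: float) -> str:
    """Build an SIA 1.0-style image query URL (hypothetical helper).

    service_url is assumed to have no existing query string.
    POS is "ra,dec" in decimal degrees; SIZE is the region's angular size in degrees.
    """
    params = {"POS": f"{ra_deg},{dec_deg}", "SIZE": f"{size_deg}"}
    # Keep the comma in POS readable instead of percent-encoding it.
    return service_url + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    # Hypothetical endpoint; M51 is at roughly RA 202.47, Dec +47.20.
    print(sia_query_url("https://example.org/sia", 202.4696, 47.1952, 0.2))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The reply would be a VOTable, an XML document that the client parses for the URLs and metadata of the matching images.&lt;/p&gt;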
&lt;p&gt;There are two ways that can happen. One is that our servers do the queries, consolidate and cache the results, and regurgitate the data to our clients as needed. So we occasionally do a VO SIA (Simple Image Access) query to Hubble. When they have new images, we download these 500-megabyte or gigabyte images, which would be a very big download for a client, and we chop them up into a tiled multi-resolution pyramid that we store on our server. An ordinary consumer wouldn't have been able to use that raw data. But we put our value-add into the pipeline: Hubble took the image, Space Telescope processed it and put it up on a web service, and we do another step of processing to make it visualization-friendly. Now lots of people can see a thumbnail, click on it, and zoom in, and the instant they click and zoom they're already seeing the image. As they zoom in further, they see all the gorgeous detail, but they don't have to download all the data.&lt;/p&gt;
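&lt;p&gt;The size of such a tiled multi-resolution pyramid follows from simple arithmetic: keep halving the image until it fits in one tile, and each level up the pyramid needs roughly a quarter as many tiles as the level below it. The sketch below assumes 256-pixel square tiles, a common choice in tile viewers but not a figure given in the interview, and uses a hypothetical 20,000 by 15,000 pixel image.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math


def pyramid_levels(width: int, height: int, tile: int = 256) -> list[tuple[int, int, int, int]]:
    """Return (level, level_width, level_height, tile_count), coarsest first.

    Level 0 fits in a single tile; each finer level doubles the resolution,
    so it needs roughly four times as many tiles as the level above it.
    """
    longest = max(width, height)
    top = max(0, math.ceil(math.log2(longest / tile)))  # index of the finest level
    levels = []
    for level in range(top + 1):
        scale = 2 ** (top - level)  # downsampling factor at this level
        w = math.ceil(width / scale)
        h = math.ceil(height / scale)
        tiles = math.ceil(w / tile) * math.ceil(h / tile)
        levels.append((level, w, h, tiles))
    return levels


if __name__ == "__main__":
    rows = pyramid_levels(20_000, 15_000)
    for level, w, h, tiles in rows:
        print(f"level {level}: {w} x {h} px, {tiles} tiles")
    print("total tiles:", sum(r[3] for r in rows))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The finest level holds most of the tiles, which is why a viewer that fetches only the tiles on screen, at the level matching the current zoom, can show a gigabyte-scale image without downloading all of it.&lt;/p&gt;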
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Is this engine related to the Deep Zoom technology?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: We predate Deep Zoom. It has some similarities, but the difference is that Deep Zoom and Seadragon are 2D technologies that use the graphics engine for doing tiled multi-resolution images. We actually have to align all our images in 3D space because from the earth, space looks like a big sphere at almost infinite distance, but there is a curvature to it.&lt;/p&gt;
&lt;p&gt;Imagine taking a round room, and trying to put a bunch of bathroom tiles on it, and grout it. The tiles seem to come together and have parallel lines for a while, but eventually it stops working well. Maybe you can take one line around the equator, but as you go up you have fewer tiles, and weird-shaped tiles, and nothing lines up.&lt;/p&gt;
&lt;p&gt;That's the problem we have. We're looking at spherical data, so we had to come up with a new spherical transform that preserves the poles. In previous projects, like Virtual Earth or TerraServer or Google Earth, the poles weren't important, because nobody lives there and nobody needs map directions for driving around there.&lt;/p&gt;
&lt;p&gt;As far as the earth is concerned, you can cut off everything above and below a certain latitude and nobody would care. But you can't treat the sky like that. And you can't treat the moon or other planets that way either.&lt;/p&gt;
&lt;p&gt;So we had to come up with something called TOAST: tessellated octahedral adaptive subdivision transform. It creates a 360-degree wraparound view, either of a planet surface or of the infinite sphere of the sky, and lets you render it on a 3D graphics accelerator very rapidly and efficiently. So we can have an image pyramid the way Deep Zoom does, and TerraServer before it, but we don't have to give up the poles.&lt;/p&gt;
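&lt;p&gt;The interview does not spell out TOAST's tile layout, but the "octahedral subdivision" part of the name can be illustrated generically: start from the eight triangular faces of an octahedron inscribed in the unit sphere, then repeatedly split each triangle into four using its edge midpoints, pushed out onto the sphere. Unlike a latitude/longitude grid, nothing degenerates at the poles, which are simply two of the octahedron's vertices. This is a hedged sketch of the general idea only, with illustrative function names; the actual TOAST scheme also defines how the faces map onto square image tiles, which is not shown here.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math

Vec = tuple[float, float, float]
Tri = tuple[Vec, Vec, Vec]


def normalize(v: Vec) -> Vec:
    """Project a point onto the unit sphere."""
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / n, v[1] / n, v[2] / n)


def midpoint_on_sphere(a: Vec, b: Vec) -> Vec:
    """Midpoint of an edge, pushed back out to the sphere's surface."""
    return normalize(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, (a[2] + b[2]) / 2))


def octahedron() -> list[Tri]:
    """Eight faces; the poles are the vertices (0, 0, 1) and (0, 0, -1)."""
    north, south = (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)
    ring = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
    faces = []
    for i in range(4):
        a, b = ring[i], ring[(i + 1) % 4]
        faces.append((north, a, b))
        faces.append((south, b, a))
    return faces


def subdivide(tris: list[Tri]) -> list[Tri]:
    """Split every triangle into four, keeping all vertices on the sphere."""
    out = []
    for a, b, c in tris:
        ab, bc, ca = midpoint_on_sphere(a, b), midpoint_on_sphere(b, c), midpoint_on_sphere(c, a)
        out += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return out


def tessellate(levels: int) -> list[Tri]:
    tris = octahedron()
    for _ in range(levels):
        tris = subdivide(tris)
    return tris


if __name__ == "__main__":
    for level in range(4):
        print(level, len(tessellate(level)))  # 8, 32, 128, 512 triangles
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each level multiplies the triangle count by four (8, 32, 128, 512, ...), mirroring how each level of an image pyramid quadruples the number of tiles.&lt;/p&gt;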
&lt;p&gt;That was something that didn't exist. There was Mercator projection, which is how you're used to seeing the earth mapped onto a flat piece of paper. It's hard, you have to do weird math to make it work at all. Then there's equirectangular projection. But there was nothing that could deal with storing an image in a spherical projection. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So there are multiple full-sky surveys that you can switch between. So for example you can be looking at the Milky Way in the standard view, then switch over to infrared view and see it as an incandescent band.&lt;/p&gt;
&lt;p&gt;Is it the VO standards that enable you to weave those views together in a coherent way?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: No, that's where TOAST comes in. What astronomers did before is that, because there was no way to visualize the full sky data, they would store all their images as a bunch of individual...&lt;/p&gt;
&lt;p&gt;...OK, you have a sphere in the sky. You put a camera on it and take a picture. What shows up on the film is what's called a tangential projection. &lt;/p&gt;
&lt;p&gt;Imagine taking a beach ball with all the stars plotted on it, and putting a light in the middle, and putting the beach ball up against a wall touching at one point. The stars will shine out and hit that wall. All of these beams are projecting from the middle, to where they lie on the sphere's surface, to where they hit on the wall. It's a way of taking something round and making it flat.&lt;/p&gt;
&lt;p&gt;As long as you're looking at a very small part of the sky, there isn't very much distortion. But when you start looking at a large part of the sky the distortion becomes huge. &lt;/p&gt;
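&lt;p&gt;The beach-ball picture is the gnomonic, or tangent-plane, projection: each star is projected from the center of the sphere onto a plane touching the sphere at the image center. The sketch below uses the standard gnomonic formulas with right ascension and declination in degrees; the function name and test points are illustrative. It also shows the distortion Fay describes: a point at angular distance theta from the center lands tan(theta) from it on the plane, which is close to theta for small fields but grows without bound toward 90 degrees.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math


def gnomonic(ra_deg: float, dec_deg: float, ra0_deg: float, dec0_deg: float) -> tuple[float, float]:
    """Project (ra, dec) onto the plane tangent to the sphere at (ra0, dec0).

    Returns standard coordinates (xi, eta) in units of the sphere's radius.
    Only points less than 90 degrees from the tangent point can be projected.
    """
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    ra0, dec0 = math.radians(ra0_deg), math.radians(dec0_deg)
    dra = ra - ra0
    # Cosine of the angular distance between the point and the tangent point.
    cos_c = math.sin(dec0) * math.sin(dec) + math.cos(dec0) * math.cos(dec) * math.cos(dra)
    if not cos_c > 0:
        raise ValueError("point is 90 degrees or more from the tangent point")
    xi = math.cos(dec) * math.sin(dra) / cos_c
    eta = (math.cos(dec0) * math.sin(dec) - math.sin(dec0) * math.cos(dec) * math.cos(dra)) / cos_c
    return xi, eta


if __name__ == "__main__":
    # Tangent point at RA 0, Dec 0; compare plane distance with true arc length.
    for theta in (1, 10, 30, 60, 80):
        xi, _ = gnomonic(theta, 0.0, 0.0, 0.0)
        print(f"{theta:>2} deg from center: plane distance {xi:.4f}, arc length {math.radians(theta):.4f}")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At 1 degree the plane distance and the arc length agree to four digits, but at 60 degrees the plane distance exceeds the arc length by well over half, which is why tangent-plane images work for small fields and break down for large mosaics.&lt;/p&gt;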
&lt;p&gt;What astronomers did was put these tangential projections into databases, and they even knew how to mosaic them to make bigger chunks. But when it came to anything larger, it broke down. If they made really big mosaics, they had to use projections that couldn't represent the poles, and everything would get more distorted the farther it got from the equator.&lt;/p&gt;
&lt;p&gt;So now we have services like NASA SkyView. NASA has over 50 full-sky surveys sitting on servers, and while they participate in the Virtual Observatory, the images themselves use a private, well-documented standard. So we gave them code for TOAST.&lt;/p&gt;
&lt;p&gt;It used to be that when people made a request for a wide area of the sky, they would return multiple images joined into a mosaic. But now we said, we could ask for just a single tile, at a given level of resolution -- one tile that was the whole sky, or one tile that was a tiny piece of the sky -- but everything was laid out in a very specific grid for our projection.&lt;/p&gt;
&lt;p&gt;While their software couldn't do it very quickly, it allowed us to go through and get all the tiles from their servers, for all these different studies, and put them up on our high-capacity servers.&lt;/p&gt;
&lt;p&gt;So there's an automated path to get from a bunch of individual pictures of the sky to this full-sky mosaic that can be seen seamlessly. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So where's the TOAST transform being applied?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: Right now it's being applied, for that data, on their servers.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So you gave them the algorithm, and they're running it for you?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: That's correct. And eventually they'll be able to host the data when they have the capacity, so you could point a WorldWide Telescope client there. And even today theoretically you could do it.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: They keep the sources as they acquired them, but make the output of this transform available to queries?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: They generate the transform on the fly for each query. If they added a cache and then kept it warm, it would be acceptable for interactive use.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: When you look at the source list in WorldWide Telescope, those are the surveys you're talking about?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: Yeah, ROSAT and WMAP and things like that. Those are the full-sky surveys. So for the first time ever, we've assembled a view of the sky where you can look at everything from radio waves all the way to gamma rays: from the longest-wavelength, lowest-energy part of the electromagnetic spectrum to super-high-energy photons.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: It's completely amazing, and it's wild to be able to cross-fade between them and compare the differences.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: We put together a standard for how you can visualize a spherical data set, we've given people the ability to create this data, and we've provided a client that knows how to accept this astronomy data -- both the spherical data and the original tangential images.&lt;/p&gt;
&lt;p&gt;So when you have a study from Hubble, they can use the original tangential images the way they came off the camera, and in WorldWide Telescope we figure out the math and do the 3D transforms so that when we align that to the TOAST background from another full-sky survey, all the stars are exactly where they should be and everything lines up.&lt;/p&gt;
&lt;p&gt;And because we have the universal coordinate system -- right ascension and declination -- we can put things in the right place in the sky. When you cross-fade you may be looking at apples and oranges, but you're looking at them on the same tree.&lt;/p&gt;
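&lt;p&gt;Because every image is tied to right ascension and declination, positions from different surveys can be compared directly. The sketch below converts catalog-style sexagesimal coordinates to decimal degrees and computes the angular separation between two positions with the haversine formula, which stays accurate for small angles. The function names are illustrative and the M51 coordinates are approximate.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math


def ra_hms_to_deg(h: float, m: float, s: float) -> float:
    """Right ascension in hours, minutes, seconds to degrees (1 hour = 15 degrees)."""
    return 15.0 * (h + m / 60 + s / 3600)


def dec_dms_to_deg(sign: int, d: float, m: float, s: float) -> float:
    """Declination to degrees; sign is +1 or -1 so that -0 deg 30' is representable."""
    return sign * (d + m / 60 + s / 3600)


def separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle distance between two (ra, dec) positions, all in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    hav = (math.sin((dec2 - dec1) / 2) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(hav))))


if __name__ == "__main__":
    m51_ra = ra_hms_to_deg(13, 29, 52.7)      # about 202.4696 deg
    m51_dec = dec_dms_to_deg(+1, 47, 11, 43)  # about 47.1953 deg
    print(round(m51_ra, 4), round(m51_dec, 4))  # 202.4696 47.1953
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Two survey images whose pixels map to the same right ascension and declination have zero separation, whatever wavelength they were taken in, which is what lets a cross-fade line up.&lt;/p&gt;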
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Is this going to be a public standard? Can other clients use your services, or other services that support it?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: We've offered the algorithms and code to other organizations, like JPL, and we've even told Google that if they're interested in reworking their all-sky surveys to work with this format, we'd help. But they've got such critical mass around their current projections that they don't think they can take that on anytime soon.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: There's been some pushback, as you know, about WorldWide Telescope being a Windows-only product. But the project is much broader.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: Yes. And part of it is that all the data we support in WorldWide Telescope, and the WTML language we use...&lt;/p&gt;
&lt;p&gt;...when people ask me how WorldWide Telescope differs from an astronomy program like Starry Night, I say that it's like a browser, like Internet Explorer or Safari or Firefox, but it's a browser of data in formats that are astronomy-friendly, like VOTables and WTML. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Now WTML isn't the XML syntax you see when you save a tour and look into the file?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: Right. That, we're not even documenting. That's the tour XML format. But if you look in your user folder, or add objects to your collections and look in your documents folder, you'll see WTML there. It describes objects, hierarchies, network links, images.&lt;/p&gt;
&lt;p&gt;A tour in WTML is metadata that says, this is the tour, what categories it's related to, what objects it visits. &lt;/p&gt;
&lt;p&gt;We can also have things that say, there's an article in Sky and Telescope about M51, and it has that object's location in the sky. When you join the Sky and Telescope community in WWT, and you're browsing around and you find M51, you can look down in the context search and see the article, and open it up. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: That'll depend on which communities I belong to?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: Yes. We always show you the WorldWide Telescope stuff. Then when you log in we show you the union of that and stuff for the community you're currently looking at. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: OK, very helpful. Now let's go back to your discussion of projection, and see how it relates to my experience last night. I found the Milky Way, and I wanted to pan west, but it seemed like things wanted to spin around.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: There are two ways to look at the sky. The first is the full spherical view, as if the earth didn't exist. You're earth-centered, but the horizon isn't blocking your view. North is up, south is down, and unless you specifically spin your view, north will always stay north as you move.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: That's the view without the horizon.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: Right. With the horizon, the zenith always stays looking up, and as you move around, if you're looking at the zenith, it will always stay at the top. It can never go below the midpoint of the screen. &lt;/p&gt;
&lt;p&gt;On a space station where there is no up or down, you'd think you could design anything and people could just float around in 3D space, there'd be no preferred direction. But the reality is that humans get extremely confused. Your brain has a natural desire to have an up and down and left and right, and when you invert those, you don't process things.&lt;/p&gt;
&lt;p&gt;So if you were in the View From Here mode, the zenith always stays up. If you're in the other mode, looking at full universe, and you went to the north pole and tried to move beyond, you'd only be able to spin. You would not be able to pull the north pole beyond the middle of your screen, because that's your viewpoint. So then south would start becoming up, and left would be right, and you'd be spinning in the hamster ball.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So if I want to look at the Milky Way, and then swing left to locate the Pleiades...&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: To simulate looking at the sky, go to View and select the location where you are, and say View From Here. Then it will show you a horizon, and north/south/east/west, and north is straight up. Then it will simulate your eyes. If you're standing up and you look at the horizon, then you look up and up, what happens? When you're looking up, your head is tilted all the way back, touching your back, and you can't tilt any more. To see any further back you'd have to fall over.&lt;/p&gt;
&lt;p&gt;So then what do you do? You rotate yourself and look south. That's how your head works, and that's how a telescope with an alt-az [altitude/azimuth] mount works. &lt;/p&gt;
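&lt;p&gt;View From Here works in horizon coordinates, altitude above the horizon and azimuth around it, the same two axes an alt-az mount turns on. The sketch below converts right ascension and declination to altitude and azimuth with the standard spherical-astronomy formulas, given the observer's latitude and the local sidereal time; computing sidereal time from a date, time, and longitude is left out, and the function name is illustrative. Azimuth is measured from north through east.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math


def radec_to_altaz(ra_deg: float, dec_deg: float, lat_deg: float, lst_deg: float) -> tuple[float, float]:
    """Return (altitude, azimuth) in degrees; azimuth runs from north (0) through east (90).

    lst_deg is the local sidereal time expressed in degrees (1 hour = 15 degrees).
    """
    ha = math.radians(lst_deg - ra_deg)  # hour angle: negative east of the meridian
    dec, lat = math.radians(dec_deg), math.radians(lat_deg)
    sin_alt = math.sin(dec) * math.sin(lat) + math.cos(dec) * math.cos(lat) * math.cos(ha)
    alt = math.asin(max(-1.0, min(1.0, sin_alt)))  # clamp against rounding error
    az = math.atan2(-math.cos(dec) * math.sin(ha),
                    math.sin(dec) * math.cos(lat) - math.cos(dec) * math.sin(lat) * math.cos(ha))
    return math.degrees(alt), math.degrees(az) % 360.0


if __name__ == "__main__":
    # An object on the celestial equator, seen from the equator, six hours before transit:
    # it sits on the horizon due east.
    print(radec_to_altaz(90.0, 0.0, 0.0, 0.0))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Altitude tops out at 90 degrees at the zenith, which is the "head tilted all the way back" limit; to keep following a target past it, the azimuth has to swing around, just as Fay describes.&lt;/p&gt;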
&lt;p&gt;We're trying to put on constraints so people don't get lost and upside down and backwards. But unfortunately it's hard to explain what happens when you get to the poles. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Do you provide an unconstrained view?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: We do not. We cannot simulate an unconstrained view. The only thing we do allow is that, once you're viewing something, you can rotate the camera's view by hitting Control and then dragging left and right.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Ah. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: It's possible that's what was happening to you. We have a Reset Camera if you want to go back to neutral.&lt;/p&gt;
&lt;p&gt;The reason for this feature is that when you're making a tour, you might need to orient your view. M51 goes up and down but your screen goes left and right. If you want to zoom in and frame it, you need to rotate your camera like you would a real camera. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: OK, that may have been the confusion.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: When you get in that mode, we try to make north-south-east-west make sense based on that, but it will do strange things at the poles. We still try to keep north, minus your rotation, up. But that mode is a little strange. We give that feature so people making tours can frame things better, but it's not something we try to document or recommend that people use for normal browsing.&lt;/p&gt;
&lt;p&gt;So, if you care about your position on earth, use View From Here. If you want to ignore your position on earth, use the default mode. Then we don't care where you are, we're going to show you the whole sky, and date and location are ignored, it's just the sky, immutable and unmoving. Well, the planets move around on it.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: We'll never get to the bottom of all this, but I think you've given us a good sense of what I was really looking for, which was: What's actually been accomplished here? In terms of taking this raw astronomy data and correlating it in a way that's not just consumable in terms of quantities of data transmitted over the network, but in terms of making sense of objects and relationships.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: The vision of getting everybody access to all this astronomy data required systematic changes at every single level. We built on some things that Jim pioneered with NVO, and worked from there, but it was very systematic. How people process the data. The client to access the data. The protocols over the wire. Educating people, providing the context for it. &lt;/p&gt;
&lt;p&gt;We put a lot of things together, but we also created a systematic model for how to do everything end to end, top to bottom, left to right. Now there may be other people who use the pieces that we've created, and then change them to use different data sources, different visualizations. Say someone creates a Mac client, or an iPhone client, that's possible. Or a mobile phone version of it, or a web-based version. Over time we or others can replace various components, but as a reference model for solving all the problems in order to get the data into people's homes and into their eyeballs -- you had to solve for all of those problems, otherwise people are still blocked from being able to really explore.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Will this end-to-end pipeline be documented?&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: Things like TOAST, and WTML, and our communities interface will be documented. There will be documentation, tools, and code coming out over the summer to help people understand more. As for some of the protocols, we'll need to do some work to make sure they're ready for us to recommend as standards.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Excellent. Well, thanks Jonathan!&lt;/p&gt;
&lt;p&gt;&lt;b&gt;JF&lt;/b&gt;: OK, thank you.&lt;/p&gt;
</description><comments>http://channel9.msdn.com/posts/JonUdell/How-the-WorldWide-Telescope-works/</comments><link>http://channel9.msdn.com/posts/JonUdell/How-the-WorldWide-Telescope-works/</link><pubDate>Mon, 14 Jul 2008 09:25:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fay/fay.wma</guid><evnet:views>968</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489757/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>Jonathan Fay is principal developer of the WorldWide Telescope. In this interview he explains how the project has yielded not only a breakthrough software product, but also a reference model for the acquisition, transformation, and visualization of astronomical data. You'll learn not only how the WorldWide Telescope works, but also why it exists: To fulfill the education mission discussed in a related &lt;a href="http://perspectives.on10.net/blogs/jonudell/The-story-of-the-WorldWide-Telescope/"&gt;interview with Curtis Wong and Roy Gould&lt;/a&gt;.</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fay/fay.mp3" expression="full" duration="4680" fileSize="37175808" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fay/fay.wma" expression="full" duration="4680" fileSize="37604239" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fay/fay.wma" length="37604239" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/How-the-WorldWide-Telescope-works/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489757/Trackback.aspx</trackback:ping><category>astronomy</category><category>worldwide telescope</category></item><item><title>The story of the WorldWide Telescope</title><description>&lt;p&gt;
The &lt;a href="http://worldwidetelescope.org"&gt;WorldWide Telescope&lt;/a&gt; was first shown to the public &lt;a href="http://www.ted.com/index.php/talks/view/id/224"&gt;at TED 2008&lt;/a&gt;, in a joint presentation by project leader Curtis Wong, manager of Next Media Research for Microsoft, and Roy Gould, a science educator with the Harvard-Smithsonian Center for Astrophysics. In this interview they discuss how -- and why -- the WorldWide Telescope combines many sources of astronomical data and imagery to create a seamless view of the night sky.
&lt;/p&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;
            &lt;p&gt;
            &lt;img src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/wong-gould/curtis-wong.jpg" /&gt;
            &lt;/p&gt;
            &lt;p&gt;
            &lt;strong&gt;Curtis Wong&lt;/strong&gt;
            &lt;/p&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;
            &lt;p&gt;
            &lt;img src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/wong-gould/roy-gould.jpg" /&gt;
            &lt;/p&gt;
            &lt;p&gt;
            &lt;strong&gt;Roy Gould&lt;/strong&gt;
            &lt;/p&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: I was interested in astronomy as a kid, but when you grow up in Los Angeles, the odds of seeing the Milky Way are pretty slim. I think the only time it happened recently was during the quake when the whole city lost power.
&lt;/p&gt;
&lt;p&gt;
It wasn't until much later that I actually got to see the Milky Way, and other objects I'd seen pictures of, and it was really quite a transformative experience. I always wanted to figure out how to recreate and share that experience.
&lt;/p&gt;
&lt;p&gt;
Early on, in the 80s, I made a little HyperCard stack called MacTelescope, which was my attempt to create the experience of looking at the sky, and then -- if you could manage to see the Milky Way -- to zoom into a section where there are all these interesting globular clusters and nebulae, if you know where to look.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So there was already the idea of taking people on a guided tour.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Exactly. Later I moved from the Voyager company to a company called Continuum, a little think-tank organization started by Bill Gates. The company was thinking about authoring tools and media. The project I wanted to do was called John Dobson's Universe.
&lt;/p&gt;
&lt;p&gt;
John Dobson was a physical chemist at UC Berkeley who got drafted to work on the Manhattan Project. He was there to do the chemistry for tuballoy, the code word for uranium. He became a conscientious objector, left the project -- which was pretty hard to do -- and became a Vedanta monk. Then he became interested in looking at the sky but, being a monk, he didn't have any money for a telescope.
&lt;/p&gt;
&lt;p&gt;
He knew that San Francisco shipyards were sources of glass discs, but they were too thin: conventional wisdom said that you need thick glass to grind a mirror. But he defied that wisdom and found a way to use round porthole glass. He also came up with an ingenious way of mounting the telescope, using cardboard concrete form tubes. His design is now one of the most common telescope mount designs in the world.
&lt;/p&gt;
&lt;p&gt;
He also created an organization called San Francisco Sidewalk Astronomers, where people who have telescopes are encouraged to take them out into the public and show people what's up in the sky, and explain what's going on.
&lt;/p&gt;
&lt;p&gt;
John spent a lifetime in national parks, and in San Francisco, talking about astronomy to the general public. He was really good at taking complex ideas and conveying them to the public.
&lt;/p&gt;
&lt;p&gt;
I remember once when he was showing the Andromeda galaxy, there was a picture with the galaxy in the background which looks like a kind of fog, with a lot of stars in front, and he said: "The stars you're looking at in front are like raindrops on your window, looking at a distant cloud."
&lt;/p&gt;
&lt;p&gt;
So anyway, we started that project, got about halfway, then other things happened and it got cancelled.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: It's a great story!
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: So I've thought about astronomy for a long time. At Microsoft, I heard a talk by Jim Gray who was applying his database expertise to astronomy.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: I looked up a &lt;a href="http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=MSR-TR-2002-75"&gt;paper&lt;/a&gt; that Jim published with a guy from Johns Hopkins, Alex Szalay, and it's called &lt;em&gt;The World-Wide Telescope&lt;/em&gt;, but interestingly, the title also includes the phrase: &lt;em&gt;An Archetype for Online Science&lt;/em&gt;.
&lt;/p&gt;
&lt;p&gt;
The idea is that, not just in astronomy but in all of science, we're getting to the point where there's less direct observation, and more collection and analysis of data on a really large scale, happening in ways that are computationally assisted, and also assisted by the collaborative properties of the Internet.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Exactly. We're getting this deluge of data. The challenge then becomes how you process it and how you use it. Bringing in computer technologies -- SQL, visualization -- can really help.
&lt;/p&gt;
&lt;p&gt;
So Jim had written that paper with Alex in 2002, and he'd given a talk on some of the work he'd been doing with the Sloan Digital Sky Survey at that time.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Yeah, I've listened to that talk, and I was hoping you could help me connect the dots between the work that was done there and the federated virtual observatory which, for him, became a case study in the use of XML web services to create an Internet telescope that was a federation of radio astronomy services.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Right. Alex and several other research scientists came together to create the &lt;a href="http://www.us-vo.org/"&gt;National Virtual Observatory&lt;/a&gt;, which establishes common protocols for accessing astronomical data and imagery.
&lt;/p&gt;
&lt;p&gt;
When Jim told me they were starting that project, I told him I wanted to help. Part of the pitch was a PowerPoint I made about SkyServer that showed how you could embed that data and imagery in a virtual environment.
&lt;/p&gt;
&lt;p&gt;
What they were thinking was that astronomers would be querying the database, and pulling out objects. But I thought that to make this an interesting educational resource, we would need to build an environment in which people could create and share stories about the objects, and could connect those stories to original source information.
&lt;/p&gt;
&lt;p&gt;
Later he came back to me and said that the SkyServer data was released, and he wanted to redesign the website to make the data more accessible to the public. So I helped with that. SkyServer DR2 (data release 2) was the redesigned website. And I used that to make the case for building the learning environment that became WorldWide Telescope.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Under the covers at SkyServer, there was a lot of work done to correlate observations from different sources of data.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Exactly. Alex Szalay did a lot of that work, in collaboration with people from other universities.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And that's the foundation for what we see now in WWT?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: That was sort of the first case. It was the Sloan data, which is just the northern sky. Then at TechFest in 2006, we were working with Alex to take the SkyServer images -- he has an image server that'll give you an image from coordinates, and from that you can get data about other objects in the field of view. For our TechFest demo, our developer Jonathan Fay -- who had been fiddling around with building his own hierarchical multi-resolution tile-browsing engines -- put together an engine that would pull the images from SkyServer and assemble them into a mosaic that you could browse and zoom in and out of. That was the first manifestation.
&lt;/p&gt;
&lt;p&gt;
Meanwhile, we'd gotten some interest from Harvard. &lt;a href="http://cfa-www.harvard.edu/~agoodman/"&gt;Alyssa Goodman&lt;/a&gt; found an intern for us who was passionate about education, and she came to work with us in the fall of 2006.
&lt;/p&gt;
&lt;p&gt;
And then in January, as I was emailing back and forth with Jim about our plans, he disappeared after sailing out into San Francisco Bay to spread his mother's ashes.
&lt;/p&gt;
&lt;p&gt;
So when TechFest came around again in 2007, we decided to rename it WorldWide Telescope in honor of Jim. We started building it in March. That summer was a big effort to secure image sets and data from lots of different sources, as well as building the engine and defining the authoring environment. We showed a rough prototype at the &lt;a href="http://www.astrosociety.org/"&gt;Astronomical Society of the Pacific&lt;/a&gt; meeting ... Roy, was that in Chicago?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RG&lt;/strong&gt;: Yes.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Roy, do you want to pick up the story from there?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RG&lt;/strong&gt;: Sure. Let me rewind the tape on my end all the way back. I was smiling when Curtis told his childhood story about being interested in stars he couldn't see. I have the east coast version of that, looking up in New York City, hounding my parents for a telescope, and then when I finally got one, we went up on the roof and there was nothing to see.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Just a little too much ambient light!
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RG&lt;/strong&gt;: Exactly. We thought we saw a star, but it was a plane.
&lt;/p&gt;
&lt;p&gt;
My career has been in science education, and I was familiar with Curtis' work long before the WorldWide Telescope. When I heard his talk in Chicago, and at some subsequent conferences, I saw that it was addressing two of my passions. One is astronomy, what's out there in the night sky, and having a unified view of it. But also, as somebody who communicates science, I've always been interested in the learning interface. It's not just about using the resource, it's about learning from it.
&lt;/p&gt;
&lt;p&gt;
It was clear there were lots of things that could be done, but it took a long time to see what all of them might be. It was only after I used it for a while that I realized what a great breakthrough it is.
&lt;/p&gt;
&lt;p&gt;
First, there's the seamless experience of the night sky. It's true that all these images are accessible in principle, if you know where to find them, and of course many of us in the field do that when we prepare curricula or museum exhibits or planetarium shows.
&lt;/p&gt;
&lt;p&gt;
But when you do that, you see the universe in a disconnected way. Once you go on WorldWide Telescope, it's a different experience. It's like you could look up with perfect vision.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And with complete contextual awareness.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RG&lt;/strong&gt;: Exactly. And I think we take the night sky for granted. We don't realize how important it used to be. When we have floods in the Midwest, we call them disasters. Well, disaster literally means against the stars. Catastrophe is the Greek for against the stars. Romeo and Juliet were the star-crossed lovers. It's all through literature, it's part of common speech.
&lt;/p&gt;
&lt;p&gt;
Then you fast-forward to the modern day, and few of us have even seen the stars, let alone have that relationship to them. For me that's number one about WorldWide Telescope. It's really inviting us to take a long look at the night sky again.
&lt;/p&gt;
&lt;p&gt;
Of course you can do that through the WorldWide Telescope, but then you can also look at the night sky from your back yard, or from a dark location, and have that dual relationship. So you can both see the night sky and, in WWT, you can explore it.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: That's one of my favorite things to do. At night, of course, laptops create their own illumination. My first astronomy program ran under MS-DOS, amazingly enough, and I'd take my laptop from that era out in the backyard and use it as a guide. This is the latest and greatest incarnation of that tradition.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RG&lt;/strong&gt;: And from the educational point of view, this pays enormous dividends. We've done research here about what students know when they graduate from high school about the night sky, and about the universe in general. From that, we know that the majority of high school students graduate placing the stars inside the orbit of Pluto.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: C'mon. Really?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RG&lt;/strong&gt;: It's true. About 52 percent, and that's based on surveys of thousands of students in 37 states.
&lt;/p&gt;
&lt;p&gt;
There are many reasons for that, but certainly one of them is this lack of good images of the sky. It's hard enough to see a picture of the solar system, let alone its context within the galaxy. That's one beautiful feature of WorldWide Telescope, you get a sense of where things are.
&lt;/p&gt;
&lt;p&gt;
We also know that students think galaxies are closer than stars, because they tell us that stars are just point sources, and no matter what your magnification or telescope they remain points, so they must be very far away. Whereas galaxies, whatever they are, are big, and so they must be closer.
&lt;/p&gt;
&lt;p&gt;
But if you go on WorldWide Telescope, and look at the Andromeda galaxy, the nearest big galaxy to us, and the furthest thing you can see with your naked eye, you get a physical feeling. You see all the stars in our Milky Way that are the veil of stars we look through, and you really get a sense that the Andromeda galaxy is vast and distant.
&lt;/p&gt;
&lt;p&gt;
Another thing that's useful is the ability to look at the universe in different wavelengths of light. We see only the visible light our eyes can see, but that's like listening to one instrument in the orchestra.
&lt;/p&gt;
&lt;p&gt;
You can download a Chandra image, an X-ray image of the sky, but what's that about? You can't really figure it out. Or one of the infrared images. In WorldWide Telescope, you can seamlessly crossfade back and forth between images taken at various wavelengths. So you see what's going on in the visible wavelengths, but you also see what's going on that's emitting these other forms of light. And that's important, because most of the action in the universe happens at wavelengths of light we can't see. The WorldWide Telescope automatically aligns these different images.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Can you say a bit about how that's done?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Yes, it was a challenge to register all these different surveys so you can do that kind of cross-comparison. There are some emerging standards from the National Virtual Observatory and others. The &lt;a href="http://www.virtualastronomy.org/avm_metadata.php"&gt;AVM&lt;/a&gt; (Astronomy Visualization Metadata) standard provides metatags at high precision within objects, to give exact position and scaling and orientation for images in the sky.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Does that work by reprocessing existing survey data?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Right. Generally the data exists. But for example, when Hubble makes these beautiful press-release images that they send out in color, the metadata is usually lost, because the images have been post-processed in Photoshop and other programs. So a lot of these images that we want to put out there for the public needed to have that metadata reintegrated. And sometimes they're composites of many Hubble images to create a large mosaic. So that composite image needs to have metadata put into it.
&lt;/p&gt;
&lt;p&gt;
I think a source image is about 400 megabytes. So while it's technically in the public domain, they don't release it because it's way too much data for most people to download. They've released low-resolution versions, but we have the full-resolution image of the Crab Nebula, and other things, in WorldWide Telescope, because we can enable people to use the high-resolution images without having to download all of them.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So part of it was going back to sources that were notionally available, but not practically available.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Right.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: But part of it is about emerging standardization of how these images are described.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Yes. The whole NVO group is working toward common standards and common protocols for image metadata, so they can all be used by everybody. There's AVM, part of the &lt;a href="http://www.virtualastronomy.org/project.php"&gt;VAMP&lt;/a&gt; project -- Virtual Astronomy Multimedia Project -- leading the charge for annotation of imagery and other media in the context of the sky.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And there are presumably many uses for those annotations.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Yes. So we were one of the first guinea pigs for VAMP, and I think Google joined a bit later.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So let's talk about the authoring aspect. I'm a huge fan of multimedia and audiovisual tools for educational and training purposes, and this is a wonderful example of that.
&lt;/p&gt;
&lt;p&gt;
I was looking at what files get created when you author a slideshow in WorldWide Telescope, and it looks like the output is an XML file with coordinates and descriptions. To me that says two things.
&lt;/p&gt;
&lt;p&gt;
First, it says that WorldWide Telescope presentations can be created using other tools, which is interesting.
&lt;/p&gt;
&lt;p&gt;
It also says that presentations created inside WWT could potentially be played elsewhere, in other environments.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Exactly. The whole idea with the authoring, and these guided experiences, is...let's go back to the educational intention. I spent a lot of years building instructional learning, where you bring in experts to tell you about subjects, in context, but also the self-directed discovery aspect of learning. Then there's a third part I wanted to bring in, constructive learning. There's always a duality between instructive and constructive learning.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: What do you mean by constructive?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Basically, learning by doing. Where kids who don't know much about astronomy can take tours from experts, and be taken to unfamiliar places, but then pause those tours and explore on their own. In WorldWide Telescope you can always pause the tour, like stopping a tour bus and getting off to look around. At that point you can right-click to get more information, you can zoom into places that the tour didn't deeply explore because it had to keep moving, you can see other objects that are in the neighborhood.
&lt;/p&gt;
&lt;p&gt;
If you notice down below, in the context menu, as you get to different objects you can not only see that object in different wavelengths, but you can also see other tours that intersect with that object.
&lt;/p&gt;
&lt;p&gt;
The goal is that as you start to see more and more guided tours within this space, if you think of objects as nodes or intersections, there are more and more opportunities to cross over from one tour to another. Eventually we might see a kind of hypermedia web of learning where instead of hyperlinking among words, we're linking among stories and paths and ideas.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And that'll include things I make for myself. If I make a narrative about a part of the sky, then the context surrounding the area I'm interested in will be available when someone else plays back that tour.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Exactly. If you take a tour about stellar evolution, and learn about how stars get formed from nebulae, and you get to the planetary nebula stage, you might say, wow, those are really pretty, what's going on there? Then you might intersect with a tour about different planetary nebulae, where you dive deep into that category. Then you might find a tour just about the Ring Nebula, or the Helix Nebula.
&lt;/p&gt;
&lt;p&gt;
Or conversely you may come across it in a different way. You're browsing the sky and you come across the Ring Nebula or the Helix Nebula, and you may say, what are these things? And you can see other things that relate categorically to them, which would then intersect with explanations of how planetary nebulae fit into the grand scheme of the origin of all the elements.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Roy, what tours are you working on?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RG&lt;/strong&gt;: We're working on two. One is a tour of black holes, in conjunction with a traveling exhibit the Smithsonian is producing.
&lt;/p&gt;
&lt;p&gt;
Another is a tour of alien solar systems and their exoplanets, as they're known. There are more than 300 stars known to have planets orbiting them. Using a small educational telescope -- we have a network of five of these, they're called &lt;a href="http://mo-www.cfa.harvard.edu/MicroObservatory/"&gt;micro-observatories&lt;/a&gt; -- students in middle school and high school can take their own images and study these exoplanets. They can actually characterize them in a surprising amount of detail. They can figure out how large they are and how far away they are from their stars.
&lt;/p&gt;
&lt;p&gt;
What's more, we're using the tour of exoplanets to introduce this particular curriculum to teachers who have never used the micro-observatory telescopes. You know, you can make a brochure, and send them some images of the night sky, but there's nothing more exciting than having a tour that gives you a sense of the context, the depth of the sky, where things are.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Do you think there will be a citizen science aspect to this?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RG&lt;/strong&gt;: Absolutely. I think that's going to be a major use. Astronomy is probably unique among all the sciences, in that there are more amateurs than professionals. But this is a third category. You've got the amateurs, you've got the professionals, but now WorldWide Telescope makes possible the blossoming of citizen science. Especially given the flood of images we're going to have in the next few years, more than researchers can ever look at. You'll have images that have never been seen by humans, and that opens up a huge possibility.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So part of it's about getting more eyeballs on this flood of imagery. Potentially another part is citizen-driven analysis of data. I wonder if SkyQuery will become part of the suite, so people can start to ask questions, like how many fast-moving objects are in this part of the sky, which is one of the queries Jim Gray mentioned in his 2002 paper.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RG&lt;/strong&gt;: Yes, I'd love to see that. There are several ways the public can contribute. One is to look for things, and make serendipitous discoveries. With some of the NASA solar missions, for example, where we have so many images of the sun coming back, you can see comets that were never seen before as they get close to the sun. And ordinary citizens have discovered comets that nobody knew existed.
&lt;/p&gt;
&lt;p&gt;
But to me the most important thing is what you just alluded to: Asking questions that nobody had thought to ask, even the professionals.
&lt;/p&gt;
&lt;p&gt;
So for example, what's the volume of a black hole? How big is it inside? If you go on the web and look at the standard references, you'll find answers all over the place, all at odds with one another.
&lt;/p&gt;
&lt;p&gt;
There are questions that researchers just haven't gotten around to asking, that many of the public will ask, and we don't know what those are yet.
&lt;/p&gt;
&lt;p&gt;
Although there's all this wonderful technology in the WorldWide Telescope, in a sense it's the modern incarnation of a campfire around which you sit and trade stories. Our organization has telescopes in Australia and Chile and elsewhere, and when I go to those countries, I find that the native cultures have all sat around campfires and developed incredible stories about the night sky.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So how does the collaboration work? I've made a little slideshow, it's stored as a file on my computer, how do I share that?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: We're trying to encourage the development of communities. Sky and Telescope is forming one, Astronomy magazine is forming one, Meade -- a telescope maker -- is forming one.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So the unit of sharing is the WWT file which gets created when you make a slideshow. It's a bundle of XML, thumbnail images, and maybe audio if there's voiceover. In a lot of cases, those will want to live out on the web where people can link to them. If I post that file, and somebody who has WorldWide Telescope installed clicks on it, will it launch and play the slideshow?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Yes. I was talking to a storyteller from the local Snoqualmie tribe; a lot of their stories happen to be about the sky. I wanted to try to capture some of those as examples of what you can do.
&lt;/p&gt;
&lt;p&gt;
By the way, if you're playing the tour, you can pause and go into edit mode, and it's all open. You can change destinations, you can drop in your own audio narration, music, text, and images.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So you've forked the thing you've downloaded, and at that point you can...
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: ... put your own interpretation on it, exactly. It's just like View Source. Except easier, because you don't have to program in HTML.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Yeah, it's quite straightforward.
&lt;/p&gt;
&lt;p&gt;
Do you think that there's a need -- I suspect that there is -- for some sort of universal player that wouldn't require the full application, and wouldn't even require Windows, but would just be a way for anybody to play these things?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Absolutely. That's a really good idea.
&lt;/p&gt;
&lt;p&gt;
So by the way, I want to highlight one citizen science story for you. It relates to &lt;a href="http://www.galaxyzoo.org/"&gt;Galaxy Zoo&lt;/a&gt;, a website that allows the public to help the Sloan Digital Sky Survey catalog and tag the hundreds of millions of galaxies that were covered in the various data releases. In one case, a teacher from Amsterdam was looking at a galaxy and it looked really blue. She reported that to Galaxy Zoo, and they looked at it, and it was something they'd never seen before. So they retargeted the Very Large Array Radio Telescope to take a look at it. And based on those results, they've now secured Hubble time to study that galaxy in great detail.
&lt;/p&gt;
&lt;p&gt;
That's a case where putting the data out there, letting the public look at it, dividing up the sky, and having that feedback mechanism can really advance science. Because when you think about it, as we start to get telescopes like &lt;a href="http://www.lsst.org"&gt;LSST&lt;/a&gt; and Pan-STARRS and these other large telescopes that will generate many terabytes of imagery and data every night, it's going to be impossible for any one person or group to see what's up there. Image recognition's good, but nothing's as good as the human eye and human brain.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Some guidance on what to look for is really useful. If someone's found something interesting, and that justifies spending the resources to take a closer look, that's beautiful. That's exactly how things ought to work.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CW&lt;/strong&gt;: Right. And I think a lot of these telescope projects are thinking, how do we make this much data available to people? And how do we make that accessible in a simple way? So they've had conversations with us about how WorldWide Telescope might be able to help.
&lt;/p&gt;
&lt;p&gt;
Also, NASA is very interested in how real-time data feeds from them would enable the public -- at the same time -- to have access to mission data.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: You start to think about what's possible, and you quickly realize there's an infinite number of interesting possibilities. It'll be great to see how this plays out over the next few years.  Thanks!
&lt;/p&gt;&lt;img src="http://channel9.msdn.com/489756/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/The-story-of-the-WorldWide-Telescope/</comments><link>http://channel9.msdn.com/posts/JonUdell/The-story-of-the-WorldWide-Telescope/</link><pubDate>Fri, 20 Jun 2008 12:18:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/wong-gould/wong-gould.wma</guid><evnet:views>876</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489756/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>The &lt;a href="http://worldwidetelescope.org"&gt;WorldWide Telescope&lt;/a&gt; was first shown to the public &lt;a href="http://www.ted.com/index.php/talks/view/id/224"&gt;at TED 2008&lt;/a&gt;, in a joint presentation by project leader Curtis Wong, manager of Next Media Research for Microsoft, and Roy Gould, a science educator with the Harvard-Smithsonian Center for Astrophysics. 
In this interview they discuss how -- and why -- the WorldWide Telescope combines many sources of astronomical data and imagery to create a seamless view of the night sky.</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/wong-gould/wong-gould.mp3" expression="full" duration="2760" fileSize="22079040" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/wong-gould/wong-gould.wma" expression="full" duration="2760" fileSize="22335929" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/wong-gould/wong-gould.wma" length="22335929" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/The-story-of-the-WorldWide-Telescope/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489756/Trackback.aspx</trackback:ping><category>astronomy</category><category>worldwide telescope</category></item><item><title>Making sense of electronic health records</title><description>&lt;br /&gt;
&lt;p&gt;
My guest for this week's Perspectives show is &lt;a href="http://www.dbmi.columbia.edu/~hripcsa/"&gt;George Hripcsak&lt;/a&gt;, professor of biomedical informatics at Columbia and one of six researchers &lt;a href="http://www.microsoft.com/presspass/press/2008/apr08/04-17GWASPR.mspx"&gt;recently funded&lt;/a&gt; by Microsoft Research through its Computational Challenges of Genome Wide Association Studies (GWAS) program.
&lt;/p&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;
            &lt;p&gt;
            &lt;img src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hripcsak/hripcsak.jpg" /&gt;
            &lt;/p&gt;
            &lt;p&gt;
            &lt;strong&gt;George Hripcsak&lt;/strong&gt;
            &lt;/p&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: For starters, what is a genome wide association study?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: A genome wide association study involves scanning markers across the human genome to find genetic variations associated with certain diseases.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Specifically what's being looked for is single-nucleotide markers, right? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: Yes. Now our role in this project is the phenotype. We're trying to address the phenotypic computational challenge. Often it's simple. Someone has diabetes or doesn't. Or two people have it, but one gets complications and the other doesn't.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So by phenotype you mean the expression of diabetes, in this case?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: Yes. Often you start with a disease, and some number of patients, often very small, but up to several thousand, plus a control group of patients without the disease.
&lt;/p&gt;
&lt;p&gt;
You study the entire genotype, and you look for which sites on the genome are associated with that disease. Then you look into that site. Now the fact that you may find a certain genetic mutation at that site -- that's not necessarily the cause of the difference between the two sets of patients. The cause may be something near that marker on the genome. So you might sequence that area, looking for other information about what genes are in the area, and so on.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So the computational challenge is one of correlation.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: The first step is correlation. But...I'm at the zeroth step. There are other people working on the part I'm describing here. First they'll come up with associations, which is a computational challenge in its own way, because there are a vast number of markers -- a hundred thousand, someday a million -- that you're looking at, to see if they're associated with this trait, diabetes or not diabetes.
&lt;/p&gt;
&lt;p&gt;
Then they need to figure out what proteins are coded at the marker, or near the marker. In order to get to that point you need the phenotype too. As long as it's something simple, like patients with or without diabetes, it may seem like that's the easy part of the experiment.
&lt;/p&gt;
&lt;p&gt;
But as time goes on, and genotyping gets easier and cheaper -- and as we learn to handle patient privacy, that's the other thing that limits the study, we have to be careful about how we collect and store these data -- the hard part is going to be collecting the phenotype.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: When you say "collecting the phenotype" -- that's clinical observation and description?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: Exactly. Imagine that every patient who comes into the hospital is given the option to participate in a trial where, in a secure fashion, their genotype is done, and then their information can be used to discover new things about disease. Some number of patients would agree to that, and then all you have to do is take their blood samples, check the DNA, and genotype it. It's relatively straightforward if you have the money to do it.
&lt;/p&gt;
&lt;p&gt;
Then you have to find out what the phenotype of the patient is. But what questions should you ask? We don't know what diseases we might be studying, or what we might discover. We want to know the whole medical course of this patient: When they've been well, and when they've been sick.
&lt;/p&gt;
&lt;p&gt;
What we have for these patients are their electronic health records. And in the future, with Microsoft HealthVault for example, we have the personal health records. And so the question is, with the patient's permission, can we use those data to come up with a reliable phenotype?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: OK. Now I see how this ties into your career history. You've done a lot of work in the area of mining clinical records, using a variety of techniques.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: Exactly. So in addition to working on the statistical models to do the genome association, we thought it'd be worth investing in the phenotype part of the problem. We've been working on it for 20 years, and it's harder than most people think.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: I wouldn't think it'd be easy, but tell us: What are the challenges unique to mining health records and clinical data?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: Two of our collaborators are Rich Smiley and Pamela Flood, both faculty members at Columbia, in anesthesiology. They're studying a specific protein, the beta-2 adrenergic receptor, and they're sampling about 2000 patients. They're just studying two SNPs, it's not a genome wide association, but they need to collect the phenotype on those 2000 patients. It's prohibitive to have a research nurse accurately gather all the information they need to do their study -- and what they're studying is labor, the length of time you spend in pre-term labor and how much pain you experience, and how that's associated with variations on these two sites. It's an enormous amount of work, and it might not be reliable.
&lt;/p&gt;
&lt;p&gt;
But we do have an electronic health record. For each of these patients, the nurse has painstakingly documented what he or she observed about the patient. Plus we have monitors, and lab tests, all fed into the electronic health record.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: But much of what's there is anecdotal, textual, and narrative, right?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: Well, it's a mixture of structured and narrative. So ideally we should be able to generate the phenotype rather than have a person do it. We're trying to do that by computational analysis of the health record.
&lt;/p&gt;
&lt;p&gt;
Of course the health care record is intended for patient care, not for research. Anytime you take an information source intended for one purpose and try to use it for another you have to be careful. It takes a lot of processing and interpretation.
&lt;/p&gt;
&lt;p&gt;
Whether it's structured or narrative data, people use different words to encode things. In a cardiovascular study, is it "chest pain"? Is it "angina"? Is it "coronary artery disease"?
&lt;/p&gt;
&lt;p&gt;
The terminology varies, and often the terms are ambiguous. Someone says a person has diabetes, they probably mean diabetes mellitus, a problem with glucose. But there's a diabetes insipidus which is a completely different disease. All they have in common is that you urinate a lot.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So on the one hand you can try to provide a more structured data collection environment that's aware of these distinctions. And on the other hand you can do a lot of text mining, correlation, and natural language processing.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: Yes. But remember, the people who are collecting the data are not interested in your research study. As a nation, we're working on improving our terminology, whether you get there by natural language processing or by having the doctor fill out a template. Either way we want to end up with computable knowledge.
&lt;/p&gt;
&lt;p&gt;
But when the purpose is clinical care, not research, we're always going to wind up with these problems.
&lt;/p&gt;
&lt;p&gt;
Furthermore, there's a reason why we speak in narrative style, and not in templates. It's an efficient means of communication. It may be true that it's best for health care providers -- and for all other human beings -- to speak in narrative language, and to have our systems, as they improve, turn that narrative into something structured.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Using natural language processing to extract structure from narrative is something you've been doing for a long time. What can you say about the progress of the state of that art?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: We did a study in 1995 where we had 200 chest x-ray reports, and we had 12 physicians review them. Six were radiologists, the ones who generate the reports, and six were internists, who generally use them. They weren't looking at the images, they were just looking at the reports dictated by the radiologist who did the initial reading.
&lt;/p&gt;
&lt;p&gt;
We wanted to see if we could use natural language processing to say yes or no to a set of questions, like: Is this a report indicative of bacterial pneumonia, or of cancer, or of chronic obstructive pulmonary disease? We had six conditions we were looking for, and we compared the reliability of each doctor's reading to the other eleven, and we compared the computer system's interpretation to all twelve. We found that the computer system was about as accurate as the 12 experts.
&lt;/p&gt;
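&lt;p&gt;The interview doesn't say which agreement statistic the 1995 study used, so the following is only a sketch of the leave-one-out comparison described above. It assumes each reader's yes/no answers to one question (say, "bacterial pneumonia?") are stored as a list of booleans; the reader names and data are hypothetical. Each expert is scored against the majority of the other experts, and the computer is scored against the majority of the whole panel.&lt;/p&gt;
&lt;pre&gt;
```python
def agreement(readings, reader, panel):
    """Fraction of reports on which reader matches the majority of the other panel members.

    readings: dict mapping reader name -> list of bools (one answer per report)
    panel: names of the human experts; reader may or may not be one of them.
    """
    others = [p for p in panel if p != reader]
    answers = readings[reader]
    hits = 0
    for i, answer in enumerate(answers):
        yes_votes = sum(readings[o][i] for o in others)
        # Strict majority; an even split counts as "no".
        majority = 2 * yes_votes > len(others)
        hits += int(answer == majority)
    return hits / len(answers)


# Hypothetical toy data: three experts and one NLP system reading three reports.
readings = {
    "rad1": [True, True, False],
    "rad2": [True, False, False],
    "int1": [True, True, False],
    "nlp": [True, True, True],
}
experts = ["rad1", "rad2", "int1"]
print(agreement(readings, "rad1", experts))  # vs. the other two experts
print(agreement(readings, "nlp", experts))   # vs. all three experts
```
&lt;/pre&gt;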
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And what is that system? There are general NLP frameworks, and also domain-specific ones...
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: This one is a medical system called &lt;a href="http://lucid.cpmc.columbia.edu/medlee/"&gt;MedLEE&lt;/a&gt;, which Carol Friedman started building back in 1990 or 91. It went into production use in our hospital in 1995, and we've been using it ever since.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And you've been training it as you use it?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: Well, improving it. It's not a data-driven system. So as it makes mistakes, we fix it, but it's not a machine learning system.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: It's a language understanding system.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: Yes. It uses a semantic grammar. It divides all words into classes, so rather than getting into the details of syntax, like noun phrases and verbs, it says: this thing is a body part, this is a disease, this is a procedure, this is a medication. Then it has a grammar made of sequences of these classes. It also does some syntactic parsing to handle negation and things like that, so it's a blended approach. It was used initially for radiology reports, but now it's used across all of medicine.
&lt;/p&gt;
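&lt;p&gt;To make the idea of a semantic grammar concrete, here is a toy sketch. It is not MedLEE and uses none of its code or lexicon; the word list, the class names, and the single rule (a FINDING followed by a BODY_PART gives the finding's location) are invented for illustration. Negation is handled crudely: a negation word applies until the next punctuation mark.&lt;/p&gt;
&lt;pre&gt;
```python
import re

# Hypothetical lexicon: words map to semantic classes, not parts of speech.
LEXICON = {
    "pneumonia": "DISEASE",
    "effusion": "FINDING",
    "infiltrate": "FINDING",
    "opacity": "FINDING",
    "lung": "BODY_PART",
    "lobe": "BODY_PART",
    "insulin": "MEDICATION",
}
NEGATIONS = {"no", "without"}


def tag(sentence):
    """Return (word, semantic_class, negated) for every word found in the lexicon."""
    tagged = []
    negated = False
    for token in re.findall(r"[a-z]+|[,.;]", sentence.lower()):
        if token in ",.;":
            negated = False  # crude: negation scope ends at punctuation
        elif token in NEGATIONS:
            negated = True
        elif token in LEXICON:
            tagged.append((token, LEXICON[token], negated))
    return tagged


def parse(sentence):
    """Apply one rule over the class sequence: FINDING BODY_PART -> located finding."""
    tagged = tag(sentence)
    results = []
    for i, (word, cls, negated) in enumerate(tagged):
        if cls not in ("FINDING", "DISEASE"):
            continue
        following = tagged[i + 1:i + 2]  # next tagged word, if any
        site = following[0][0] if following and following[0][1] == "BODY_PART" else None
        results.append({"term": word, "class": cls, "negated": negated, "site": site})
    return results


print(parse("No pleural effusion. Patchy infiltrate in the left lower lobe."))
```
&lt;/pre&gt;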
&lt;p&gt;
It was as accurate as humans at answering simple questions. When you get to complex interpretations, it doesn't do as well, but you're still in a situation where a human can't be expected to read a million chest x-ray reports, or discharge summaries. If we can do things that there just isn't the money for people to do, even if the accuracy is a bit lower, that's still useful.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Given that context, how will this apply to the funded project?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: So, I've outlined some challenges. Things are narrative, terminology varies. Another is that data are sometimes wrong. Mistakes can be made in recording information on the chart, and often those are mistakes that the doctor would notice and immediately discount. Or it may be a subtle mistake that isn't important to a human interpreting the case, but could matter for a research trial where you're trying to automatically understand what's in the chart.
&lt;/p&gt;
&lt;p&gt;
Often, there's also missing data. The patient may go for care elsewhere. Or a data value may not have been recorded. Or a test may not have been done. So you don't really have a complete record. If you're doing a clinical trial, you have a lot of money to pay a lot of people to spend a lot of time tracking patients, following up with them, measuring everything that needs to be measured. But if you're just using the combination of electronic health record and personal health record, you have to rely on whatever was collected for that purpose.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: It's going to be sparse data, for the foreseeable future.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: Exactly. So our challenge is to generate a reliable phenotype from that electronic health record and personal health record. Or, if it's not reliable, to know that there's not enough information in those records to make a determination.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So part of the challenge is to infer what's missing. How can you do that?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: Let's say you're trying to study complications of diabetes, and you want to do a genome-wide association study comparing people whose diabetes has been severe enough to treat with insulin but who have had no complications, versus people who have had complications, to see if there's a genetic difference. If you can discover why some people don't have complications, can you develop a drug that mimics that in the other people?
&lt;/p&gt;
&lt;p&gt;
To do that, we want to come up with a phenotype of people who have diabetes severe enough to be treated with insulin, but who don't have complications. And we want to use the electronic health record to identify them. What are the challenges?
&lt;/p&gt;
&lt;p&gt;
Well, what if someone comes here for their diabetes care, because there's an expert in this medical center, but when they have complications, they go to the nearest hospital? My electronic health record doesn't have the data about their complications.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Of course this is the promise, and the holy grail, of federated health records.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: Right. But this is just one example of many problems that can come up. When you're trying to identify someone who hasn't had complications, you don't know if you're missing the data, or if they're truly without complications.
&lt;/p&gt;
&lt;p&gt;
How can you figure it out? Well, you can use information-theoretic methods to determine: look, I have enough information that if this person had complications, I'd know it. If this person has had a history and physical by an internist, or by three different internists over the course of 10 years, and none of them ever mentioned a complication of diabetes, then odds are this patient doesn't have a complication.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So you're interpreting the negative space?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: Exactly. Whereas another patient, who has diabetes, disappears for 5 years, then comes in and has a complete blood count but not a glucose test, then has some minor dermatological procedure, then disappears for another 5 years, and is here now -- I have no reason to think that person doesn't have diabetes complications. All I know is that he or she came in to have a mole removed. I have no information about diabetic complications, for example an ophthalmologic complication.
&lt;/p&gt;
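&lt;p&gt;The contrast between these two patients can be illustrated with a simple Bayesian calculation. This is a stand-in, not necessarily the information-theoretic method the project uses, and every number is made up: assume a prior probability that an insulin-treated patient has a complication, and assume each thorough internist history and physical independently documents an existing complication with some fixed probability. If none of those visits mentions a complication, the probability of a hidden one shrinks with every informative visit; visits that would not have looked for complications, such as the mole removal, are simply not counted.&lt;/p&gt;
&lt;pre&gt;
```python
def posterior_complication(prior, p_documented, n_informative_visits):
    """P(complication | no mention in any informative visit).

    prior: assumed prevalence of complications among insulin-treated patients
    p_documented: assumed chance that one thorough history and physical
        records an existing complication
    """
    # Probability that every informative visit missed a real complication.
    missed_all = (1.0 - p_documented) ** n_informative_visits
    with_complication = prior * missed_all
    without_complication = 1.0 - prior  # nothing to mention, so silence is expected
    return with_complication / (with_complication + without_complication)


def classify(prior, p_documented, n_informative_visits, threshold=0.05):
    """Label a patient as a complication-free control only if the record is informative enough."""
    post = posterior_complication(prior, p_documented, n_informative_visits)
    if post > threshold:
        return "insufficient information"
    return "no complication"


# Hypothetical numbers: 30% prior, 80% chance an internist documents a complication.
print(classify(0.3, 0.8, 3))  # three internist histories and physicals over ten years
print(classify(0.3, 0.8, 0))  # only a blood count and a mole removal
```
&lt;/pre&gt;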
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Electronic health records are moving into the mainstream. You mentioned Microsoft HealthVault, and Google Health was just announced. Most people have yet to encounter these things in their routine interaction with the health care system. I presume that in five years, many will have.
&lt;/p&gt;
&lt;p&gt;
I think a lot of people have the notion that the information that's being collected will be of value, not only clinically but also to research. Your point is: No, not necessarily. So my question is, if you were the czar of electronic health information, what would you like to see happen in order to merge those two goals?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: I'd start with a caution. There's a knee-jerk reaction to say that we need to have doctors document more accurately, and more completely. But the problem is that you end up with a big structured template.
&lt;/p&gt;
&lt;p&gt;
What I envision is an intelligent record that produces a summary for clinicians that they can read, correct, and then write their note which is a succinct summary of their thinking.
&lt;/p&gt;
&lt;p&gt;
Now, that doesn't answer your question, which was: How does that then get used for research? But I think that to the degree we make documentation efficient in serving health care, it'll also be more accurate for the sake of research.
&lt;/p&gt;
&lt;p&gt;
One thing that can go wrong, for example, is that if you're filling out a record for the sake of billing, you'll have an incentive to use diagnosis codes that optimize billing. Does that then reflect clinical accuracy? And would that then be useful for research?
&lt;/p&gt;
&lt;p&gt;
The important thing is to be grounded in the clinical truth. Put health care first, and then use new computational methods to extract accurate information.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So clinical truth is what the doctor said, in the doctor's own language. Of course there's a lot of shared convention around the terminology.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: They learn in medical school, and throughout their professional lives, what to document. Things aren't always called the same, but the nation is working on health care standards in various ways, both for transferring information between systems and for coming up with common vocabularies.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So although many of us would assume that those vocabulary terms need to be fields in a template, you're saying that's not the first and best strategy. You'd like to see that language just used naturally, as doctors speak their narratives, and then we'll harvest what we need out of that.
&lt;/p&gt;
&lt;p&gt;
Do you think natural language processing will get us there?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: It's not perfect. We achieved expert-level performance on a simple task. We have less than expert performance -- but not bad performance -- on the more complex task.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: How has the system improved since its introduction in the 1990s?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: What we've done is expand our breadth. Back then we were doing mainly radiology reports, and now we cover most of medicine. I don't know that the accuracy got better, though.
&lt;/p&gt;
&lt;p&gt;
Modern natural language processing systems often depend on machine learning, and don't have deep linguistic knowledge.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Well, there are both breeds.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: In medicine we're seeing more emphasis on statistics than on linguistics, but we believe the right answer is a combination of the two. In our case we've tried some statistical systems too, but our semantic system seems to outperform them.
&lt;/p&gt;
&lt;p&gt;
If you have a specific question, and that's the only one you need to answer, a statistical system is probably the more efficient way to go. What we do is parse the entire report, and spit out everything we can figure out from it.
&lt;/p&gt;
&lt;p&gt;
In the 1995 study our goal was to answer six questions, but the system actually parsed the whole report and listed everything it found, and then from those findings we determined which ones were indicators of pneumonia.
&lt;/p&gt;
&lt;p&gt;
There are various techniques that you can use that do pretty well on a single question, but that don't do well if you give them an entire history and physical, and say, tell me everything there is to know about the patient. That's what MedLEE is good at.
&lt;/p&gt;
&lt;p&gt;
Systems should make it easy for people to express what they need to express -- in this case, the clinical truth. If it turns out that a super-efficient template model works best, then that's great. It's an empirical study. People will experiment over time, and see what works.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: You've also mentioned the compromise approach: summarize, then present for approval or correction.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: Yes, but clinicians don't want to stop and correct. So we need to work on presenting a structured format that's useful enough to them to justify that effort.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: It's a perennial and vexing problem. In some ways, maybe, one of the grand computational challenges. At the interface between the data collector and the human being, the person is always going to regard the collector as an impediment.
&lt;/p&gt;
&lt;p&gt;
So, when does your project start?
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: We've already started. For that Rich Smiley and Pamela Flood study in pre-term labor, we're already taking data out of the electronic health record for them to do their associations.
&lt;/p&gt;
&lt;p&gt;
It's nice to have a concrete problem to work on. Over the summer, what we're working on is a generic framework. So, how does the next person and the next person do this? And then we'll be working on putting together a pipeline of tools. You'll still need a person there to process the data, but it won't involve reading every chart.
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Well this sounds hopeful. Thanks!
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GH&lt;/strong&gt;: Thank you, Jon.
&lt;/p&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Making-sense-of-electronic-health-records/</comments><link>http://channel9.msdn.com/posts/JonUdell/Making-sense-of-electronic-health-records/</link><pubDate>Thu, 12 Jun 2008 09:22:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hripcsak/hripcsak.wma</guid><evnet:views>125</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489755/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;p&gt;
My guest for this week's Perspectives show is &lt;a href="http://www.dbmi.columbia.edu/~hripcsa/"&gt;George Hripcsak&lt;/a&gt;, professor of biomedical informatics at Columbia and one of six researchers &lt;a href="http://www.microsoft.com/presspass/press/2008/apr08/04-17GWASPR.mspx"&gt;recently funded&lt;/a&gt; by Microsoft Research through its Computational Challenges of Genome Wide Association Studies (GWAS) program.
&lt;/p&gt;</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hripcsak/hripcsak.mp3" expression="full" duration="1874" fileSize="14998464" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hripcsak/hripcsak.wma" expression="full" duration="1874" fileSize="15174059" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hripcsak/hripcsak.wma" length="15174059" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Making-sense-of-electronic-health-records/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489755/Trackback.aspx</trackback:ping><category>electronic health records</category><category>genetics</category><category>MSR</category></item><item><title>How Mercy Corps syncs databases in Afghanistan</title><description>&lt;p&gt;
My guests for this week's Perspectives show are Barbara Willett and Nigel Snoad. Barbara works for &lt;a href="http://www.mercycorps.org/countries/afghanistan"&gt;Mercy Corps in Afghanistan&lt;/a&gt;, as the design, monitoring, and evaluation manager for a number of agricultural development programs. Nigel Snoad is a lead capabilities researcher for Microsoft Humanitarian Systems. Together they've pioneered the use of FeedSync as a way to synchronize data collection and reporting in an environment where Internet connectivity is spotty, and where lightweight, two-way synchronization is essential.&lt;/p&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;
            &lt;p&gt;
            &lt;img src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/mercy/mercy.jpg" /&gt;
            &lt;/p&gt;
            &lt;p&gt;
            &lt;strong&gt;Nigel Snoad and Barbara Willett&lt;/strong&gt;
            &lt;/p&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Barbara, we want to discuss the database synchronization system that you've partnered with Nigel to develop, as part of the Mercy Corps work in Afghanistan.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: I'm the design, monitoring, and evaluation manager here in Afghanistan. I arrived in March of last year, about the same time we had a consultant in doing a general review. He also brought in another consultant who'd worked with Mercy Corps on technical issues, including the development of databases.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: And can you explain what, in this context, is being designed, monitored, and evaluated? What are the programs you're supporting, and what do those programs do?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: In Afghanistan, almost all our programs revolve around agricultural development. We have a number of programs funded by USDA, the British Government, the European Commission, all with the same goal of improving the livelihoods of the Afghan people. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Is your microfinance program among those?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: Yes, we have a microfinance program, but it's one of our older ones, and it's self-sustaining now, no longer administered directly by our monitoring and evaluation system.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: What are some examples of programs that are?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: The ABAD [Agro-Business and Agriculture Development] program is where all this started. It supports business development, and technical capacity training for farmers. &lt;/p&gt;
&lt;p&gt;There's also a lot of work in animal health: redeveloping and reestablishing veterinary field units, and training veterinarians, paraveterinarians, and female livestock workers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So your management challenge, relative to these programs, is what?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: It's developing tools that are applicable to multiple programs doing the same kinds of things. Everybody's involved in agricultural development, and interested in measuring improvement in sales and production. It's a challenge to collect that data and share it -- both operational data and impact data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So the field workers are in various locations around the country, with intermittent Internet access?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: Right. And in these circumstances, we want to standardize how we collect, synchronize, share, and report on this information.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: You had a pre-existing system based on Microsoft Access, as I understand it, and there were problems synchronizing those databases to your central office.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: Initially we didn't have Access, actually. When I arrived there wasn't any centralized system at all. Everything was Excel-based, sharing spreadsheets month-to-month from this region to that region. So we started the Access system, then later we realized it wasn't really working out because of the Internet problems, and because the process was bulky and cumbersome.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Nigel, that's where you come in, right?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NS&lt;/strong&gt;: Yes. Our humanitarian team was in Afghanistan looking to talk with people, do a bit of show and tell, and mainly get feedback about what people would like, and what they really need. And then use that to iterate what we were doing, and look for partnerships to do pilots. &lt;/p&gt;
&lt;p&gt;With Mercy Corps we said, here's what we've got, here's what we're thinking, does that make sense to you?&lt;/p&gt;
&lt;p&gt;Mercy Corps was great for that, because they were fairly well advanced in their thinking about how they were using their Access solution, and the architecture of what they wanted to do was quite clear. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: So from your perspective, Barbara, what was the outcome? Did it look just like what you had before, except that the synchronization problems were magically solved?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: Yes. I just wanted things to be shared. I wanted to know that we all had the same database, and that somehow, whether it's every week or every month or every minute, the information just has to connect.&lt;/p&gt;
&lt;p&gt;On paper it looked like we could do that with Access replication, but when we realized the problems that was causing, we realized that this technology Microsoft had been talking about -- which seemed maybe a little beyond our needs -- might actually solve the problems that we had. They wanted to try it, we wanted to try it, so it seemed to dovetail well.&lt;/p&gt;
&lt;p&gt;And yes, it made happen in reality what I'd wanted to happen on paper. I was willing to accept weekly or even monthly if that's what it took, but now it happens every 10 minutes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Is this a situation where the updates that flow in from various locations tend not to conflict with one another?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: Yes, conflict resolution hasn't yet been much of an issue. Our biggest issues have been in our own database development, because the database itself is still evolving. So each time that changes, it affects the job mapping and FeedSync. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NS&lt;/strong&gt;: The conflict resolution stuff is in there, and I think it'll become increasingly important as the size of the data grows, and as the activity from all the endpoints grows. &lt;/p&gt;
&lt;p&gt;But we were quite deliberate about trying this in one place, and seeing if from Mercy Corps' perspective it worked out the way the Microsoft team had envisioned. It really was a tight spiral between developing new ideas and technologies, and also proving them and using them. &lt;/p&gt;
&lt;p&gt;It was great to be able to do that in a real environment, but also a fairly controlled one, which was our first pilot in Kunduz. But then, you took it all over the country, damn you. [Laughs]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: [Laughs]&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: There was a preexisting synchronization system that was found wanting. What was that, why didn't it work, and why is the FeedSync solution working?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NS&lt;/strong&gt;: The solution Mercy Corps was originally piloting was based on Access and Access replicas. Which is a great technology, but in Afghanistan they were struggling with an unreliable Internet connection, and those replicas weren't working well. There was a lot of data to send, and there was a peer-to-peer VPN that would work OK sometimes, but flake out sometimes, mainly due to the Internet connection.&lt;/p&gt;
&lt;p&gt;And in some cases, there was no Internet connection at all. So you have to send something by courier, be it a file on disk or on a memory stick. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: With FeedSync it's the same in the no-Internet case, you still have to sneaker-net the file. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NS&lt;/strong&gt;: Absolutely. But with Access, when you export to a file, that's a one-way transfer. And there's all kinds of data you want to get back. Corrections to data, if there's a problem. (That's where conflicts can arise.) Then there are the reference lists and the lookups -- names of provinces, names of villages, names of staff people who are attending training sessions. All these things have to flow back to the edge, and be kept consistent.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: The point being that FeedSync isn't just lightweight, and more resilient to poor connectivity, but also that it's two-way.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NS&lt;/strong&gt;: It's a two-way technology, and you've got different versions of the same thing, not one version that you're trying to somehow import and export and merge.&lt;/p&gt;
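&lt;p&gt;A minimal sketch of what "different versions of the same thing" can mean for a two-way merge, assuming each row carries a small version record: an update counter, a timestamp, and the endpoint that last changed it. The ordering rule below (more updates wins, then the later timestamp, then the endpoint name as a tie-breaker) is loosely modeled on FeedSync's per-item sync metadata but is a simplification, not the specification's algorithm; in particular, the conflict check is a crude stand-in for FeedSync's history-based conflict detection. All names and data are hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    item_id: str
    updates: int  # incremented on every edit, at any endpoint
    when: str     # ISO-8601 UTC timestamp of the last edit
    by: str       # endpoint that made the last edit
    data: tuple   # the row's field values


def wins(a, b):
    """True if version a should replace version b (hypothetical ordering)."""
    return (a.updates, a.when, a.by) > (b.updates, b.when, b.by)


def merge(local, remote):
    """Merge two {item_id: Version} maps; return (merged, conflicting_ids).

    Both endpoints apply the same rule, so after exchanging feeds in
    both directions they hold the same rows.
    """
    merged = dict(local)
    conflicts = []
    for item_id, theirs in remote.items():
        mine = merged.get(item_id)
        if mine is None:
            merged[item_id] = theirs  # new row, e.g. a new staff member
            continue
        if mine.updates == theirs.updates and mine != theirs:
            conflicts.append(item_id)  # edited independently on both sides
        if wins(theirs, mine):
            merged[item_id] = theirs
    return merged, conflicts


# Hypothetical data: Kabul corrects a village name and adds a staff member.
kunduz = {"v1": Version("v1", 1, "2008-05-01T10:00:00Z", "kunduz", ("Village A",))}
kabul = {
    "v1": Version("v1", 2, "2008-05-02T09:00:00Z", "kabul", ("Village A (corrected)",)),
    "s7": Version("s7", 1, "2008-05-02T09:05:00Z", "kabul", ("New staff member",)),
}
merged, conflicts = merge(kunduz, kabul)
```
&lt;/pre&gt;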
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Barbara, is this two-way aspect evident to you as a user of the system?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: Absolutely. At first I didn't understand a lot of the terminology, and the discussions and explanations. I kept hearing the word lightweight, and I didn't really understand what that meant. &lt;/p&gt;
&lt;p&gt;But compare that with Access replication, which basically takes the entire database and replicates it to another place. That takes a long time, and then the Internet cuts out and you've corrupted the whole structure. Now, instead, you're sending just pieces of information. If it doesn't work right now, it'll work in a half hour; it just keeps trying, and it's completely lightweight and easy in that sense.&lt;/p&gt;
&lt;p&gt;And definitely the two-way street. We were still very much developing things, and even if it were perfectly developed there are still changes that have to happen from our side. As Nigel said: staff lists. People enter training records and they have to apply them to names of staff, but we get new staff people all the time. We have to continually update the names from our side so they have an appropriate list to choose from.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NS&lt;/strong&gt;: One of the things we thought about when we were building the job manager, which is the piece that runs the FeedSync on people's desktops, is that it's an application that just sits there. You build a bunch of jobs, and a job takes a data source and syncs it with another data source. &lt;/p&gt;
&lt;p&gt;In the case of Mercy Corps, that means take a table from an Access database and sync it with a feed on a website that acts as a relay. That's a job, and you have one of those for each table in the database. Of course referential integrity is something you can try and manage, and there's some support for that.&lt;/p&gt;
&lt;p&gt;The other piece is that we can run a sync to a file. It takes the table in the database and syncs it to a feed, in this case an RSS feed, on a file source. If the memory stick is plugged in, and you've got things set up right, it just works. The user doesn't have to worry about whether it's being exported to the right place, or about what the file is called. And similarly for the Internet case.&lt;/p&gt;
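&lt;p&gt;The job model described here (one job per table, each pairing a database table with a feed on a web relay or on a memory stick) can be sketched roughly as follows. This is not the FeedSync job manager's code or configuration format; the class, the functions, the relay URL, and the file-naming scheme are hypothetical, and the actual sync step is passed in as a function. Jobs whose target isn't reachable are returned so they can be retried on the next pass, in the spirit of "it just keeps trying."&lt;/p&gt;
&lt;pre&gt;
```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncJob:
    table: str   # a table in the field office's database
    target: str  # relay feed URL, or path of a feed file on removable media


def relay_jobs(tables, relay_base):
    """One job per table, each synced with its own feed on a web relay."""
    base = relay_base.rstrip("/")
    return [SyncJob(t, f"{base}/{t}") for t in tables]


def file_jobs(tables, mount_point):
    """One job per table, each synced with a feed file whose name is fixed in
    advance, so the user never chooses an export location or file name."""
    return [SyncJob(t, os.path.join(mount_point, f"{t}.xml")) for t in tables]


def run_once(jobs, is_reachable, sync):
    """Sync every job whose target is reachable; return the jobs to retry later."""
    pending = []
    for job in jobs:
        if is_reachable(job.target):
            sync(job)
        else:
            pending.append(job)
    return pending


# Hypothetical usage: the relay is down, so every job waits for the next pass.
jobs = relay_jobs(["Staff", "Villages", "Trainings"], "https://relay.example.org/feeds/")
retry = run_once(jobs, is_reachable=lambda target: False, sync=print)
```
&lt;/pre&gt;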
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: At this point, is the master database what's up on the server at &lt;a href="http://feedsync.mslivelabs.com/"&gt;Live Labs&lt;/a&gt;?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NS&lt;/strong&gt;: Well, the master source is really the database in Kabul, but yes, the replicas are also being managed on feedsync.mslivelabs.com, where, to give it a plug, anyone can go and set up a feed and a synchronization endpoint. All the databases sync to that, and then Kabul syncs to that and gets the data back down. And vice versa.&lt;/p&gt;
&lt;p&gt;I should point out that FeedSync used to be called SSE [Simple Sharing Extensions], and this started back then. The first users of SSE in anger, if you like, were Mercy Corps in Afghanistan, which was exciting. But we had a lot to learn. Now they're moving to FeedSync. What that means is a slightly different version of the specification, a different service on the website, and a new version of tools I just gave to Faheem a day and a half ago -- he's the technical manager for Barbara's group. This latest version of the tools is the one that I hope we're releasing publicly in a couple of weeks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Barbara, I'm sure that going forward you'd like to see a better way to do schema evolution, so that the changes to the database structure can be part of this seamless synchronization.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: Yes, that would be ideal. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NS&lt;/strong&gt;: That's a long conversation...&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Yes. Here's something else. I know that Mercy Corps works in parts of the world where connectivity is basically SMS more than Internet, or maybe exclusively SMS. That seems like something FeedSync could be adapted for. Have you thought about that?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: Sure, some of our other offices have been using SMS as a way to share bits of information. We haven't found the need to do that yet, because we don't have that many people far out in the field who would need to enter data. In other programs, where they're visiting sites and schools all over the country, then yes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NS&lt;/strong&gt;: We've already built an SMS adapter for this; it's in testing at the moment. And it does exactly what you suggest. Rather than sending the data over the Internet, it breaks it up into SMS packets. There can be a lot of packets, of course, and it can be expensive. But we were talking to a different NGO in Afghanistan, which operates in very insecure areas where they don't have Internet at all. They are very interested in doing something similar to what Barbara is doing, but using SMS. First, because in some of these areas the security is so bad they can't even be seen to be carrying data. Second, if it takes six hours to drive somewhere, a bunch of 1-cent SMS messages costs a lot less than the petrol involved.&lt;/p&gt;
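&lt;p&gt;The interview doesn't describe the SMS adapter's message format, so what follows is only a sketch of the general technique under stated assumptions: compress the feed data, encode it as base64 text (whose characters all fall within the basic GSM 7-bit alphabet), and split it into 160-character messages. Each message carries a hypothetical header of message ID, sequence number, and total count, so the receiver can reassemble messages that arrive out of order.&lt;/p&gt;
&lt;pre&gt;
```python
import base64
import zlib

SMS_LIMIT = 160  # characters in one single-part GSM 7-bit SMS


def to_sms_packets(payload, msg_id):
    """Split bytes into SMS-sized text packets of the form 'ID:iii/nnn:chunk'.

    msg_id must not contain ':'. The header fields are fixed-width (3 digits),
    which caps a message at 999 packets.
    """
    body = base64.b64encode(zlib.compress(payload)).decode("ascii")
    size = SMS_LIMIT - (len(msg_id) + 9)  # 9 == len(":000/000:")
    chunks = [body[i:i + size] for i in range(0, len(body), size)]
    if len(chunks) > 999:
        raise ValueError("payload too large for 3-digit packet numbering")
    n = len(chunks)
    return [f"{msg_id}:{i:03d}/{n:03d}:{chunk}" for i, chunk in enumerate(chunks)]


def from_sms_packets(packets):
    """Reassemble the packets of one message, received in any order."""
    chunks = {}
    total = None
    for packet in packets:
        _msg_id, counter, chunk = packet.split(":", 2)
        index, total = (int(x) for x in counter.split("/"))
        chunks[index] = chunk
    if total is None or len(chunks) != total:
        raise ValueError("packets missing; wait for more or request a resend")
    body = "".join(chunks[i] for i in range(total))
    return zlib.decompress(base64.b64decode(body))
```
&lt;/pre&gt;
&lt;p&gt;Base64 inflates the compressed data by about a third, but keeping every character in the basic GSM alphabet means each message holds the full 160 characters; at a cent per message, the cost scales directly with the packet count.&lt;/p&gt;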
&lt;p&gt;We've tried this, and it works quite nicely. I'm excited about that for the future. But to be honest, there are plenty of issues just keeping the Mercy Corps solution running. I wouldn't want to make you believe that this has been a dream installation where everything worked perfectly off the ground. Just the other day we ran into a problem where the feeds wouldn't sync.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: Sorry about that!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NS&lt;/strong&gt;: Hey, no problem. It's now documented and it's part of the FAQ that'll go out when we release the tools.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Barbara, how does Mercy Corps envision making use of the open toolkit which will be one of the outcomes of this project?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: There's been a cautious, wait-and-see approach from the beginning. Our IT has been a little like, oh, I don't know, this maybe could be interesting, let's see how it works. But now that people are understanding more, and seeing that this is not really a pilot any more, there's starting to be interest in ways we can share information regionally, across offices, and in how other countries can make use of the same technology. It's working its way into our lexicon.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: If there weren't the Internet problems you've had in Afghanistan, would there still be reasons to do things this way?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: Yes, I think there are other benefits. It's much closer to realtime, for one thing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: The lightweight, near-realtime aspect is appealing even when there's enough bandwidth to do more heavyweight replication?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: Yes. And also, for us, going from one Access database to another identical database is one thing. But some of our initial discussions were about sharing across platforms that may not be identical, but use common variables. Across regions or countries, we all need to report certain pieces of information, but we collect it in different ways, and store it in different ways. If there's a system where we can upload it similarly, that would be a huge benefit.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: Great point. You could define a neutral common ground for data exchange.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: Exactly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NS&lt;/strong&gt;: For Mercy Corps, there's a whole pile of options they should consider when doing sync. FeedSync isn't the be-all and end-all; it's got some particular things it seems to be good for, but that last point about interoperability is really important.&lt;/p&gt;
&lt;p&gt;From the start, we've been interested in how to use this to link up disparate systems. Sometimes that might be an Excel spreadsheet to a database. But also different organizations or, in Mercy Corps' case, different countries where they work run slightly different database designs.&lt;/p&gt;
&lt;p&gt;That's where I think the real strength of the system will lie. We've got a lot of work to do thinking about how to make that better. &lt;/p&gt;
&lt;p&gt;One of the things we're really concerned about is that, if Mercy Corps or another group wanted to roll this out, there would be the support to do that. I don't think we've got that perfect yet by any means, but we showed what we're doing to a bunch of other NGOs, and afterward a number of them were interested in taking this kind of tool -- be it FeedSync or some other -- and using it for their programs.&lt;/p&gt;
&lt;p&gt;The best example might be where you've got a whole pile of agencies implementing a program. In Afghanistan there's a thing called the National Solidarity Program, and it's run all over the country. All the reporting happens in a standard format, but every organization has its own way of managing the process. The challenges are to integrate the data, and to pass back successes and lessons learned. The big NGOs have their own systems, almost all in Access, all with some of the same schemas because the ministry says this is how you will report, but there's no way to aggregate all that nicely.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU&lt;/strong&gt;: This makes good sense. Thanks!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NS&lt;/strong&gt;: Yeah, thanks Barbara. And let me know how Faheem's getting along. There may be some issues, but I hope everything's OK.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BW&lt;/strong&gt;: Yeah, it's good, thanks so much.&lt;/p&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/How-Mercy-Corps-syncs-databases-in-Afghanistan/</comments><link>http://channel9.msdn.com/posts/JonUdell/How-Mercy-Corps-syncs-databases-in-Afghanistan/</link><pubDate>Thu, 29 May 2008 13:34:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/mercy/mercy.wma</guid><evnet:views>141</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489754/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>My guests for this week's Perspectives show are Barbara Willett and Nigel Snoad. Barbara works for &lt;a href="http://www.mercycorps.org/countries/afghanistan"&gt;Mercy Corps in Afghanistan&lt;/a&gt;, as the design, monitoring, and evaluation manager for a number of agricultural development programs. Nigel Snoad is a lead capabilities researcher for Microsoft Humanitarian Systems. 
Together they've pioneered the use of FeedSync as a way to synchronize data collection and reporting in an environment where Internet connectivity is spotty, and where lightweight, two-way synchronization is essential.</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/mercy/mercy.mp3" expression="full" duration="35" fileSize="16631266" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/mercy/mercy.wma" expression="full" duration="35" fileSize="16823055" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/mercy/mercy.wma" length="16823055" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/How-Mercy-Corps-syncs-databases-in-Afghanistan/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489754/Trackback.aspx</trackback:ping><category>afghanistan</category><category>feedsync</category><category>mercy corps</category></item><item><title>Digital formats for long-term preservation</title><description>&lt;p&gt;
Caroline Arms is an information technologist who came to the Library of Congress to work on the &lt;a href="http://memory.loc.gov/"&gt;American Memory&lt;/a&gt; project. The challenge of preserving digital content captured her interest, and her work since has focused on understanding and promoting formats that raise the probability that content will be usefully available to future generations. She is the co-compiler, with Carl Fleischhauer, of the &lt;a href="http://www.digitalpreservation.gov/formats/"&gt;Digital Formats&lt;/a&gt; website, and was a member of the technical committee that worked on the Office Open XML standard.
&lt;/p&gt;

&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;
&lt;img src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/loc/caroline-arms.jpg"&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;Caroline Arms&lt;/b&gt; 
&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;hr /&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: I'm interested in your perspective on XML's role in the preservation of documents for the long term.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: I'd like to be able to go broader than XML. It's one aspect, but it's not the only answer. When we're talking about the challenge of preserving digital content we usually think more broadly.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Great point. Of course there's a whole range of issues, from how you keep the disks spinning to... well, let's step back and talk about acid-free paper, which may be a more durable format than anything we've done electronically.

&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: Absolutely. 
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So, OK, give us the broad view of how you have approached this problem at the Library of Congress.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: The Library's mission is to make its resources useful and available to Congress and to the American people, and to sustain and preserve a universal collection of knowledge and creativity for future generations. 
&lt;/p&gt;

&lt;p&gt;Congress funded the National Digital Information Infrastructure and Preservation Program (&lt;a href="http://www.digitalpreservation.gov/"&gt;NDIIPP&lt;/a&gt;), and I've been working as part of that since the early 2000s.
&lt;/p&gt;

&lt;p&gt;The program looks for every opportunity to raise the probability that content created today will be usable by those future generations. 
&lt;/p&gt;

&lt;p&gt;I first came to the library to work on &lt;a href="http://memory.loc.gov/ammem/index.html"&gt;American Memory&lt;/a&gt;, which was digitizing out-of-copyright materials and making them available to everybody.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Of course that project isn't just a resource for future generations...
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: Right. So, there are many ways to think about raising that probability. The program is trying to build a network of organizations committed to the stewardship of digital content. Not just traditional libraries and archives, but certainly including them.
&lt;/p&gt;

&lt;p&gt;You mentioned spinning disks. We try to have conversations with storage vendors, and try to explain how we see the requirements for long-term cultural archives as being a little different from those for business continuity. 
&lt;/p&gt;

&lt;p&gt;You also mentioned acid-free paper. In the book age, we could take in a book, make sure it was on acid-free paper, and it would still be there a hundred years from now. The phrase &amp;quot;benign neglect&amp;quot; gets used. Paper survives benign neglect. Digital content doesn't.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: It's a paradox. Recently I visited my parents, and we found a box of correspondence they had written from a yearlong trip to India many years ago. I realized that my own correspondence is probably a lot less likely to be available to my kids or grandkids.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: I have exactly the same experience. My father was away at the battlefront in World War II, and he wrote as frequently as he could. My mother still has all those letters. Today's forces are using email and cellphones and other ephemeral means of keeping in touch.
&lt;/p&gt;

&lt;p&gt;It's amazing to read the letters discussing what my name would be, because I was on the way.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Of course once that box of letters is lost, it's lost. There is no backup, there are no perfect copies. It's a paradox that we're in an era when you can make perfect copies, and distribute them as widely as you want, so you'd think that superabundance would save the day, but that's not necessarily true.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: No. You have to act at the time of creation in order to up the probability. This is true for your own digital photographs, and for libraries. So we try to influence the early stages of content creation.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: How?
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: Working on standardization efforts is one way. Another is to form partnerships that try to exploit synergies with content creators. We just look for opportunities in different industries.
&lt;/p&gt;

&lt;p&gt;For example, the scholarly publishing industry has an interest in preserving its own content, and it also wants that content to be accessible through libraries, so we find synergies there.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Of all the businesses I know, that one is most sophisticated in its thinking, and in its efforts toward long-term preservation. Those folks really get it, and have done a lot of good work to enable a level of fidelity and persistence that is unheard of elsewhere.

&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: Another community working toward this is professional photographers. These are mainly individuals, not corporations, but they're realizing that for their own business purposes they need to have good practices. And the practices that are good for them are pretty much aligned with the practices that we believe will be helpful.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: What are some of those practices, and how do you interact with that group to help foster them?
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: In the NDIIPP program, we've had some money we've been able to give out as awards. A couple of recent awards are to associations of photographers, and they're all to do with exploring what the good practices should be, and promoting them. So, discussion of formats, and in particular for photographs, the capturing and recording of metadata. Understanding what the tools that photographers use do, or don't do, about accumulating and retaining metadata.

&lt;/p&gt;

&lt;p&gt;Wise choice of format is important, but we don't think there's a single best format. In thinking about formats, the two key factors are disclosure -- that is, are specifications available -- and adoption. The more widely used a format is, the less likely that archival institutions will have to foot the bill for migrating it, or maintaining tools to render it.
&lt;/p&gt;

&lt;p&gt;We are interested in understanding the formats that are widely used, and promoting practices that will use those formats in good ways.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Does this boil down to recommendations that the Library of Congress has made to photographers?
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: No, it's working with photographers to find the synergies between the requirements, have the photographers promote the best practices, and perhaps to suggest what will be even better for us. But we don't have that much influence over what formats creators and publishers use. We have to learn to be able to handle the most widely used formats. 

&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Given that, what practices do you find most useful, and why? 
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: With photographs, there is value to us and to photographers in retaining as much color and spatial information as possible. The Library will be accepting a variety of formats. If your camera takes only JPEG, there's no point in going for anything else. But in general libraries and photographers have liked to keep full-resolution images without lossy compression. TIFF has been a standby, but it's not very good for embedding metadata. 
&lt;/p&gt;

&lt;p&gt;There are explorations at the moment on formats and tools for getting metadata into images. Many photographers are positive about Adobe's DNG format, with XMP metadata, which is XML-based. An advantage of XMP is that you can embed it in images, or handle it as XML outside the image. XMP as a vehicle is now being supported by more and more tools. 
&lt;/p&gt;

&lt;p&gt;But then within it, you have to have practices about what elements you record. In the photography world, the leading community is photography for journalism, so the metadata standard from the IPTC (&lt;a href="http://www.iptc.org"&gt;International Press Telecommunications Council&lt;/a&gt;) is the leading one as far as elements are concerned.

&lt;/p&gt;

&lt;p&gt;This is a case where the Library has its own metadata standards, and we don't want to lose all the experience and compatibility with our own systems and tools, but clearly the commercial market and the equipment is gathering around the IPTC metadata schema. So we need to adjust our practices so we can take advantage of that.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: You've said that you look to leading practitioners, like professional photographers and journalists, but of course anyone can produce something which -- though we won't know it at the time -- could prove to be of great cultural significance. So we have to hope the standards and practices trickle down to everybody, right?
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: Yes. The standards and practices supported in cameras and software, or in Flickr and the other management services, those are all part of the environment that we're working in, and that we have to be conscious of.
&lt;/p&gt;

&lt;p&gt;The rapidity of change is a real challenge for us. The book in its hard cover on the shelf has been there for a long time, and will continue to be. But in the digital world things change very quickly.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: It's a huge challenge, and we've yet to see the emergence of a way of dealing with this that would separate various concerns. Storage, for example, is a separable concern. It should be possible for individuals and organizations to choose from a range of storage options which would offer a range of preservation guarantees.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: Absolutely. 
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: And that wouldn't necessarily be tied to other kinds of arrangements. You mentioned Flickr. On the one hand people are using it for archival purposes. But it's also a catalog, it's also a database, it's also an environment for sharing and use. We're bundling all those concerns together right now, and that makes it difficult to get at what really matters to you in a rational way.

&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: I agree entirely. Flickr is not making any commitments about the way it's archiving the content. It is tricky. In the last few years, these big services provided by Amazon and Google have completely changed the business model for these things. But it's interesting that the storage service from Amazon has taken off in a way that some other attempts didn't. There were several others, but they couldn't build the market and the trust. I think that somehow Amazon has the trust of people because it clearly has a big problem of its own. People trust that it will take good care of its own content, and that somehow it will solve these problems. So although, as you say, things aren't separate, in a way the building of trust can't necessarily be separated out either.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Of course there are no long-term guarantees. This is where the scholarly publication folks have done the most thoughtful and intense work. They've even thought through what happens when the organization hosting the content fades away, and have seen that there needs to be a federation of cooperating businesses that transcends any individual organization. 
&lt;/p&gt;

&lt;p&gt;I should be able to swap out Flickr's storage backend for a service that offered long-term guarantees, for which I'd pay a premium. That's not an option for anyone yet, but there's a whole slew of interesting business opportunities there for lots of players in lots of niches.

&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: What's unpredictable is quite how they will develop. It's a mixture of general moves in the technology and particular organizations deciding to go in a certain direction. And then the market, whether it's consumers or industry sectors, coming together to create critical mass.
&lt;/p&gt;

&lt;p&gt;What we found in NDIIPP is that it's very hard to drive this process. You can nudge, and promote awareness of problems, but what has actually emerged in the last year or two is probably quite different from what people were talking about in 2001 when the program got started.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Where do you feel you have been successful in doing some nudging and promotion?
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: I think some of the standardization efforts, for PDF/A, the archival format for PDF, and Office Open XML, are examples of where we've been able to play a role in moving in the right direction.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: The Library of Congress has been involved in both of those standardization efforts?
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: Yes. In the PDF/A case, which happened first, this was an activity stimulated by the wishes of archival institutions and especially the legal and judicial community to have an archival document format that could substitute for paper.
&lt;/p&gt;

&lt;p&gt;The standard came out in 2005, and there are an increasing number of tools that can save in this format. It primarily prohibits features that make preservation difficult.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: I was going to ask you to clarify that, because I think many people would say that PDF itself is a good archival format.

&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: The PDF/A format outlaws embedded audio and video, it requires that the text in the PDF be in reading order, it requires that the fonts used be embedded -- because in many cases PDF relies on the fonts you have on your computer -- and it requires that the fonts be legally embeddable. It also outlaws encryption, and mandates XMP metadata.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Do you think these restrictions tend to be easy to meet, or are they onerous?
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: My guess is that in ordinary office documents, and documents that get submitted for court cases, it probably is not onerous. 
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So in terms of Office Open XML, how did you approach that?
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: We joined that effort after it was already underway. We learned that the British Library was actively involved, and we shared their interest. The general move to XML-based formats for text documents, and for the other office productivity documents, seemed to us like a very good move.
&lt;/p&gt;

&lt;p&gt;XML files, which you can look at with simple tools and whose tag names you can hopefully understand, offer inherent advantages.
&lt;/p&gt;

&lt;p&gt;As I said, the two most important factors for preservability are disclosure and adoption. By disclosure we mean that the specification exists, in a public way, that will continue to be available. Clearly to have it exist as an international standard by a known standards organization raises the probability that it will continue to be available and used.
&lt;/p&gt;

&lt;p&gt;As to adoption, clearly the Microsoft products are widely adopted, and libraries will be collecting content produced by those applications. So this seemed like a good opportunity to influence the public availability of the specification.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: It's an interesting question to what extent the Library will wind up interacting directly with documents produced by those applications, versus receiving content from organizations like scholarly publishers, who for example are now beginning to be able to accept articles that were authored in Microsoft Word, but delivered in the NLM -- National Library of Medicine -- XML formats.

&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: Yes, you're right. Our traditional collecting has mainly been of published materials, and we expect they'll be in some form other than what your word processor creates. But we also collect the personal papers of famous individuals, so I'm sure we already have quite a lot of documents in word processing formats. 
&lt;/p&gt;

&lt;p&gt;We believe it's important to be involved early in the content creation life cycle. If the tools begin to record more information about the transformations that go on, that's of value. 
&lt;/p&gt;

&lt;p&gt;And beyond standard text documents, a phenomenal amount of valuable information is currently stored in PowerPoint files. Or, information that we might have collected on paper may be available as spreadsheets. We can't afford to assume that things will remain the way they are. 
&lt;/p&gt;

&lt;p&gt;We're harvesting lots of documents from the web that may not have been published through traditional channels, and those are likely to be word processing documents or PDFs.
&lt;/p&gt;

&lt;p&gt;I'm confident we'll have plenty of documents in word processor formats that we will have to try to preserve.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Of course the preponderance of what would be the equivalent of personal papers, at least for a certain era, will be email. And unfortunately we don't have any XML standards governing email.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: Right. So email is not something the Library of Congress spends a lot of time thinking about, but for another government organization, NARA, the National Archives, email is very important. They capture the records of government agencies, and of each administration as it transfers power to the next.

&lt;/p&gt;

&lt;p&gt;So, I must mention that there are other XML formats. The Open Document Format is also a very important development for us, and we hope that it will be adopted. We have to keep an open mind and see where the marketplace moves. 
&lt;/p&gt;

&lt;p&gt;We see that the general movement to XML-based formats, wherever they are appropriate, is a good thing.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Yes. Whatever the XML format, there's a huge amount of untapped potential in the interweaving of content and metadata and, actually, data -- rows and columns sorts of data which are well represented in XML formats. The numbers in spreadsheets and databases are a form of content that is merging with documents, and should.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: Absolutely. One of the projects I've been involved with goes under the name &lt;a href="http://www.icpsr.umich.edu/DATAPASS/"&gt;Data-PASS&lt;/a&gt;. It's a consortium of social science data archives. They have a descriptive standard, it's a multi-level standard with a rich XML structure that supports the online subsetting of the data.

&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So I think we're having this conversation in the nick of time, because you're retiring next month, right?
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: Yes, in late May, actually. But I expect still to be engaged in the area. It's been a very exciting time, and I hope still to be involved even if I'm trying to have more time for family and travel.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: I hope so too. So, the challenges are daunting, but I think you're mostly optimistic about the future. 

&lt;/p&gt;

&lt;p&gt;&lt;b&gt;CA&lt;/b&gt;: Yes. I've learned to take a long-term perspective. You do see that even though the steps are small, there are lots of steps being taken in hopeful directions. Eventually these problems will be worked out. And as people become aware that this is not just a problem for libraries and archives, but also, as you've pointed out, for their own correspondence, their own photographs -- and also that businesses share the same problems -- I'm confident we're moving in the right direction. And I'm glad to have helped.
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Thanks!
&lt;/p&gt;&lt;img src="http://channel9.msdn.com/489753/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Digital-formats-for-long-term-preservation/</comments><link>http://channel9.msdn.com/posts/JonUdell/Digital-formats-for-long-term-preservation/</link><pubDate>Thu, 22 May 2008 12:42:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/loc/caroline.wma</guid><evnet:views>100</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489753/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;p&gt;
Caroline Arms is an information technologist who came to the Library of Congress to work on the &lt;a href="http://memory.loc.gov/"&gt;American Memory&lt;/a&gt; project. The challenge of preserving digital content captured her interest, and her work since has focused on understanding and promoting formats that raise the probability that content will be usefully available to future generations. She is the co-compiler, with Carl Fleischhauer, of the &lt;a href="http://www.digitalpreservation.gov/formats/"&gt;Digital Formats&lt;/a&gt; website, and was a member of the technical committee that worked on the Office Open XML standard.
&lt;/p&gt;</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/loc/caroline.mp3" expression="full" duration="3000" fileSize="24141120" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/loc/caroline.wma" expression="full" duration="3000" fileSize="24421233" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/loc/caroline.wma" length="24421233" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Digital-formats-for-long-term-preservation/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489753/Trackback.aspx</trackback:ping><category>Digital preservation</category><category>OOXML</category><category>PDF/A</category></item><item><title>Where is WinFS now?</title><description>&lt;p&gt;WinFS was an ambitious effort to embed an integrated storage engine into the Windows operating system, and use it to create a shared data ecosystem. Although WinFS never shipped as a part of Windows, many of the underlying technologies have shipped, or will ship, in SQL Server and in other products. In this interview Quentin Clark traces the lineage of those technologies back to WinFS, and forward to their current incarnations. &lt;/p&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;
            &lt;p&gt;&lt;img width="300" alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/winfs/quentin2.jpg" /&gt; &lt;/p&gt;
            &lt;p&gt;&lt;b&gt;Quentin Clark&lt;/b&gt; led the WinFS project from 2002 to 2006. He's now a general manager in the SQL Server organization. &lt;/p&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: You made a fascinating remark last time we spoke, which was that most of WinFS either already has shipped, or will ship. I think that would surprise a lot of people, and I'd like to hear more about what you meant by that. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: WinFS was about a lot of things. In part it was about trying to create something for the Windows platform and ecosystem around shared data between applications. Let's set that aside, because that part's not shipping. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So you mean schemas that would define contacts, and other kinds of shared entities? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Yeah. That's a mechanism, a technology required for that shared data platform. Now the notion of having that shared data platform as part of Windows isn't something we're delivering on this turn of the crank. &lt;/p&gt;
&lt;p&gt;We may choose to do that sometime in the future, based on the technology we're finishing up here, in SQL, but it's not on the immediate roadmap. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: OK. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Now let's look under the covers, and ask what was required to deliver on that goal. It's about schemas, it's about integrated storage, it's about object/relational, a bunch of things. And that's the layer you can look at and say, OK, the WinFS project, which went from ... well, it depends who you ask, but I think it went from 2002 until we shut it down in 2006 ... what was the technology that was being built for that effort, in order to meet those goals? And what happened to all that stuff? &lt;/p&gt;
&lt;p&gt;You can catalog that stuff, and look at work that we're doing now for SQL Server 2008, or ADO.NET, or VS 2008 SP1, and trace its lineage back to WinFS. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Let's do that. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: OK. I guess we can start at the top, with schemas. We're not doing anything with schemas. At the end of the WinFS project we had settled on a set of schemas. It was a very typical computer science problem, where the schemas started out as a super-small set of things, and then grew to include all possible angles, properties, and interests of anybody interested in that topic whatsoever. We wound up with a contact schema with 200 or 300 properties. &lt;/p&gt;
&lt;p&gt;Then by the time we shipped the WinFS beta we were back down to that super-small subset. Here's the 10 things about people that you need to know in common across applications. &lt;/p&gt;
&lt;p&gt;But all that stuff is gone. The schemas, and a layer that we internally referred to as base, which was about the enforcement of the schemas, all that stuff we've put on the shelf. Because we didn't need it. It was for that particular application of all this other technology. &lt;/p&gt;
&lt;p&gt;So that's the one piece that didn't go anywhere. &lt;/p&gt;
&lt;p&gt;Next layer down is the APIs. The WinFS APIs were a precursor to a more generalized set of object/relational APIs, which is now shipping as what we call the Entity Framework in ADO.NET. &lt;/p&gt;
&lt;p&gt;What's getting delivered as part of VS 2008 SP1 is an expression of that, which allows you to describe your business objects in an abstract way, using a fairly generalized entity/relationship model. In fact we got &lt;a href="http://portal.acm.org/citation.cfm?doid=1247480.1247532"&gt;best paper at SIGMOD last year&lt;/a&gt; on the model, it's a very good piece of work. &lt;/p&gt;
&lt;p&gt;So you describe your business entities in that way, with a particular formal language... &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: For people who haven't seen this, how would you characterize that language? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: It's pretty standard entity-relationship. It's really a matter of describing to the system a set of properties and collections and relationships among entities. The important thing we tell people is to describe their entities as they think about them. Not as they think they should be expressed in a fully normalized database schema, and not as they need to program to them as objects, but in terms of how they think about them, and want to be able to report on them, or interact with them. &lt;/p&gt;
&lt;p&gt;From there we can derive objects you can program against, we can derive schemas to build a store of them. &lt;/p&gt;
&lt;p&gt;The traceback to WinFS is that we had a very fixed way of doing this for a particular set of entities. We built the schema around items, and items were entities that had relationships to other items. We built this whole model on a more generic substrate that we never expressed. &lt;/p&gt;
&lt;p&gt;So we said OK, we didn't ship the WinFS APIs, but we have this asset, a more generalized expression framework for entities, let's figure out how to finish that work up, and get that delivered as part of the next ADO release. &lt;/p&gt;
&lt;p&gt;This stuff is now very well integrated with LINQ. You can do LINQ to relational [LINQ to SQL], where LINQ will look down into the database, look at the schemas that are there, and express that directly up into LINQ. Or you can do LINQ to Entities, which allows you to have a layer of abstraction between what you're programming to and your underlying physical database schema. &lt;/p&gt;
&lt;p&gt;That work is ongoing, we're getting good feedback, we'll see how far it takes us. &lt;/p&gt;
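&lt;p&gt;To make the "describe entities once, derive objects and schemas from that description" idea concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not the Entity Framework's API or its EDM/CSDL format: the entity names, the type mapping, and the helper functions are all hypothetical. One conceptual description yields both a class you can program against and a relational schema to store instances in.&lt;/p&gt;

```python
from dataclasses import make_dataclass

# Hypothetical conceptual description: entity -> properties as (name, type),
# plus relationships. This is NOT the Entity Framework's EDM/CSDL format.
ENTITIES = {
    "Person": [("id", int), ("name", str), ("email", str)],
    "ExpenseReport": [("id", int), ("owner_id", int), ("total", float)],
}
# (source entity, foreign-key property, target entity): many reports per person.
RELATIONSHIPS = [("ExpenseReport", "owner_id", "Person")]

# Assumed mapping from Python types to SQL column types.
SQL_TYPES = {int: "INT", str: "NVARCHAR(200)", float: "DECIMAL(12, 2)"}


def derive_classes(entities):
    """Derive programmable classes from the conceptual description."""
    return {name: make_dataclass(name, props) for name, props in entities.items()}


def derive_ddl(entities, relationships):
    """Derive a relational schema from the same description."""
    statements = []
    for name, props in entities.items():
        cols = [f"{p} {SQL_TYPES[t]}" + (" PRIMARY KEY" if p == "id" else "") for p, t in props]
        for source, column, target in relationships:
            if source == name:
                cols.append(f"FOREIGN KEY ({column}) REFERENCES {target}(id)")
        statements.append(f"CREATE TABLE {name} ({', '.join(cols)});")
    return statements


classes = derive_classes(ENTITIES)
report = classes["ExpenseReport"](id=1, owner_id=7, total=412.5)
for statement in derive_ddl(ENTITIES, RELATIONSHIPS):
    print(statement)
```

&lt;p&gt;In the shipping Entity Framework, the link between the conceptual model and the store is an explicit mapping rather than a one-way generator like this one; the sketch only shows that both artifacts can come from a single description.&lt;/p&gt;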
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: How much continuity is there in terms of the team? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: A lot. When I did the reorg, I had an Excel sheet of everyone in the organization and where we were moving them to. Last I looked at it, 80-plus percent of the team was still in SQL somewhere. &lt;/p&gt;
&lt;p&gt;One of the interesting things about WinFS was that we started hiring a different kind of person. The database team is full of traditional hardcore systems database guys. When we did WinFS we were looking for a different thing. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: In fact you don't consider yourself to be a hardcore database guy, right? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Right. I'm a good example. I started at Microsoft in the Word group, and went from there to IIS to something called Application Center, worked on the manageability technologies for a while, and then was asked to come over and do WinFS. So my background was much more about how to use databases, how do you build apps around them, and not so much what are the internal algorithms you should use for bitmap indexing. &lt;/p&gt;
&lt;p&gt;Of course we had a lot of folks from the core database team, but we hired a lot of folks that had experience with compilers, with user interfaces, with building apps on the database. A lot of those folks who were leading the API effort for WinFS are now leading the API effort for all of SQL. &lt;/p&gt;
&lt;p&gt;So that's the story for the API team. As for the rest of it, well, there's obviously a big chunk around file systems. If you want to do this shared data model, you want it to be applicable to all data, not just things you can express relationally. So we had to figure out how to merge database constructs with file systems. &lt;/p&gt;
&lt;p&gt;A lot of people thought this was impossible, and would harken back to Cairo and various other projects announced and unannounced to the public world around integrated storage, that didn't necessarily produce fruit. &lt;/p&gt;
&lt;p&gt;We had one key advantage. We found an architectural approach that allowed us to control the semantics, and provide transactional database consistency over the files that were involved, while still allowing the file system to be in control when it came to file-handle-level operations. &lt;/p&gt;
&lt;p&gt;We did it with a kernel driver that allowed us to control the namespace, and keep the database involved. The database lives up in user mode. As far as the operating system is concerned, there's no difference between SQL Server and Microsoft Word. They're high-level user-mode apps that occasionally drop down and make requests of the kernel. &lt;/p&gt;
&lt;p&gt;So there was a fundamental disconnect. How does a user-mode app maintain control over this low-level system concept, the file system? We built a kernel-level driver to communicate back to the user-mode SQL process. It had a cache of what things should look like, and what things are in what state, but it was there along the API path for the file system, to allow it to control the namespace operations over files that were "in" WinFS. &lt;/p&gt;
&lt;p&gt;People would often ask me if WinFS was a file system, and I'd struggle with the answer to that, because, well, you know, from a certain standpoint the answer is yes. The stuff I saw in the shell, was it in the WinFS filesystem? Well, OK. But there are no streams inside the database. So from a user perspective, those files were "in" the filesystem. But from an API perspective it was more nuanced than that. I could still use the Win32 APIs, get some file, open it, and from that point forward the semantics were exactly like NTFS. Because it &lt;i&gt;was&lt;/i&gt; NTFS at that point. &lt;/p&gt;
&lt;p&gt;There was a certain place along the API chain where the database was completely out of the way. This allowed us to get the perfect compatibility that had tripped up other integrated storage efforts in the past. Other efforts tried to get this compatibility by emulating all the Win32 APIs, which is tough. And the performance bar is very high. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So how does this carry forward, if it does? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: It does. That approach was so good that we decided to generalize it for SQL Server 2008, as a feature called filestream. It's basically a new kind of blob support for the database. You configure a column for filestream, you can take a file and insert it as a record, you get back a file handle, you can stream things into that file handle. You can do queries and get back file handles, and get streaming API-level NTFS performance on the files you put in there. &lt;/p&gt;
&lt;p&gt;What we have not done is the namespace support. So you don't get to walk through a directory of files. You examine a row, you ask that row to give you back the right token, you start doing the Win32 operations on it. &lt;/p&gt;
&lt;p&gt;But the rest is integrated. You back up the database, you back up the filestream. From most perspectives -- except mirroring, which we didn't get to fully integrating -- it looks like any other blob. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Where do you see that being used to good effect? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Right now there's a choice people have to make. There's a size limit on blobs in the database, because we put them inside database pages, and that leads to a performance problem as well. If you want to pull a 2-gigabyte stream out of the database with traditional blobs, it's not as performant as walking up to NTFS and using a file handle. We have to recreate the file by putting together a series of database pages that are themselves a level of indirection on file system pages. &lt;/p&gt;
&lt;p&gt;So people today have to make a choice. Do I want the integration with the database, so backup works, my transactional semantics work, all this stuff works, and live with the performance and size limitations? Or do I want the best possible performance, and basically no limitations on size, by putting things in the file system, and then having my application logic figure out how to glue together the database world and these files that are now strewn about the file system? And when I do a backup, then I also have to teach my operations guys that when you back up the database you're not backing up all the data, you also have to worry about these files the database knows nothing about. &lt;/p&gt;
&lt;p&gt;With filestream, people don't have to make the choice. They get the performance they want, with the database integration they expect. &lt;/p&gt;
&lt;p&gt;Now the next place to take that, after 2008, is to add Win32 support. So we did this other feature as part of WinFS, which we're calling hierarchical ID. It's a column type, a new column type, which creates hierarchy support in the database. &lt;/p&gt;
&lt;p&gt;We did this for WinFS because obviously if you're storing your data in a filesystem-like hierarchy, you need to be able to do things like show me all the stuff in this folder, and answer that query lickety-split. You can't be walking through record by record looking for matches. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Or dealing with the SQL way of expressing hierarchy, which is doable but beyond my comprehension. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Yeah, it's hard. The fundamental problem is that the query processor doesn't understand the concept of path. It understands matches on columns. It can find substrings within records, but it's kind of brute force. You can use fulltext indexing, but... &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: ... but you don't get containment for free. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: That's right. So hierarchical ID is a column type that teaches the optimizer about hierarchy, about path, so you can do queries that find all the things contained within this part of the path. &lt;/p&gt;
&lt;p&gt;So we have that feature also shipping in 2008, and there are all sorts of different uses for it. For example, people use it for compliance. They'll create a hierarchy of different confidentialities and compliance levels. This thing is confidential, which is a superset of things that are executive-eyes-only. Hierarchies like that are just out there in the world. &lt;/p&gt;
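&lt;p&gt;Why does teaching the engine about paths make "show me all the stuff in this folder" fast? Here is a minimal Python sketch, assuming a materialized-path encoding like the string form that hierarchyid values print as (/1/, /1/3/, and so on). When paths are kept in sorted order, every descendant of a node shares that node's path as a prefix, so a whole subtree is one contiguous range found with two binary searches. The folder names are hypothetical, and SQL Server's actual type uses a compact binary encoding rather than strings.&lt;/p&gt;

```python
from bisect import bisect_left

# Hypothetical folder keys, loosely modeled on the string form of hierarchyid
# values ("/1/", "/1/3/", ...). The real type uses a compact binary encoding;
# this only illustrates why sorted paths make subtree queries cheap.
folders = sorted([
    "/1/",      # Documents
    "/1/1/",    # Documents/Expenses
    "/1/1/4/",  # Documents/Expenses/2008
    "/1/3/",    # Documents/Music
    "/1/30/",   # Documents/Scratch: a sibling of /1/3/, not a child
    "/2/",      # Pictures
])


def subtree(sorted_paths, root):
    """Return `root` and all of its descendants from a sorted list of paths.

    Every descendant has `root` (including its trailing "/") as a prefix, so
    the matches form one contiguous run in sorted order. Two binary searches
    find that run instead of testing every record.
    """
    # Upper bound of the run: bump the final character ("/" becomes "0"),
    # so every path that starts with `root` sorts before it.
    upper = root[:-1] + chr(ord(root[-1]) + 1)
    lo = bisect_left(sorted_paths, root)
    hi = bisect_left(sorted_paths, upper)
    return sorted_paths[lo:hi]


print(subtree(folders, "/1/1/"))  # ['/1/1/', '/1/1/4/']
print(subtree(folders, "/1/3/"))  # ['/1/3/']
```

&lt;p&gt;The "/1/30/" entry shows why a plain prefix test on "/1/3" would be wrong; keeping the trailing separator in the prefix is what keeps siblings out. In SQL Server 2008 the equivalent test is the hierarchyid method IsDescendantOf, and a depth-first index on the column keeps each subtree's rows together.&lt;/p&gt;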
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: How do you build and visualize them? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: You tell us about them. You express the form of your hierarchy, and you populate the records accordingly. But I don't think there's a tool yet. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So there's the filestream piece, and the hierarchical ID piece, and then the Win32 namespace piece is the shoe that hasn't yet dropped? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: That's right. In the next release we anticipate putting those two things together, the filesystem piece and the hierarchical ID piece, into a supported namespace. So you'll be able to type \\machinename\sharename, up pops an Explorer window, drag and drop a file into it, go back to the database, type SELECT *, and suddenly a record appears. &lt;/p&gt;
&lt;p&gt;Potential uses for that? It's all over the place. Take our own expense reports. We used to have these Excel form templates, and you'd fill it out and submit it to some system. Then we hit a phase where it was all online, so you're on the plane home and too bad for you. But imagine they could reintroduce that template again, and you could save that Excel file directly into the database. &lt;/p&gt;
&lt;p&gt;Or more importantly, if you go to edit the thing, you don't have this process where you've taken a copy of the thing, you're editing it, you're sending it back through a mid-tier system that then has to reconcile the database records with the filesystem records. I can just say, oh, I need to add three more things. I double-click, and yes I'm still interacting with some web-based app, but the links I get are real Win32 links. I open the thing, I edit it, I stick it back, everything knows that it was changed within the right transactional semantics. &lt;/p&gt;
&lt;p&gt;People are constantly having to bridge between the file world, and the world of data around the files. Providing Win32 support gives developers the opportunity to allow the desktop clients to directly interact with a file that's part of some application, without having to go through all the semantics of the mid-tier. &lt;/p&gt;
&lt;p&gt;Are there always going to be some applications that will want to have mid-tier control over every aspect of every part of every workflow? Of course. But from a productivity standpoint, to be able to allow people to build applications more quickly, to be able to customize applications and not have to manage all those semantics themselves, that's huge. &lt;/p&gt;
&lt;p&gt;Sync is another topic, but imagine we build the right things around synchronization, so people can take the files offline. It's a major productivity gain. As a developer, you know the consistency of the world you're dealing with. You're not having to create and manage and upload and deal with copying all on your own. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: You've alluded to the downside already, which is that it now becomes a new data management discipline that is neither familiar to the people from the filesystem world nor from the database world, it's a hybrid, and that's an obstacle. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Sure, there's a learning curve, as with any other new technology. &lt;/p&gt;
&lt;p&gt;So, that's the filesystem piece, and I'm really proud of the work we've done there. We're introducing the kernel driver in 2008, we're giving people this nice marriage between the two worlds, and then we get to take that next step in the next release and give people the complete picture. &lt;/p&gt;
&lt;p&gt;I can live with the argument that we don't have integrated storage yet. Yes, we have filestream blobs in the database, which is a big step. We have the performance and the database consistency all in one package, and that's a huge step forward. But when we have Win32, at that point, unarguably, we have integrated storage. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: How do you think that plays out as the center of gravity shifts toward the cloud? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: There is no app in the world that doesn't need a database. Every cloud app has one under the covers somewhere. One thing we've learned in the last few years is that the fuzziness between structured data and unstructured data is just increasing. The major online apps that I interact with have both. You know, Hotmail has attachments. And they have limitations on attachments because they have trouble managing sizes and whatever else. &lt;/p&gt;
&lt;p&gt;We have things now where people can create some space, put some files up there, but man, if you want any metadata around those files, too bad, it's just a dumb blob store. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: What I'm getting to here is that, well, part of the challenge for WinFS as originally conceived, with a heavy client component, was: How do you get the network effects? Five years later the center of gravity has shifted, there are shared spaces in the cloud where those effects can happen. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Yes. And I think the technology we're building is underlying technology for the cloud apps. All of our major properties are built on SQL, and they want to use this stuff. We have pre-release work going on there to take advantage of these features, because they want them. &lt;/p&gt;
&lt;p&gt;From a business standpoint, my first concern is how to provide value to our customers. And those are our customers. The people building the cloud apps are our customers. &lt;/p&gt;
&lt;p&gt;Now, beyond that, one of the things we used to say about WinFS was that it was the world's best mashup playground, because you had all the data in one place. In the mashup world you're talking to one service at a time. &lt;/p&gt;
&lt;p&gt;Do I think that the opportunity to build applications that solve real end-user problems building on technology like this continues to thrive? Sure. &lt;/p&gt;
&lt;p&gt;When I think about the enterprise space, which is primarily where we sell SQL, they want this. They want a repository, and they want it not to be restricted on the types of data it has. &lt;/p&gt;
&lt;p&gt;You'd be surprised, SQL's behind some of the biggest cloud services on the planet. And our customers who are building them have been struggling with this structured-versus-unstructured data problem. &lt;/p&gt;
&lt;p&gt;Filestream alone gives them the answer. They don't so much need the Win32 aspect, because they have enough app development expertise in the mid-tier to bridge this stuff reasonably well. But they do want the transactional and backup consistencies that filestream gives them. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Is that ultimate mashup playground also a good environment in which to iteratively work out what some key schemas need to be? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Yeah, that leads to another interesting point. Going through the litany of technologies that have come from WinFS, one of them is the notion of what I refer to as semi-structured records. The schema is not necessarily all that well defined at the outset of the application. How does the database handle that? We had built WinFS around a feature called UDTs [user-defined types], which is a column type -- a CLR type system type. &lt;/p&gt;
&lt;p&gt;We finished that up, and we built a whole spatial datatype on it in SQL Server 2008, it's all good stuff. &lt;/p&gt;
&lt;p&gt;But when we stepped back and looked at the semi-structured data problem in a larger context, beyond the WinFS requirements, we saw the need to extend the top-level SQL type system in that way. Not just UDTs, but to have arbitrary extensibility. &lt;/p&gt;
&lt;p&gt;So we did this feature in SQL Server 2008 that we internally refer to as sparse columns. It's a combination of various things. First, a large number of columns. Right now there's a 1024 limit on the number of columns in a single SQL table. We're way widening that out. &lt;/p&gt;
&lt;p&gt;That comes of course with the ability to store data that's very sparsely populated across a large number of columns. In SQL Server 2005 we actually allocate space for every column in every row, whether it's filled or not. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: This is what the semantic web folks are interested in, right? Having attributes scattered through a sparse matrix? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: That's right. And that leads to another thing which we call column groups, which allow you to clump a few of them together and say, that's a thing, I'm going to put a moniker on that and treat it as an equivalence class in some dimension. &lt;/p&gt;
&lt;p&gt;Then we have something called filtered indexes, where instead of creating an index that spans all the records in a table, you can specify what records it applies to. &lt;/p&gt;
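&lt;p&gt;The storage difference QC describes, and the way a filtered index narrows an index to just the rows that matter, can be sketched with a toy model in Python. The column names and rows are hypothetical and this is not SQL Server's on-disk format. The sketch counts occupied slots under a dense layout (a slot for every column in every row) versus a sparse one (only the columns that have values), then builds an index over only the rows that satisfy a predicate, the role the WHERE clause plays in a SQL Server 2008 filtered index.&lt;/p&gt;

```python
# Toy model of a wide, sparsely populated table. This is NOT SQL Server's
# storage format; it only contrasts "a slot for every column in every row"
# with "store only the columns that actually have values".
COLUMNS = ["title", "author", "isbn", "artist", "album", "duration", "camera", "exposure"]

# Hypothetical rows: a book, a song, and a photo share one table.
rows = [
    {"title": "Dune", "author": "Frank Herbert", "isbn": "0441013597"},
    {"title": "So What", "artist": "Miles Davis", "album": "Kind of Blue", "duration": 562},
    {"title": "IMG_0042", "camera": "D80", "exposure": "1/250"},
]


def dense(row):
    """Fixed layout: every column gets a slot, filled or not."""
    return [row.get(col) for col in COLUMNS]


def sparse(row):
    """Sparse layout: only (column, value) pairs for columns that are set."""
    return sorted(row.items())


dense_slots = sum(len(dense(r)) for r in rows)    # 3 rows x 8 columns
sparse_slots = sum(len(sparse(r)) for r in rows)  # 3 + 4 + 3 populated values


def filtered_index(rows, key, predicate):
    """Index only the rows that satisfy `predicate`, as (key value, row id)."""
    return sorted((row[key], i) for i, row in enumerate(rows) if predicate(row))


# Index ISBNs only for rows that have one, rather than every row in the table.
isbn_index = filtered_index(rows, "isbn", lambda row: "isbn" in row)

print(dense_slots, sparse_slots, isbn_index)  # 24 10 [('0441013597', 0)]
```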
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: When it's really cheap to make lots of those equivalences, you get the ability to let people call things however they want to call them. There can be lots of aliases and labels floating around, and people can have their own vocabularies. You don't have to be so rigid about names. As you discover equivalences, you map them, and that's very efficient. Versus trying to get people in committees to agree how to call things, that's the hardest problem in the world. But if you can let people operate in their own semantic namespaces, and then bridge things together... &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: And that gets back to why the entity data model is so important. It lets people have their own way of describing, programming to, and interacting with the data they want to deal with. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: Now what about relationships? In WinFS, a relationship among entities was a first-class object. How does that carry forward? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: The notion of a relationship is a first-class object in the entity data model. Now what we haven't done there is bridge an understanding of that into the database itself. Can the query processor understand a relationship, and be optimal for navigating through those semantics? We haven't bridged that part of the world yet. It's certainly possible to create database schemas that allow you to have good query efficiency through your entity model, but it's still intellectual work. We'd like it to be so that the database can look at an EDM schema and create at least the appropriate indices so when you are examining things through that lens, we can make sure your experience is optimal. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Finally there's synchronization. It went through a classic computer-science learning curve as well. At first we said, we need to sync with the cloud, with other WinFS instances, with server systems, how hard can this be? &lt;/p&gt;
&lt;p&gt;Then we quickly realized how hard this was. What should be more infamous than people breaking their pick on integrated storage is people breaking their pick on multimaster replication. It's an incredibly difficult problem to get right. &lt;/p&gt;
&lt;p&gt;Apps that have gotten this right for a particular domain have become wildly popular. Lotus Notes got it right for a particular domain, so did Exchange and Outlook, but a generalized solution has been very elusive. &lt;/p&gt;
&lt;p&gt;Anyway, we did a partnership with Microsoft Research, and at some point along the arc we solved it fairly well. It's not trivial. This is not something that ends up being a simple solution to this very complex problem. It's actually reasonably sophisticated, but it works, and we built it in as part of the last WinFS beta. &lt;/p&gt;
&lt;p&gt;As they realized they were onto something, they started to fork out a componentized version of it that's now finding its way into a bunch of Microsoft products. The official branding is Microsoft Sync Framework. I think they're on target for shipping it in six different products, and for embedding it all over the place. &lt;/p&gt;
&lt;p&gt;Building an app like Outlook, from scratch, is hard. You can always interact with your data, when you're connected the thing will always synchronize and reconcile, when it's offline it still provides a consistent experience. To build that from scratch, it's really hard. Taking the sync framework allows people to go and build that experience without having to solve the hard multimaster synchronization problems. &lt;/p&gt;
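&lt;p&gt;Part of what makes multimaster replication hard is simply noticing a conflict: when several replicas accept edits, "last writer wins" by timestamp can silently discard work. A common textbook technique for detecting this is the version vector, sketched below in Python. This is a generic illustration under stated assumptions (hypothetical replica names, one update counter per replica), not the Microsoft Sync Framework API or its metadata format.&lt;/p&gt;

```python
def compare(a, b):
    """Compare two version vectors (replica name -> update count).

    Returns "equal", "a_newer", "b_newer", or "conflict". A conflict means
    each side has seen an update the other has not: concurrent edits that
    cannot be ordered one after the other.
    """
    replicas = set(a) | set(b)
    a_ahead = any(a.get(r, 0) > b.get(r, 0) for r in replicas)
    b_ahead = any(b.get(r, 0) > a.get(r, 0) for r in replicas)
    if a_ahead and b_ahead:
        return "conflict"
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


def merge(a, b):
    """Pointwise maximum: the version that has seen both histories."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}


# Hypothetical replicas of one contact record: a laptop, a server, a phone.
laptop = {"laptop": 2, "server": 1}
server = {"laptop": 1, "server": 1}
phone = {"laptop": 1, "server": 1, "phone": 1}

print(compare(laptop, server))  # a_newer: the laptop simply has a later edit
print(compare(laptop, phone))   # conflict: both edited after the same base
print(merge(laptop, phone))
```

&lt;p&gt;Detecting the conflict is only half the job; deciding what to do about it (keep both versions, merge fields, or ask the user) is application policy that a sketch like this leaves open.&lt;/p&gt;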
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: Finally, we'd done a bunch of work to keep the SQL engine tamed and behaving properly on the desktop. Some of that has found its way into SQL Server 2008 and some has not, because there's a less pressing need for it. But for departments, and for SQL Server Express on the desktop, we still want to finish that. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU&lt;/b&gt;: So to wrap up, I'd like you to reflect on how the original environment for WinFS was the end-user desktop, but now the environment in which many of these technologies have come to fruition is the enterprise datacenter and back office. How might these worlds yet come together? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;QC&lt;/b&gt;: I was very happy to be able to take the technology forward, because I saw the broad applicability, not just in the problem space we were working on, but in terms of the general usefulness of the database. &lt;/p&gt;
&lt;p&gt;My job is to grow the usefulness of the database. The work we did with WinFS was in line with that, and I'm happy with that, but there's a part of me which is still unfulfilled. Boy, what would it mean if every application could have some shared notions about, for example, the people in my life, that other applications could plug into and use. &lt;/p&gt;
&lt;p&gt;Can we express that fully in a cloud way? Maybe. It harkens back to the old Hailstorm ideas. And we have things like Astoria [ADO.NET Data Services], which is a projection of entities over the web. That's awfully familiar, both in terms of WinFS and in terms of Hailstorm. &lt;/p&gt;
&lt;p&gt;Where it goes, I don't know. We've made a choice right now to incubate some underlying platform technologies for the web, and allow the operating system team to cycle on the stuff that's on their plates right now. &lt;/p&gt;
&lt;p&gt;But I think not too long from now we'll come out of those cycles and say, OK, we have all this fundamental technology, what's the next big innovation we can do? &lt;/p&gt;
&lt;p&gt;That's kind of where we got tripped up in the Longhorn cycle. We were building too much of the house at once. We had guys working on the roof while we were still pouring concrete for the foundation. &lt;/p&gt;
&lt;p&gt;At one point we realized we needed to decouple things. And that really did give this team the freedom to go off and take these underlying technologies, which we believe were fundamental to the database, and get them done correctly. &lt;/p&gt;
&lt;p&gt;But I do at some point want to see that place in my heart fulfilled around the shared data ecosystem for users, because I believe the power of that is enormous. &lt;/p&gt;
&lt;p&gt;I think we'll get there. But for now we'll let the concrete dry, and get the framing in place, and then we'll see how the rest of the house shapes up. &lt;/p&gt;&lt;img src="http://channel9.msdn.com/489752/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Where-is-WinFS-now/</comments><link>http://channel9.msdn.com/posts/JonUdell/Where-is-WinFS-now/</link><pubDate>Thu, 15 May 2008 13:20:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/winfs/winfs.wma</guid><evnet:views>747</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489752/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>WinFS was an ambitious effort to embed an integrated storage engine into the Windows operating system, and use it to create a shared data ecosystem. Although WinFS never shipped as a part of Windows, many of the underlying technologies have shipped, or will ship, in SQL Server and in other products. 
In this interview Quentin Clark traces the lineage of those technologies back to WinFS, and forward to their current incarnations...</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/winfs/winfs.mp3" expression="full" duration="3240" fileSize="25914048" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/winfs/winfs.wma" expression="full" duration="3240" fileSize="26213725" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/winfs/winfs.wma" length="26213725" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Where-is-WinFS-now/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489752/Trackback.aspx</trackback:ping><category>WinFS</category></item><item><title>OpenSearch federation with Search Server 2008</title><description>&lt;p&gt;With the new OpenSearch-based federation capability in Search Server 2008, you can integrate any external search service that can expose results as an RSS feed. In this podcast Jon Udell discusses search federation with Richard Riley and Keller Smith. &lt;/p&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;&lt;img alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/searchserver/rriley.jpg" /&gt;
            &lt;p&gt;&lt;b&gt;Richard Riley&lt;/b&gt; is a Senior Technical Product Manager for Microsoft Office SharePoint Server 2007. He is responsible for driving Technical Readiness both within and outside of Microsoft and specializes in the Enterprise Content Management and Search features of the product. &lt;/p&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt; &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;img alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/searchserver/kells.jpg" /&gt;
            &lt;p&gt;&lt;b&gt;Keller Smith&lt;/b&gt; is a Program Manager in the Business Search Group at Microsoft. He designs and manages new enterprise search features in the areas of Federation and End-User UI. His passion has always been to improve the lives of users through exciting new ideas in software. &lt;/p&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;hr /&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;Links&lt;/b&gt;&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;
            &lt;p&gt;&lt;a href="http://blogs.msdn.com/enterprisesearch/default.aspx"&gt;Enterprise Search Blog&lt;/a&gt;&lt;/p&gt;
            &lt;p&gt;&lt;a href="http://www.microsoft.com/enterprisesearch/connectors/federated.aspx"&gt;Search Gallery&lt;/a&gt; &lt;/p&gt;
            &lt;p&gt;&lt;a href="http://www.microsoft.com/enterprisesearch/connectors/federated.aspx"&gt;Location Definition File Schema&lt;/a&gt; &lt;/p&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; What's the lineage of this search server? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; The technology that was built into Index Server, way back in the NT4 option pack, has grown and diversified into various products, including desktop search and SharePoint. They've split apart now, but the common DNA is there. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; What differentiates this search server from its predecessor? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; We found that customers wanted to use the search capability without buying the whole SharePoint product. So we split the search features into Microsoft Office SharePoint Server for Search. People could buy that and use the search features without the full MOSS functionality. Search Server is the next version of that. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; What were the domains over which MOSS 2007 could search? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Anything you could crawl. Out of the box, SharePoint plus other content sources we had handlers for, including Notes. Or you could go to the effort of writing your own protocol handler, or business data connection. But if you couldn't find a way to index it yourself, there was no way to connect to the data. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; So how does federation change the game? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Instead of indexing the content, you're leveraging an external search engine that already exists. That engine returns results back in an XML format we can render. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; I was fascinated to learn you're using the OpenSearch mechanisms and formats to accomplish this. I did an early implementation for Amazon A9, and it was trivial since I already had an RSS feed coming out of the search engine I wanted to integrate. Is that still how it works? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Yes. You can connect to any search engine that emits an RSS feed. It takes about 5 minutes to set up: you take the query URL, put it into a federated location definition (FLD) file, and away you go. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; I guess the part of OpenSearch people will be most familiar with is the description that drives the search drop-downs in browsers. It's a little package of XML that defines the template for the query. You must be using that in Search Server as well, when it acts as a client to federated sources. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Yes, exactly. SharePoint is behaving as a client, just as IE is. When you create a federated location definition, you're creating one of these OpenSearch description files. But we add some schema extensions: the triggers SharePoint uses to decide when to send queries to that location, and the XSL used to render the results. So we extend the OpenSearch schema to make it more useful to SharePoint. &lt;/p&gt;
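&lt;p&gt;For illustration, a minimal OpenSearch 1.1 description document for an engine that returns RSS might look like the following. The endpoint URL and names are hypothetical, and the SharePoint-specific trigger and XSL extensions that an FLD file adds are not shown: &lt;/p&gt;
&lt;pre&gt;&amp;lt;OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/"&amp;gt;
  &amp;lt;ShortName&amp;gt;Example Search&amp;lt;/ShortName&amp;gt;
  &amp;lt;Description&amp;gt;Hypothetical search engine that returns RSS&amp;lt;/Description&amp;gt;
  &amp;lt;!-- the client substitutes the user's query for {searchTerms} --&amp;gt;
  &amp;lt;Url type="application/rss+xml"
       template="http://search.example.com/rss?q={searchTerms}"/&amp;gt;
&amp;lt;/OpenSearchDescription&amp;gt;&lt;/pre&gt;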
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; When you start shipping queries over the net to multiple federated sources, you start running into issues of sequencing and latency. How do you deal with that? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; You add federated locations as web parts, and you can choose whether to load each one synchronously or asynchronously. Everything synchronous is loaded first, and then the queries are sent off to each asynchronous web part. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; And you'll use AJAX to weave the results in as they arrive? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Right. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; One of the sources can be SQL Server. How does that work? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; You need a simple connector that exposes an RSS feed. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; In the case of SQL Server, there's the option to do structured search. Can I pass through an XPath query? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Well, it's up to you to write the connector. If you want to accept XPath in the query, and return results on that basis, it's your code. &lt;/p&gt;
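&lt;p&gt;As a rough sketch, assuming a hypothetical connector at sql-connector.example.com, a page of results from such a connector might be plain RSS 2.0 carrying the OpenSearch 1.1 response elements for paging. The query, row, and counts are illustrative only: &lt;/p&gt;
&lt;pre&gt;&amp;lt;rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"&amp;gt;
  &amp;lt;channel&amp;gt;
    &amp;lt;title&amp;gt;Results for "widget"&amp;lt;/title&amp;gt;
    &amp;lt;link&amp;gt;http://sql-connector.example.com/rss?q=widget&amp;lt;/link&amp;gt;
    &amp;lt;description&amp;gt;Rows matching the query&amp;lt;/description&amp;gt;
    &amp;lt;!-- paging information defined by OpenSearch --&amp;gt;
    &amp;lt;opensearch:totalResults&amp;gt;1&amp;lt;/opensearch:totalResults&amp;gt;
    &amp;lt;opensearch:startIndex&amp;gt;1&amp;lt;/opensearch:startIndex&amp;gt;
    &amp;lt;opensearch:itemsPerPage&amp;gt;10&amp;lt;/opensearch:itemsPerPage&amp;gt;
    &amp;lt;!-- one item per matching row --&amp;gt;
    &amp;lt;item&amp;gt;
      &amp;lt;title&amp;gt;Widget (hypothetical row)&amp;lt;/title&amp;gt;
      &amp;lt;link&amp;gt;http://sql-connector.example.com/item/1&amp;lt;/link&amp;gt;
      &amp;lt;description&amp;gt;One row from the underlying table&amp;lt;/description&amp;gt;
    &amp;lt;/item&amp;gt;
  &amp;lt;/channel&amp;gt;
&amp;lt;/rss&amp;gt;&lt;/pre&gt;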
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; What I like about this is that the act of creating an OpenSearch RSS feed on top of a source is just plain useful, independently of Search Server. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Absolutely. We use that in SharePoint Search and in Search Server: you can get an RSS feed of any result set. It's great for alerting. Set up a fairly restricted search, and your RSS reader will pick up new items as they appear. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; It's great that you're using OpenSearch this way. Was there any debate about it? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; There are many ways to connect to other sources, but we felt there was a need to federate out in a very lightweight way. OpenSearch already had a scheme that was relatively well adopted, and served our needs as a base, though we did extend it as I've mentioned. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; How do I control the results display? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; You can customize the XSL, so anything you can retrieve from the source you can format in any way you want. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; Can I extend the results metadata? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Yes, you can override the OpenSearch defaults, specify which fields you care about, and use those in your XSL. &lt;/p&gt;
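&lt;p&gt;A minimal XSLT 1.0 sketch of such a stylesheet, assuming it is applied directly to the RSS returned by the federated location (the exact input SharePoint hands to the stylesheet may differ): &lt;/p&gt;
&lt;pre&gt;&amp;lt;xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&amp;gt;
  &amp;lt;!-- render each RSS item as a linked title followed by its description --&amp;gt;
  &amp;lt;xsl:template match="/"&amp;gt;
    &amp;lt;ul&amp;gt;
      &amp;lt;xsl:for-each select="rss/channel/item"&amp;gt;
        &amp;lt;li&amp;gt;
          &amp;lt;a href="{link}"&amp;gt;&amp;lt;xsl:value-of select="title"/&amp;gt;&amp;lt;/a&amp;gt;
          &amp;lt;div&amp;gt;&amp;lt;xsl:value-of select="description"/&amp;gt;&amp;lt;/div&amp;gt;
        &amp;lt;/li&amp;gt;
      &amp;lt;/xsl:for-each&amp;gt;
    &amp;lt;/ul&amp;gt;
  &amp;lt;/xsl:template&amp;gt;
&amp;lt;/xsl:stylesheet&amp;gt;&lt;/pre&gt;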
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; And, Search Server is free? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; Yes, just go download it from microsoft.com/enterprisesearch. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; How far can you go with the free version? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; You can install the express version with either SQL Express or SQL Server. With SQL Express you can index up to about 400,000 to 500,000 documents. With SQL Server, you can go into the millions. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;Q:&lt;/b&gt; What about federation? Will there be a cap on the number of sources? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;A:&lt;/b&gt; No limit on sources. The only difference is that the express version requires you to install all the search services onto a single server. With the licensed version you can spread those across machines. &lt;/p&gt;&lt;img src="http://channel9.msdn.com/489751/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/OpenSearch-federation-with-Search-Server-2008/</comments><link>http://channel9.msdn.com/posts/JonUdell/OpenSearch-federation-with-Search-Server-2008/</link><pubDate>Thu, 01 May 2008 14:59:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/searchserver/searchserver.wma</guid><evnet:views>212</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489751/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>With the new OpenSearch-based federation capability in Search Server 2008, you can integrate any external search service that can expose results as an RSS feed. 
In this podcast Jon Udell discusses search federation with Richard Riley and Keller Smith.</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/searchserver/searchserver.mp3" expression="full" duration="1448" fileSize="11913408" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/searchserver/searchserver.wma" expression="full" duration="1448" fileSize="12053033" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/searchserver/searchserver.wma" length="12053033" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/OpenSearch-federation-with-Search-Server-2008/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489751/Trackback.aspx</trackback:ping><category>federation</category><category>OpenSearch</category><category>Search Server 2008</category></item><item><title>Ray Ozzie introduces Live Mesh</title><description>&lt;img src="http://channel9.msdn.com/Link/70e4b624-1012-42d1-949f-0766affe328c/" border="0" /&gt;&lt;h1&gt;Introducing Live Mesh&lt;/h1&gt;
&lt;p&gt;
&lt;i&gt;
In this audio version of a &lt;a href="http://channel9.msdn.com/showpost.aspx?postid=399578"&gt;Channel 9 video&lt;/a&gt;, Ray Ozzie discusses his role as Microsoft's chief software architect, and the role of Live Mesh as one aspect of an emerging Internet-oriented platform. 
&lt;/i&gt;
&lt;/p&gt;

&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;img width="250" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/ozzie-livemesh/ozzie01.jpg"&gt;
&lt;div&gt;
&lt;b&gt;Ray Ozzie&lt;/b&gt; is Microsoft's chief software architect.
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;hr&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Links&lt;/b&gt;&lt;/p&gt;
&lt;div&gt;&lt;a href="http://www.mesh.com"&gt;mesh.com&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;&lt;a href="http://channel9.msdn.com/ShowPost.aspx?PostID=399578"&gt;Video&lt;/a&gt; of this interview on Channel 9&lt;/div&gt;
&lt;div&gt;&lt;a href="http://channel9.msdn.com/ShowPost.aspx?PostID=399577"&gt;Abolade Gbadegesin&lt;/a&gt; on the architecture of Live Mesh&lt;/div&gt;
&lt;div&gt;&lt;a href="http://www.on10.net/blogs/nic/Hands-on-with-Live-Mesh/"&gt;Demo&lt;/a&gt; of Live Mesh on Channel 10&lt;/div&gt;
&lt;div&gt;Mike Zintel on &lt;a href="http://blogs.msdn.com/livemesh/archive/2008/04/21/live-mesh-as-a-platform.aspx"&gt;Live Mesh as a platform&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;Background on &lt;a href="http://blog.jonudell.net/2007/12/07/from-simple-sharing-extensions-to-feedsync/"&gt;FeedSync&lt;/a&gt;&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;
&lt;hr /&gt;
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  Hello Ray! Thanks for joining us.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  It is great to be here, Jon.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: So, it's been about 3 years since you joined Microsoft, initially as CTO. People tend to wonder what it's like coming from a company of 300 to a company on the scale of Microsoft. 
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  I've had the luxury of a career working for small companies: Software Arts in the early days, and a couple of startups in Iris and Groove. But Lotus ended up being acquired by IBM, so I was at one big company before coming to Microsoft. Microsoft's scale is tremendous in terms of the potential impact that someone can have.  I think everyone at Microsoft tends to be here because they want to have a tremendous impact, and certainly that was a tremendous draw.
&lt;/p&gt;
&lt;p&gt;
What I really do enjoy about the role as CSA is being at the juncture of business strategy, product and market strategy, and technical strategy. I have the opportunity to work with not only the executive team on larger strategic issues, but also with the product teams at fairly detailed technical architectural levels. As an engineer, it is really fascinating, and I've met a lot of great people.  
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: People also wonder what it's like to step into a role formerly occupied by Bill Gates. What kind of continuity will there be, and how might you want to reshape the role?
&lt;/p&gt;

&lt;p&gt;&lt;b&gt;RO&lt;/b&gt;: Bill is a very unique individual. There will never be another Bill.  He has got a tremendous palette of talents, both technical as he applied at Microsoft, and non-technical in the role he's moving into. In shaping the role after July, when he won't be here full time anymore, he really split the role into two pieces. Craig Mundie takes over long-term issues, research and things like that. And I have taken over most of the technical and product strategy related to products that'll ship within a couple years.  
&lt;/p&gt;

&lt;p&gt;
My background is different than Bill's was. I've been a lot more hands-on in the product design for a number of years. I'm dealing with broader issues than I've dealt with in the past, but my background in product development gives me a lot of grounding in terms of working with a development team. And I think Craig and I make a good pairing in terms of filling his shoes.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  How do you balance the need to span a vast spectrum of activities and the need to go deep on things?
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  Time management -- attention management -- is really the biggest challenge. The pace is fairly brutal.  At the beginning of the year, I'll kind of plan out how much of my time in hours I want to spend in different categories of things.  There's some allocation for the rhythm of the business and high level strategic things.  There are allocations in terms of time I want to spend with product groups. 
&lt;/p&gt;

&lt;p&gt;
And then there's a fraction I didn't initially realize I had to be as intentional about, but sometimes you have to create white space because, like a task scheduler that has too many ready tasks, you can thrash if you spend all day dealing in a reactive mode to the incoming issues, the incoming communications.  Sometimes you have to create some white space in order to think and understand what is going on in the environment. I can do that by going away, by traveling to our international offices.  Bill had something called Think Week that we are continuing in a slightly modified form going forward. And there are other ways.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  Is one of those ways maybe to sometimes focus deeply on particular interests of yours?  If so, what would some of those be?
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  Well, the problem is most of my interests are technically related and so in theory I would just go write some code. I don't do that anymore, though, and honestly the best way that I've found to clear my mind really is either to go to a conference that's a little off the beaten path, or just travel somewhere, maybe with my wife, that is not technology related, just to clear it out and re-prioritize.  It is probably something that everyone has to do. In the old days when I did code, I used to have a 4-hour rule that said: "Do not write code unless you can at least have 4 hours of contiguous time where you will not be interrupted." Otherwise you end up introducing more bugs than the code you are writing. In a way, this is kind of the life management equivalent.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  So, in the talk that you gave at MIX, you introduced the interesting phrase "utility computing". I got to thinking that although "web 2.0" is the meme of the current era, people may have forgotten that for quite a while, Tim O'Reilly was actually trying to establish "Internet operating system" as a meme. That didn't really stick, and now it's come around again as "web 2.0", but "Internet operating system" is a pretty evocative phrase, as is "utility computing." We have talked about some things that are coming.  We're going to talk more about a part of that initiative here, the Live Mesh announcement, but I wonder if you could reflect a little bit on what an Internet operating system could be.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  Maybe I should just step back and talk about the environment a little.  I mean, when I got into this business back in the 70s, utility computing was really hot, it was called the mainframe.  We had these raised floors, freon cooling. It really was a utility. We used virtualization. There was "time sharing".  They were boring terms, but we were used to treating computing as a utility, and the PC revolution was all about empowerment and kind of getting back some of that personal feel, getting control of building solutions for things that might be really meaningful for you.  
&lt;/p&gt;

&lt;p&gt;
So the pendulum swung to the personal, and then the web emerged. It's odd: the way technology is shaped depends on the constraints of the environment, whether computing constraints, communication constraints, and so on. The early web grew up in an era of 56K dial-up, so a lot of the way the protocols are structured, and where computing was located, were based on that balance: computation and storage on the back end, a really thin straw in between, and a smart terminal that we call a browser on the front end. That's how it was born.
&lt;/p&gt;

&lt;p&gt;
Nowadays, we've got increasing ubiquity of broadband.  We've got this big fat pipe so we can send more data.  You still can't be chatty, but we can send more data back and forth, and it gives us, as architects, the ability to revisit the right balance, for any given solution, between what's on the back end and what's on the front end.  We have amazing computation abilities on both sides.  We have amazing storage abilities on both sides.  So now, in this unconstrained environment, really the question is: what is the right way to build a solution?  Application models have begun to evolve that take advantage of some of these things on both sides, and I think really when we talk about utility computing now, what we are saying is, if you are building a solution now, what is the right way for that back-end utility to expose its resources? What business models? What application design patterns are appropriate for the cloud? Map-reduce-like patterns, purely horizontally scalable patterns, are much better for that back end. 
&lt;/p&gt;

&lt;p&gt;
What should the front-end programming model be like? We started with the PC in a model of one computer for some subset of users.  Bill and Steve's dream was to have a computer on every desk and in every home, and we have gotten to that point, but now we have gone beyond it. Every individual has a phone and a PC.  Many people have multiple PCs at home.  They might have a PC at home and at work.  We have got computer-like devices in our cars, sitting underneath our TVs with the set-top box. People have home security systems.  There are lots of devices around, and I think now is the time to reflect: what is the right programming model for the client environment that we have got?  What is the right programming model for the cloud? And at least from Microsoft, how can we build tools and services to help developers build great businesses, to build great solutions, using both those back-end and front-end resources? 
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  From that perspective, we see some interesting offerings emerging in what we can broadly call the Internet operating system space.  Amazon surprised me quite a lot in the last couple of years in the things they have done. I don't think people were too surprised by the Google announcement more recently.  I would invite you to reflect on the kind of company that Microsoft historically has been, and therefore, the kind of approach that Microsoft can take to this problem as it might compare to the kind of approach that these other companies can take.  
&lt;/p&gt;


&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  I have no product announcements to make.  [laughs]
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: I understand that. [laughs]
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  I'll just reflect on our approach, and compare and contrast if you like.  Microsoft's approach...I can tell you this because I was a developer for most of my career on the outside of Microsoft...I've been here for 3 years, but I had a relationship with Microsoft as an ISV since roughly the beginning of Microsoft.  I met Bill and Steve in 1981 when I was first coming out to talk to them about some DOS issues. Microsoft, I believe, has always taken a perspective of ... its DNA is as a platform company. And in order to have a successful platform, you've got to have successful ISVs, people who are being selfish about their solution, what they are trying to deliver, but we have to -- semi-selflessly and semi-selfishly -- serve those people.  We've got to build a good business, but we've got to do so by serving those people and letting developers build great businesses. So in any platform, any utility computing environment that we would consider, we would be taking a broader perspective.
&lt;/p&gt;
&lt;p&gt;
We would look at a 20- or 30-year horizon and say: How is this all panning out? What is the broad range of developers out there? What does the new-age ISV look like?  It's a web ISV. There are also client ISVs, but client code is changing; it has cloud interconnections now.  What does a VAR look like these days, a solution VAR?  What does an enterprise developer look like?  What is the enterprise environment going to look like when it's transitioned from an on-premises data center to one that factors in both an on-premises data center and the cloud?  Perhaps there would be some businesses, small to medium size businesses, that might shift completely to the cloud for their back end.  But most major enterprises would have some kind of hybrid. So when we step back and look at tools, languages, application design patterns, operating systems, and runtimes, we kind of look at it and say: How will we design this for the way that the environment will shape over the next 5, 10, 20 years? As opposed to: what does the web look like today, what are the capabilities today?  I think Amazon has done a great thing in terms of opening people's eyes to the power of, coming from the ground up, what does it look like to make raw resources, raw VMs, or blobs, available to a developer.  I think they've done all of us a great service, and themselves.  Google's recent announcement, I think, is actually the inverse.  It's done a good service in terms of looking at an individual developer and saying: "Hey, for a specific problem, what is a very simple way of getting into this cloud game with a relatively constrained pattern and model, but doing it in a fairly slick, seamless way."   I think those are both interesting viewpoints and ultimately the answer that the broad developer audience wants will be a combination of those and many other things.  
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: Good.  So in that context, we have just announced Live Mesh, and when I first saw it, I worried a little bit that people would compare it to a lot of things that, on the surface, it resembles. It can look like a FolderShare kind of thing, it can look like a screen-sharing kind of thing, it has those aspects. But in fact, this is one example of some platform-like capability for which those things are really trivial applications that have been layered on top. We &lt;i&gt;can&lt;/i&gt; talk about Live Mesh, so let's talk about it.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  Live Mesh began with the perspective of saying, what is the environment that we are in today, and that we'll be in for the next who knows how many years. As users we're in a multi-device environment, and we need to cope with these many different devices.  Each one of us at home ends up being a kind of system integrator, if you want to get simple media sharing scenarios done between devices at home. You might have different contact-sharing things between your phone device and other things that you are dealing with on your web or on your PC.  If you're in a productivity context, you have document-sharing scenarios both among people and among devices.  
&lt;/p&gt;
&lt;p&gt;
Each one of us has had challenges, if we have multiple PCs and multiple devices, figuring out how to get the most recent version of an application installed on all the right devices that we're the system managers for. On the enterprise side, we've solved this quite well with things like SMS that lets an enterprise push things out to many desktops and manage desktops, but we haven't actually solved that problem for individuals.  It hasn't been a huge pain point for individuals, but now it's becoming more of a pain point.  
&lt;/p&gt;
&lt;p&gt;
That's one aspect that started us down the path of Live Mesh.  We basically said, the OS as it is right now, the OS for the phone, the OS for the desktop, the Xbox, the OS for a Zune, the OS for the PC, are all designed more or less to expose the resources of that device to developers and to users, but they are really not designed in concert with other devices.  What is going on in the web is mostly done serving the web, and the browser is largely disconnected from those devices.  If we were designing an operating environment for users or developers today, looking forward, it would probably look a little bit different. It would look like something that would bring those devices together for the end user. And so that is one thing that Live Mesh does.  It brings together your devices. You use the web as a hub to claim your device.  You securely identify yourself as an authorized user of this device.  Multiple people can own a device as authorized users and each person can have many devices.  
&lt;/p&gt;
&lt;p&gt;
Once you've said that's your device, it enables many things.  It enables centralized health monitoring and status reporting.  It enables settings replication across your devices and computers where you think it appropriate.  It lets data flow among those devices, whether files and folders, or other things that I will talk about in a minute, like feeds. And it lets applications be configured and potentially licensed across your device mesh.
&lt;/p&gt;
&lt;p&gt;
And in solving the problem of getting things to work across your devices, the same kind of technologies can be used for multiple people.  So if you share a folder of documents, if you are working on a set of documents on your desktop with someone else, those same technologies that are used to synchronize that folder across devices can be used for me to share with you or other people.  So from the user's perspective, we think that Live Mesh can really transform your experience with multiple PCs and things like your phone to make the experience very seamless in that way. 
&lt;/p&gt;
&lt;p&gt;
Now let me come at it from the developer's perspective: Live Mesh is actually a platform. What you see with Live Mesh when you download it is a very small piece, from the user's perspective, of what it actually is, because it was built to enable innovation in a variety of ways.  You can kind of think of what you see as the shell. If Windows, or any OS, has a broad set of capabilities that are exposed by its APIs to developers, the shell, the command line of an OS or the Finder or the desktop within Windows, is a thin exposure of that to users.  For Live Mesh, file and folder synchronization is that small amount that gives the user a taste of the capabilities of this platform.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  You've talked with me before about a couple of distinct application patterns. I think one of them is going to be intuitively obvious to people because the folder and file synchronization thing is something that people already do. So people are going to kind of get that if you drop a thing here, it shows up there, and hopefully they'll be delighted to find out that subtler kinds of things than whole files and folders can also participate in that synchronization. And they'll be interested to see how they can then bring people into the equation, with sharing. All that I think is what people will get at first glance. I think what will be less obvious is the way in which websites can use Live Mesh to optimize the communication of stuff down to individuals and groups of individuals. Since that's less obvious maybe we should take a moment to spell that out.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  Sure. You can look at it from two perspectives.  Live Mesh is a way of enabling rich applications on the PC to get their settings and their data across devices, as you just said. But it's also a way for websites to be able to efficiently extend their function down to a world of devices. The PC for sure, but also phones and other devices. One of the things that we inadvertently stumbled upon in Groove was that enterprises wanted to use this technology to help them extend the functions of their websites out to a world of devices. That isn't what Groove was designed to do.  It was more designed as a peer sharing mechanism.  So one of the things that Live Mesh is all about is essentially, from day one, providing a centralized infrastructure such that this platform that's on all of the clients goes to this one service in the cloud to manage, all under the covers, all the synchronization. Now the actual data may flow peer-to-peer, or it might be relayed, encrypted, through the cloud, but one thing that is certain is that an arbitrary web site won't have to deal with the complexities of synchronization. They can develop an application, using technologies that they are familiar with -- web development technologies -- and develop a piece of that application that gets downloaded to the client, that has local storage synchronized with the web site, they can update the application and the updates get distributed transparently...
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: Or maybe it's just data. 
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;: Yep, could be data.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:
Let's talk about my bank's web site and my travel web site, two websites that I frequently do business with. In both cases there's data exchanged, and I would love for that data to be exchanged in a fully synchronized and reliable and transparent way.  What you're saying is that both of those web sites, and any other web sites that I transact with, can pretty straightforwardly get into the game of plugging their pipes into my mesh.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  That's right.  They can plug their pipe into your mesh, and it's through mechanisms that most web developers these days are becoming increasingly familiar with, such as feeds. Some people might remember a few years ago there was a technology that we introduced, an extension to RSS called SSE (simple sharing extensions), that eventually matured -- with the help of the partner community -- into something we now refer to as FeedSync. It's essentially RSS and Atom extensions to a technology that was initially developed for publishing, where you have a list of items that get updated, and through a publish/subscribe mechanism, the updates get sent out. FeedSync extends that to make it bidirectional. You can essentially do crosswise subscriptions.  I subscribe to you and you subscribe to me on the same feed. We can both modify it, and make sense of the results, and understand how conflicts are dealt with.
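&lt;p&gt;To make that concrete, here is a sketch of a single RSS item carrying FeedSync synchronization metadata, following our reading of the FeedSync 1.0 specification. The item id, timestamps, and endpoint names ("desktop", "laptop") are hypothetical: &lt;/p&gt;
&lt;pre&gt;&amp;lt;rss version="2.0" xmlns:sx="http://feedsync.org/2007/feedsync"&amp;gt;
  &amp;lt;channel&amp;gt;
    &amp;lt;title&amp;gt;Shared notes&amp;lt;/title&amp;gt;
    &amp;lt;link&amp;gt;http://example.com/notes&amp;lt;/link&amp;gt;
    &amp;lt;description&amp;gt;Hypothetical two-way synchronized feed&amp;lt;/description&amp;gt;
    &amp;lt;item&amp;gt;
      &amp;lt;title&amp;gt;Trip itinerary&amp;lt;/title&amp;gt;
      &amp;lt;description&amp;gt;Flight on Friday&amp;lt;/description&amp;gt;
      &amp;lt;!-- updates counts the changes applied to this item; the
           sx:history entries record which endpoint (by) changed it
           and when, newest first --&amp;gt;
      &amp;lt;sx:sync id="item_1" updates="2"&amp;gt;
        &amp;lt;sx:history sequence="1" when="2008-04-21T10:00:00Z" by="laptop"/&amp;gt;
        &amp;lt;sx:history sequence="1" when="2008-04-21T09:00:00Z" by="desktop"/&amp;gt;
      &amp;lt;/sx:sync&amp;gt;
    &amp;lt;/item&amp;gt;
  &amp;lt;/channel&amp;gt;
&amp;lt;/rss&amp;gt;&lt;/pre&gt;
&lt;p&gt;When both subscribers edit the same item, each side can compare this metadata to decide which version wins and whether to flag a conflict. &lt;/p&gt;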
&lt;/p&gt;
&lt;p&gt;
By using this very simple technology, we connect the web site to our cloud, our cloud to the clients, the clients to each other.  It is just a very simple thing. The base model of what is an application within the Live Mesh environment begins with essentially a feed of feeds.  One feed represents a logical thing that a site might be exchanging with the client. That's called a mesh object. It's a feed of feeds. A developer can new up one of these things, and its elements are other feeds. An application can create as many feeds as it likes. Some of those feeds are hard-wired to be things like the list of members, or the list of devices in the feed, but then the application can create many more.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: The way I'm thinking about it is that the sub-feeds are basically custom objects. If it's a calendaring application, those might be calendar events. In banking applications, they might be transactions.  But the notion is that the same infrastructure that's synchronizing files and folders can also synchronize these custom objects in the same way.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  That's exactly right. In what users see today, if you open up a folder, what we would refer to as a mesh-enabled folder -- it's one of these mesh objects.  And in essence every file within that folder is an element; it's an item within the feed.  The file itself is the enclosure. The metadata of the file -- its name, its modified date and so on -- follows a standard schema that represents the item.  Then there's a news feed that you'll see on the right-hand side if you open up one of these folders; that's another feed, and each of the entries in there is another item, and so on. 
&lt;/p&gt;
&lt;p&gt;
We expect that developers will develop feeds that suit the needs of their specific application, and we deal transparently with the synchronization of those elements. The user interface offers a very simple, consistent way to help users manage conflicts -- if the application says that the user should be the one to deal with those conflicts.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  There are a lot of interesting degrees of freedom here. My bank's website has a RESTful interface to this stuff, but so does my mesh client.  In fact, I think people will be surprised and delighted to discover that you can hit the localhost with REST calls -- and that's putting stuff in, as well as getting stuff out.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;: Right. We made a decision, from an API perspective, that developers would prefer to learn one way of dealing with the mesh, and that the tooling would be easier if we had one way of dealing with the mesh. So the web version of Live Mesh, what's running up in the web, and what's running on the client are symmetrical and run the same code. So on localhost, in a secure way, an application uses REST calls to invoke the local MOE -- we call it MOE, the mesh operating environment -- or it calls the cloud MOE to do what it needs to do.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  I think I can figure that out. [laughs]
&lt;/p&gt;
&lt;p&gt;
So the synchronization piece is interesting.  You've obviously been around this track a few times before. This time around, how is it different, how has it evolved from things you did in Notes or in Groove?
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  Well, one thing that's different is that I didn't build this software. I sponsored it, and I had a good degree of input. But a very talented team, with talented leadership, formed and rose to the task. There is certainly a DNA trail you can follow from PLATO at the University of Illinois to Lotus Notes to Groove and now into Live Mesh. And I was fortunate to have the opportunity to interact heavily with that team when I had the time to do so, which was right after I came on board, when I was still in the CTO role. But they took these base concepts and really ran with them, and developed them into a much richer thing than I could have imagined.
&lt;/p&gt;
&lt;p&gt;
But the DNA elements are the basic sync model, the basic interaction model. The biggest difference between Groove and Notes was that Groove embraced the concept of ad-hoc interaction much more in terms of inviting people into a shared environment.  So those invitation models are essentially borrowed from Groove into Live Mesh. So if you are a Groove user, you will feel very comfortable with that model in dealing with Live Mesh. 
&lt;/p&gt;
&lt;p&gt;
I hope people will be very pleasantly surprised with Live Mesh in terms of how it feels like there is almost nothing there. It's very simple, even though it's complex under the hood in order to actually accomplish this in a high-scale way, in a performant way, and in a way that works across firewalls and home NATs and double NATs and things like that. It's got very few knobs to turn and exposes itself in a fairly succinct way to the user.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;:  So, Mesh.com is the place to go to check it out, but where do developers find the SDK and everything they need to know to actually work with it?
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;:  We're bringing out the Live Mesh software right now as a preview because we need to begin getting user feedback, and we need to begin testing the scale of the back end. You can architect and plan these things, but you can't just light them up for hundreds of millions of users overnight. So there's a progressive rollout that begins today. What you won't find on mesh.com is the developer kit. We're beginning a series of system design reviews with smaller -- but increasing -- sets of developers over the course of the summer. The official rollout of the dev platform, and its broad availability, will be at our PDC, our Professional Developers Conference, this fall. So as a user, look at Live Mesh now. As a developer, stay tuned and look at the screencasts that we've done; they show what we can do from an application perspective. But really, come to the PDC, or go to the PDC web site when it happens, and play with it: both from the perspective of extending a rich application to the web and to other devices, and from the perspective of extending a website out to take advantage of the power of Windows.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;JU&lt;/b&gt;: This has been extremely useful. Thanks, we appreciate it.
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;RO&lt;/b&gt;: It's been fun, thanks.
&lt;/p&gt;&lt;img src="http://channel9.msdn.com/489750/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Ray-Ozzie-introduces-Live-Mesh/</comments><link>http://channel9.msdn.com/posts/JonUdell/Ray-Ozzie-introduces-Live-Mesh/</link><pubDate>Wed, 23 Apr 2008 09:02:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/ch9/0/RayOzzieLiveMesh_ch9.wma</guid><evnet:views>404</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489750/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In this audio version of a &lt;a href="http://channel9.msdn.com/showpost.aspx?postid=399578"&gt;Channel 9 video&lt;/a&gt;, Ray Ozzie discusses his role as Microsoft's chief software architect, and the role of Live Mesh as one aspect of an emerging Internet-oriented platform.</evnet:previewtext><media:thumbnail url="http://channel9.msdn.com/Link/69d6e055-0380-4c15-b897-30d7213a79d5/" height="240" width="320" /><media:thumbnail url="http://channel9.msdn.com/Link/70e4b624-1012-42d1-949f-0766affe328c/" height="64" width="85" /><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/ch9/0/RayOzzieLiveMesh_ch9.mp3" expression="full" duration="2190" fileSize="17439973" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/ch9/0/RayOzzieLiveMesh_ch9.wma" expression="full" duration="2190" fileSize="17638541" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/ch9/0/RayOzzieLiveMesh_ch9.wma" length="17638541" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Ray-Ozzie-introduces-Live-Mesh/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489750/Trackback.aspx</trackback:ping><category>LiveMesh</category></item><item><title>Word for scientific publishing</title><description>&lt;p&gt;Pablo Fernicola is a group manager at Microsoft. He runs a project focused on delivering tools and services for scientific and technical publishing, with a particular interest in the transition from print to electronic and web-based content, and its implications for collaboration, search, and content discovery in the future.&lt;br /&gt;
&lt;br /&gt;
In this interview, Pablo explains how a new add-in for Word, now available as a technical preview, helps authors and publishers of scientific articles work more effectively with one another, and with online archives like PubMed Central. &lt;/p&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;
            &lt;div&gt;&lt;img alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/fernicola.jpg" /&gt; &lt;/div&gt;
            &lt;div&gt;&lt;b&gt;Pablo Fernicola&lt;/b&gt;&lt;/div&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;hr /&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;
            &lt;p&gt;&lt;b&gt;Links&lt;/b&gt;&lt;/p&gt;
            &lt;p&gt;&lt;a href="http://www.microsoft.com/mscorp/tc/scholarly-publishing.mspx"&gt;Technical Computing @ Microsoft - Scholarly Publishing&lt;/a&gt;&lt;/p&gt;
            &lt;p&gt;&lt;a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=09C55527-0759-4D6D-AE02-51E90131997E&amp;displaylang=en"&gt;Download details for the Article Authoring Add-in&lt;/a&gt; &lt;/p&gt;
            &lt;p&gt;Pablo Fernicola's blog: &lt;a href="http://blogs.msdn.com/exscientia/"&gt;ex Scientia&lt;/a&gt;&lt;/p&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Hi Pablo, thanks for joining us to talk about a new Word add-in for authors of scientific journal articles. It's an interesting story about applying the XML capabilities of Office, and also about the evolution of journal publishing. How did this project get started? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; It's an incubation project. Three people had an idea: Jean Paoli, an XML pioneer, Jim Gray... &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Oh really? I didn't know he had been involved. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Yes, he and Jean really pushed to get this started, and they both recruited me for this project. It's been a little over a year since Jim disappeared, and that was a big blow, considering his key role. &lt;/p&gt;
&lt;p&gt;And the third key person is Tony Hey. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; We should explain that Tony runs what's called the technical computing initiative, and is very involved in figuring out how Microsoft can help various people in the scientific community address computing and information management challenges. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Right. Scientific authors in many disciplines use Word to write articles. We looked into how to simplify the workflow, streamline the process, and lower the cost. And not just for the authors, but also for the journal publishers. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; It's been true for a long time in publishing, and not just scientific publishing, that there have been real challenges getting that Word content converted into the kinds of long-term formats we need: XML that's richly decorated with metadata. &lt;/p&gt;
&lt;p&gt;Publishers have tended to use strategies that involve giving people templates that try to use styles to control what's in the document. But since Word 2003, and especially since Word 2007, there have been a set of XML capabilities which have made possible a much more robust approach. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; That's right. Before Word 2003, styles were the best you could do. And people got quite far by relying on them. But they were very fragile. When you copied and pasted, styles would bleed across. It was hard to disentangle that when you converted the file. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; That's part of the problem. And part of it is that, along with the content itself, there's a process involving the metadata, and that process is divided between the author and the journal publisher. It's a shared responsibility, and you need an information management system that embraces that division of labor. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Also: What kind of user interface do you present to these different groups? There are really three groups. First the authors, who are subject-matter experts but don't know anything about the publishing process, and shouldn't have to know. &lt;/p&gt;
&lt;p&gt;Second, the journal editors. They're also subject-matter experts, but they also know about the structure of the journal, and about the metadata they need to apply. &lt;/p&gt;
&lt;p&gt;And third, you have companies and vendors who do backend tools and services, as well as the folks who work on the electronic archives. With the move from print to electronic journals, the role of the archive becomes very significant. Either the journals have their own repositories, or you have centralized repositories at university libraries or larger institutions, for example the National Library of Medicine with PubMed Central, or Cornell with arXiv.org. &lt;/p&gt;
&lt;p&gt;That group is very technical in terms of understanding file formats, elements, and properties. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; But even those folks shouldn't necessarily need to master all of that. They'd rather spend their time on math and physics, not the minutiae of XML publishing. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; That's right. The way the pipeline is set up today, you start with a Word document, and then at a certain point you convert to XML, and from that point on, all the editing happens in an XML editor. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So in biology and medicine, the format defined by the National Library of Medicine, and the one you're supporting in this Word add-in, is called the NLM DTD. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Yes. It's not only used by PubMed Central, but also a lot of the commercial publishers are using it for their archival format. And we're also seeing it used by publishers in other disciplines, for example law and social science. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Really? It's general enough for that? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; It is fairly general, and I'm really impressed by how the community related to scientific, technical, and medical publishing is not reinventing the wheel, but instead leveraging something that's in common use. &lt;/p&gt;
&lt;p&gt;A significant point is that the format usually does not encode any presentation elements. It's all about the semantics and the metadata, not about what font or background color. As you try to preserve data for the long term, for centuries from now, the presentation is not relevant, it's the content that matters. You can always generate a presentation from it. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So as we see in the accompanying screencast, you've created an add-in that presents editing enhancements both for authors and for editors. The interface for the author helps that person fill in the template and also apply those metadata elements which are appropriate for the author to apply. Then there's a separate interface for the editor. Explain a bit about how this can change the workflow. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; If you start from the author side, a key premise was requiring less effort to produce a valid document. You want to avoid having the author round-trip with the editor, back and forth, because they didn't fill in all the required information. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; And that happens a lot? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Yes. And it's not just failure to provide the required information. We want to make it easier to provide the correct information. Consider co-authors. You'll likely work with the same ones over and over. You want to avoid having to repetitively enter that information, and avoid having errors creep in. Remember: As we move to electronic publishing, search becomes key. It's the main way people will find articles. To have good search results, you need to know the information in the articles is good. If the last name of the author is misspelled, it's harder to find all the papers from that author. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; In terms of the consistency of author information, you can help with this Word add-in by normalizing the metadata editing process, but there still has to be a reliable disambiguated set of author names which are managed by the publishers, and ideally by a federation of publishers, and ultimately even more broadly than that. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Correct. If we look down the road, we see something like a global directory, but we're not there yet. We have to build up to that. When you look at the add-in, we're taking small steps that will get us to at least a better baseline than we have today. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Or, given that the world is moving to that baseline anyway, will help make it quicker and easier to get there. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; That's right. If we think of the authors, the key thing is to provide a very simple interface. As we consider features, if they look complicated we'll drop them. One of the prevailing rules is: Don't duplicate Word UI. If there's a way to do tables or equations or reference lists in the Word UI, we'll use those. We don't want to provide a lot of new UI for the authors to learn. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; What I find interesting, here, from a workflow perspective, is how people in different roles are touching different pieces of data and metadata. Historically that's been a one-way process. Once the article is converted into the NLM format, it's typically not available to go back to the author for editing in the original context. So the person at the journal has to be responsible for round-tripping change requests. &lt;/p&gt;
&lt;p&gt;Similarly with the editing of the metadata. The author might want to make some changes, the journal publisher might want to make some changes, and those things tend to happen in disparate environments. What this is showing is what has always been the promise of robust XML editing on the desktop. You can bring all these chores into a common environment. The unit of workflow, the document, is something that can flow to different people in different contexts, and be modified in different ways, but it hangs together as it moves through the process. &lt;/p&gt;
&lt;p&gt;That's a big deal, and it goes far beyond the specific domain of scientific and technical publishing. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Right. And in addition to keeping all the data together and providing a simple interface, publishers have told us that as they move to electronic-first, they expect the cycle times to shrink. With the current disconnected tools and formats, that's hard to achieve. If you want to make a quick revision and send it to the journal, it may be too late because they've started the process of conversion, and once that starts there's no stopping it. &lt;/p&gt;
&lt;p&gt;And to your point about other domains, people have told us they want to use this for things like grant requests as well, moving away from article content to other kinds of content that can benefit from the structure and validation. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; The problem is almost universal. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Yes, anytime you want to validate content, or preserve it for a long time, these capabilities are relevant. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So Office 2003 was the first major deployment of XML capabilities in Office and Word. We haven't yet seen as much use of those capabilities as I'd expected. Why? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; The biggest challenge was that XML wasn't the default format. You had to have authors do special things to produce XML. Also, if you think of the NLM formats, they contain things that aren't part of normal Word content or UI. In Word 2003, extending the document content, or extending the UI, wasn't as easy as it has become in Word 2007. &lt;/p&gt;
&lt;p&gt;With Word 2007, you end up with a set of things, in a single installation, that bring all the enabling capabilities together at the same time and in the same place. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So what you did have, in Word 2003, was user-defined schema, but you're saying that wasn't enough, and that the newer capability of including arbitrary chunks of XML is more flexible for this purpose? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Yeah. There are two parts to that. There's content within the document, meaning the ability to have new XML elements that are part of the document, and that's more robust and expressive in Word 2007's Open XML format. Then there's the ability to have other XML data packaged within the file. Custom XML is what that's usually called. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; And that's the method you're using for the journal metadata? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Right. And since this is all defined as part of the Open XML format, and since the packaging of the file follows the standard as well, developers who want to build their own tools to create metadata, access metadata, or even create the whole file can do so. &lt;/p&gt;
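&lt;p&gt;As a rough illustration of the kind of tool Pablo describes, the Python sketch below opens a .docx file (an ordinary ZIP package under Open XML packaging) with the standard zipfile module and pulls out its custom XML data parts. The function name custom_xml_parts is hypothetical. The part names it looks for (customXml/item1.xml, customXml/item2.xml, and so on) reflect the naming Word typically uses, not a requirement of the format; a more robust tool would locate the parts by following the package relationships.&lt;/p&gt;

```python
import zipfile


def custom_xml_parts(docx_path):
    """Return {part_name: bytes} for the custom XML data parts in a .docx.

    Assumption: parts follow Word's usual naming, customXml/item1.xml,
    customXml/item2.xml, and so on. Their companion customXml/itemPropsN.xml
    parts and the customXml/_rels/ folder are skipped.
    """
    parts = {}
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            folder, _, filename = name.rpartition("/")
            is_item = filename.startswith("item") and not filename.startswith("itemProps")
            if folder == "customXml" and is_item:
                parts[name] = package.read(name)
    return parts
```

&lt;p&gt;For example, custom_xml_parts("article.docx") would return each part's raw bytes keyed by part name, ready to hand to an XML parser; the file name is a placeholder.&lt;/p&gt;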
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So this is a first cut you're putting out for publishers to experiment with, and to help you refine the templates they'll deploy to authors? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; Yes, this is a technology preview for evaluation and feedback. The idea is that the publishers will create the templates themselves. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Who are you working with? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; We're talking to many different journals, publishers, and archives. Each constituency has a different set of interests and requirements. Journal editors care a lot about the templates, but folks at PubMed Central and arXiv care more about how the metadata gets validated. &lt;/p&gt;
&lt;p&gt;We expect a beta shortly, and a 1.0 release by late summer. It'll be a free add-in for Word. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Well thanks Pablo. I fear that this will only seem interesting to the relatively small number of folks who have a direct interest in scientific, technical, and medical publishing. But I hope it will be apparent that it's much broader. You hinted at that when you mentioned that the NLM format, despite having been invented for the particular purposes of certain disciplines, is being taken up by people in legal and other disciplines. &lt;/p&gt;
&lt;p&gt;I'm excited about it because I care about publishing and metadata and robust information systems and open formats, and this brings all those things together. I'm glad to know that it's happening, and I'm glad you're working on it. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; It's really proving the value proposition of XML, and showing how it's coming of age in a mainstream production environment. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Yep. For those of us who've been thinking about this for a long time, there's been a tendency to get frustrated and feel like it'll never happen. But it just takes a while for things like this to make their way into the mainstream, and this is a great example of that. &lt;/p&gt;
&lt;p&gt;Well, thanks Pablo! &lt;/p&gt;
&lt;p&gt;&lt;b&gt;PF:&lt;/b&gt; OK, thanks! &lt;/p&gt;&lt;img src="http://channel9.msdn.com/489749/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Word-for-scientific-publishing/</comments><link>http://channel9.msdn.com/posts/JonUdell/Word-for-scientific-publishing/</link><pubDate>Thu, 17 Apr 2008 15:30:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/fernicola.wma</guid><evnet:views>151</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489749/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;br /&gt;
&lt;br /&gt;
Pablo Fernicola is a group manager at Microsoft. He runs a project focused on delivering tools and services for scientific and technical publishing, with a particular interest in the transition from print to electronic and web-based content, and its implications for collaboration, search, and content discovery in the future.&lt;br /&gt;
&lt;br /&gt;
In this interview he explains how a new Word add-in, now available as a technical preview, helps authors and publishers of scientific articles work more effectively with one another, and with online archives like PubMed Central.</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/fernicola.mp3" expression="full" duration="1780" fileSize="14245440" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/fernicola.wma" expression="full" duration="1780" fileSize="14420011" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/fernicola.wma" length="14420011" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Word-for-scientific-publishing/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489749/Trackback.aspx</trackback:ping><category>office xml</category><category>Publishing</category><category>science</category><category>Word</category></item><item><title>Pablo Fernicola demonstrates the Word add-in for scientific authors</title><description>&lt;img src="http://channel9.msdn.com/Link/6ac56c1b-026a-49f3-9108-fbfd6ee5da7a/" border="0" /&gt;&lt;br /&gt;
&lt;br /&gt;
In this screencast, Pablo Fernicola demonstrates the technical preview of a new scientific publishing add-in for Word. The add-in enables reading and writing of XML-based documents in the archival format used by the National Library of Medicine. &lt;br /&gt;&lt;img src="http://channel9.msdn.com/489748/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Pablo-Fernicola-demonstrates-the-Word-add-in-for-scientific-authors/</comments><link>http://channel9.msdn.com/posts/JonUdell/Pablo-Fernicola-demonstrates-the-Word-add-in-for-scientific-authors/</link><pubDate>Thu, 17 Apr 2008 15:29:00 GMT</pubDate><guid isPermaLink="false">http://channel9.msdn.com/posts/JonUdell/Pablo-Fernicola-demonstrates-the-Word-add-in-for-scientific-authors/</guid><evnet:views>25</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489748/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>
In this screencast, Pablo Fernicola demonstrates the technical preview of a new scientific publishing add-in for Word. The add-in enables reading and writing of XML-based documents in the archival format used by the National Library of Medicine. </evnet:previewtext><media:thumbnail url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/screencast.jpg" height="240" width="320" /><media:thumbnail url="http://channel9.msdn.com/Link/6ac56c1b-026a-49f3-9108-fbfd6ee5da7a/" height="64" width="85" /><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/fernicola.wmv" expression="full" duration="618" fileSize="8537475" type="video/x-ms-wmv" medium="video" /><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/word-science-publishing/fernicola.wmv" expression="full" duration="618" fileSize="8537475" type="video/x-ms-wmv" medium="video" /></media:group><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Pablo-Fernicola-demonstrates-the-Word-add-in-for-scientific-authors/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489748/Trackback.aspx</trackback:ping><category>office xml</category><category>Publishing</category><category>science</category><category>Word</category></item><item><title>Making sense of CO2 data: A Microsoft/Berkeley collaboration</title><description>&lt;p&gt;&lt;i&gt;
In this podcast, MSR researcher Catharine van Ingen and Berkeley micrometeorologist Dennis Baldocchi talk with Jon Udell about their collaboration on &lt;a href="http://fluxdata.org"&gt;www.fluxdata.org&lt;/a&gt;, a SharePoint portal to a scientific data server. The server contains carbon-dioxide flux data gathered from a worldwide network of sensors, and provides SQL Server data cubes that help scientists collaboratively make sense of the data.
&lt;/i&gt;
&lt;/p&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;img width="250" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fluxnet/baldocchi.jpg"&gt;
&lt;div&gt;
&lt;b&gt;Dennis Baldocchi&lt;/b&gt; is a professor of biometeorology at Berkeley. His research focuses on the physical, biological, and chemical processes that control trace gas and energy exchange between vegetation and the atmosphere. He also studies the micrometeorology of plant canopies.
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;hr /&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;div&gt;
&lt;img width="250" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fluxnet/catharine-van-ingen.jpg"&gt;
&lt;b&gt;Catharine van Ingen&lt;/b&gt;, a partner architect with Microsoft Research in San Francisco, does e-science research exploring how database technologies can help change collaborative research in the earth sciences. She collaborates with carbon climate researchers and hydrologists.
&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;p&gt;&lt;hr /&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Links&lt;/b&gt;&lt;/p&gt;
&lt;div&gt;&lt;a href="http://www.fluxdata.org"&gt;Fluxnet website&lt;/a&gt;&lt;/div&gt;
&lt;div&gt;&lt;a href="http://www.microsoft.com/mscorp/tc/carbon-climate-feature.mspx"&gt;MSR news article&lt;/a&gt;&lt;/div&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; Dennis, you're someone who's pulling together a worldwide network of CO2 monitoring stations. Can you briefly explain how these devices work?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Sure. Let me give you a bit of history. Back in the late 1950s, &lt;a href="http://en.wikipedia.org/wiki/Charles_David_Keeling"&gt;David Keeling&lt;/a&gt; made some of the first measurements of carbon dioxide concentration -- on Mauna Loa in Hawaii, in the Arctic, very remote locations. They saw an increase in the CO2 concentration in winter, and a decrease in summer. The increase is due to respiration in the biosphere; the decrease is due to photosynthesis. And on top of this they saw an upward trend due to fossil fuel combustion and logging of tropical forests.
&lt;/p&gt;
&lt;p&gt;
These measurements were just CO2 concentrations. As atmospheric scientists, we know that changes in the atmospheric concentration are due to fluxes. We measure actual fluxes: moles of carbon dioxide, per square meter, per second, between the atmosphere and the biosphere.
&lt;/p&gt;
&lt;p&gt;
We do it with a combination of sensors. One is a three-dimensional sonic anemometer, which measures the vertical, lateral, and longitudinal motions of the air ten times a second. Simultaneously, newer sensors measure the instantaneous change in CO2 concentration.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So it's a combination of sensing wind speed and sensing atmospheric gas.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Absolutely. We measure a covariance between the two, and theoretically that's related to the flux density.
&lt;/p&gt;
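&lt;p&gt;The covariance Dennis mentions is the core of the eddy covariance method: the flux is estimated as the average product of the fluctuations of vertical wind speed (w) and CO2 density (c) around their means over an averaging period. The Python sketch below is a simplified illustration, not Fluxnet's processing code. The function name eddy_flux is hypothetical, it assumes c is a CO2 molar density in mol per cubic meter so the result comes out in mol m^-2 s^-1, and it omits the corrections real flux processing applies, such as coordinate rotation, detrending, and air-density adjustments.&lt;/p&gt;

```python
# Simplified eddy-covariance sketch (hypothetical helper, not Fluxnet code).
# w: vertical wind speed samples (m/s) from the sonic anemometer
# c: CO2 molar density samples (mol/m^3) from the gas analyzer,
#    taken at the same instants as w.
# Omits coordinate rotation, detrending, and density corrections.


def eddy_flux(w, c):
    """Return mean(w' * c'), the covariance of w and c, in mol m^-2 s^-1."""
    n = len(w)
    if n == 0 or n != len(c):
        raise ValueError("w and c must be non-empty and the same length")
    w_mean = sum(w) / n
    c_mean = sum(c) / n
    # w' and c' are the deviations from the period means
    return sum((wi - w_mean) * (ci - c_mean) for wi, ci in zip(w, c)) / n


# At 10 samples per second, one half-hour averaging period holds
# 10 * 30 * 60 = 18,000 samples of each variable.
SAMPLES_PER_HALF_HOUR = 10 * 30 * 60

# Toy data: rising air (w above zero) carries slightly more CO2 than
# sinking air, so the covariance is positive (net CO2 moving upward).
w = [0.5, -0.5, 0.5, -0.5]
c = [0.0170, 0.0168, 0.0170, 0.0168]
print(eddy_flux(w, c))
```

&lt;p&gt;With real data, each half-hour row in the loggers' output would come from running a calculation like this over roughly 18,000 paired samples.&lt;/p&gt;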
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; And this population of sensors has been growing for 15 or more years?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Yeah, my old lab in Oak Ridge, Tennessee made some of the first sensors we were using in the early 90s. Around then a company called &lt;a href="http://www.licor.com/"&gt;Licor&lt;/a&gt; started making a sensor that's about 15 centimeters long and shoots an infrared beam from source to detector. The air can blow through this sensor, and it's low power, doesn't need pumps, so it can be deployed in the middle of nowhere. Many of us run with solar power, so we have a PC that pulls an amp, then the sensor pulls another amp, so for two amps we can run a flux system.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; As Catharine points out, there's a long tradition of large-scale collaboration in some scientific disciplines, but it's relatively new in other areas, and it sounds like this is one of those.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Yeah. I was a grad student in the 80s and I remember my professor having a desk full of data. People would knock on the door wanting to borrow it, and there was always some reluctance; it was really a single-investigator culture at the time.
&lt;/p&gt;
&lt;p&gt;
In many ways I credit our Italian colleagues, they were really gregarious and good at hosting wonderful workshops that started bringing people together. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So Catharine, how did Microsoft get involved in building out the scientific data server that supports this project?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; It was serendipity. We had met folks at the &lt;a href="http://www-esd.lbl.gov/BWC/"&gt;Berkeley Water Center&lt;/a&gt; two ways. First through Jim Gray's interest in e-science and database applications. Second, one of the current heads of the Berkeley Water Center is an old friend of mine from grad school, Jim Hunt. We were talking about doing a hydrology project, then somehow my colleague at BWC on the computing side, Deb Agarwal, ran into Dennis, and we started talking.
&lt;/p&gt;
&lt;p&gt;
Dennis fit all of the criteria for how I like to engage with scientists. He was desperate, he had a problem that he didn't know how to solve, and that was important, because it meant he was willing to talk to us and teach us things.
&lt;/p&gt;
&lt;p&gt;
Also he had enough data to make things interesting for us. It's not petabytes, but we're talking about the hundred-gigabyte range, and the dataset is extremely diverse. I find it fascinating from an informatics point of view because it's a true scientific mashup to do the data analysis. You're taking the flux data that Dennis just described, as well as a lot of site properties, and other things from the literature, and trying to bring it all together.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; There's a whole range of what you folks call ancillary data, which describes soil and vegetation and other aspects of the environment.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; To give you an example, the meteorological data, from a database point of view, is fairly simple and regular. Our loggers give us half-hour data, so you get what's essentially an Excel spreadsheet. The rows are timestamped for each half-hour, and the columns are temperature, flux of water, solar energy, and so on. But it gets complex when you weave in the ancillary data. For example, you need to know the population of leaves that control these fluxes. You may measure that in a half-dozen spots, a half-dozen times per year. Then you need to understand leaf photosynthesis, and that's another set of measurements, and then soil texture, carbon, and water absorption, and all these measurements are taken at different depths and different times. It gets really complex.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Another interesting aspect, from our side, is handling time. We all think time is linear...
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; [laughs] Not according to Einstein...
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; [laughs] ... well ... so, since we're dealing with plant information, plants photosynthesize during the day. So rather than using wall-clock time, using the plants to tell us about day or night was really fascinating. In effect we're deriving a time window based on the time series data themselves, and for informatics folks, this was more fun than a barrel of monkeys. We've generalized the concept now, and applied it to a couple of other disciplines. Handling time has turned out to be one of the biggest areas of learning.
&lt;/p&gt;
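&lt;p&gt;
One way to let the plants define the clock is sketched below in Python. It assumes a hypothetical rule, not necessarily the one the FluxData team uses: a half-hour counts as plant "daytime" when the net CO2 flux is negative (the canopy is taking up carbon), and consecutive uptake half-hours are grouped into windows. The function name and the min_run threshold are illustrative assumptions.
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def plant_day_windows(nee, min_run=2):
    """Group half-hour indices into 'plant day' windows.

    nee: list of net CO2 fluxes, one per half-hour; a negative value means the
    canopy is taking up carbon (photosynthesis outweighs respiration).
    min_run: hypothetical minimum number of consecutive uptake half-hours
    needed to count as a window, so a single noisy reading is ignored.
    Returns (start, end) index pairs with end exclusive.
    """
    windows = []
    start = None
    for i, flux in enumerate(nee + [0.0]):  # sentinel value closes a trailing run
        uptake = -flux > 0.0  # true when the flux is negative
        if uptake and start is None:
            start = i
        elif not uptake and start is not None:
            if i - start >= min_run:
                windows.append((start, i))
            start = None
    return windows

# Hypothetical half-hourly fluxes: night, then a run of daytime uptake, then night.
nee = [2.1, 1.8, -0.5, -4.0, -6.2, -5.1, -1.2, 0.9, 1.7, -0.3, 2.0]
print(plant_day_windows(nee))  # [(2, 7)]
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
The lone negative reading at index 9 is dropped because it is shorter than min_run, which is the kind of judgment a data-derived time window has to encode.
&lt;/p&gt;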
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So what is FluxNet, actually, and how does the data get into the scientific server that you've built?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; It started at a workshop we held in Italy in 1995. From that, regional networks started blossoming. First off the ground was the EuroFlux network, then AmeriFlux in about 1997, then over time the Asians, the Canadians. NASA funded us for two cycles, and then things dried up as they decided to go to the moon and to Mars. Most recently we've been funded by NSF, which is funding a whole bunch of ecological networks. On the side, there's been funding to Oak Ridge National Lab, through NASA, to maintain the data acquisition and archive system. And then Deb and Catharine joined in to build value-added products through this FluxData project. 
&lt;/p&gt;
&lt;p&gt;
Sometimes I think we're like Tom Sawyer, we've got this fence to paint and all these people are helping us paint it.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; Or like stone soup.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; It is like stone soup. From an informatics point of view, the way we think about it is that the data starts with tower owners -- and Dennis is a tower owner as well as a project overseer -- and flows to one of the network repositories, or directly to Oak Ridge where the data is stored.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; OK, so your site, www.fluxdata.org, is not the repository, it's for analysis...
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Yes. There are data archive centers, funded primarily by NASA, where you can contribute data, and where data is stored. The challenge for the scientist is to get from the raw data to the science, it's a classic last-mile problem. So the data flows from the repositories to the folks in Europe who are doing gap-filling and uniform processing, and it flows back to Oak Ridge for long-term storage, and it flows to us.
&lt;/p&gt;
&lt;p&gt;
We then make it available to researchers to download, and we provide the value-added summary products. So we're not at the front end gathering data, and we're not the archive, we're in the middle, solving that last-mile problem.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; Part of that solution is to put the stuff into data cubes. Dennis wrote somewhere that while these have been used in financial analysis for a long time, their application to scientific analysis is new. It might surprise some people to learn that this way of looking at data isn't common in the scientific world.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; It actually isn't. OLAP databases, data cubes, have been around for a long time. I think I first saw one in the early 90s. But that was really commercial data, it was about finding how to make coupons for Oreos and milk. Scientific data is different in a couple of respects. First, it's much more dense. You tend not to always buy Oreos and milk together, but Dennis always reports CO2 flux, temperature, and precipitation together. The other difference is that a lot of the analysis for commercial data is not at the leaf nodes, it's about annual sales. Whereas a lot of science is actually at the leaf nodes, it's about looking at statistical variation in the half-hourly data.
&lt;/p&gt;
&lt;p&gt;
So we end up building different-shaped cubes. 
&lt;/p&gt;
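&lt;p&gt;
A toy illustration of that difference in Python, using hypothetical site names and values rather than the actual cube schema: the same half-hourly (leaf) records are rolled up along whichever dimensions you choose, and each cell keeps the spread of the leaf values, not just a total. The production cubes are built in SQL Server; this plain-Python version only shows the shape of the idea.
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from collections import defaultdict
from statistics import fmean, pstdev

def summarize(records, dims):
    """Roll leaf records up along the chosen cube dimensions.

    records: tuples whose last field is the measure (here, half-hourly CO2 flux).
    dims: indices of the fields to keep as dimensions, e.g. (0, 1) for site-year.
    Returns {dimension values: (count, mean, population std dev)}. Keeping the
    spread, not just a total, preserves the leaf-level variability scientists study.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[i] for i in dims)].append(rec[-1])
    return {key: (len(vals), fmean(vals), pstdev(vals)) for key, vals in groups.items()}

# Hypothetical leaf records: (site, year, vegetation, half-hourly CO2 flux).
records = [
    ("site-A", 2003, "grassland", -2.0),
    ("site-A", 2003, "grassland", 1.0),
    ("site-A", 2004, "grassland", -3.0),
    ("site-B", 2003, "cropland", -5.0),
    ("site-B", 2003, "cropland", -1.0),
]
print(summarize(records, (0, 1)))  # one cell per site-year
print(summarize(records, (2,)))    # one cell per vegetation type
# Slicing: keep only the cropland records before summarizing.
print(summarize([r for r in records if r[2] == "cropland"], (0, 1)))
```
&lt;/code&gt;&lt;/pre&gt;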
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; And let me add that we'll present this data with gaps, for several reasons. One is that if there's a thunderstorm, it might cause the instrument to malfunction. Another is that we have to comply with meteorological steady conditions -- for example, steady winds. So we apply a lot of quality assurance to the data set, and that produces gaps, but any user of the data wants a continuous record. So we need to find ways to fill those gaps. 
&lt;/p&gt;
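&lt;p&gt;
The simplest possible gap filler is sketched below in Python: readings rejected by quality assurance are marked None and filled by linear interpolation between the nearest good readings. This is an illustration only; flux networks use more sophisticated gap-filling methods that draw on meteorological drivers, and the function here is hypothetical.
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def fill_gaps_linear(values):
    """Fill None entries by linear interpolation between the nearest valid neighbours.

    Gaps at the start or end are filled with the nearest valid value.
    Returns a new list; raises ValueError if there are no valid values at all.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no valid values to interpolate from")
    filled = list(values)
    # Leading and trailing gaps: carry the nearest valid value outward.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps: interpolate between consecutive valid points.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled

# Hypothetical half-hourly CO2 flux with QA-rejected readings marked None.
print(fill_gaps_linear([None, 1.0, None, None, 4.0, None]))
```
&lt;/code&gt;&lt;/pre&gt;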
&lt;p&gt;
We also want to partition the fluxes, so we can understand mechanisms. We measure the net ecosystem exchange, but there's a component due to photosynthesis and a component due to respiration. By separating out day and night data we can derive these components, so there's all this value added to the data from the archive.
&lt;/p&gt;
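&lt;p&gt;
A deliberately simplified partitioning sketch in Python follows. It assumes the sign convention NEE = Reco - GPP (positive NEE is a release of CO2 to the air), treats nighttime NEE as pure respiration, and, as a loudly flagged simplification, assumes daytime respiration equals the nighttime mean; published methods instead model respiration as a function of temperature. Names and numbers are hypothetical.
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from statistics import fmean

def partition_nee(nee, is_day):
    """Split net ecosystem exchange into respiration (Reco) and photosynthesis (GPP).

    Assumed sign convention: NEE = Reco - GPP, so positive NEE is a CO2 release.
    At night GPP is taken as zero, so nighttime NEE is respiration. Simplification:
    daytime respiration is set to the mean nighttime value.
    Returns (reco, gpp_per_half_hour).
    """
    night = [f for f, day in zip(nee, is_day) if not day]
    if not night:
        raise ValueError("need at least one nighttime value")
    reco = fmean(night)
    # GPP is the uptake needed to explain each daytime NEE given that respiration.
    gpp = [reco - f if day else 0.0 for f, day in zip(nee, is_day)]
    return reco, gpp

# Hypothetical half-hourly NEE with a day/night flag for each reading.
nee = [2.0, 2.4, -3.6, -5.6, 1.6]
is_day = [False, False, True, True, False]
reco, gpp = partition_nee(nee, is_day)
print(reco, gpp)
```
&lt;/code&gt;&lt;/pre&gt;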
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So I looked at some of your pivot tables, for example on sites by vegetation -- how are those being used?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; To do cross-site analysis. For example, we're interested in how length of growing season may affect net carbon exchange. When I did this analysis before I met Catharine, I had to open a bunch of spreadsheets and cut and paste, cut and paste. With the cubes, you press a button and the data's there. It really allows you to do a lot of quick what-if questions, and be creative. It makes our work quicker and easier.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; We're also doing a fair amount of sorting. You can sort along vegetation types, to see the difference between croplands and grasslands. We also know each of the sites that is a boreal forest, so you can look at just those, or just tropical forests. If the database has 900 site-years, you can select just the 200 that you need for a piece of analysis.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; Is it fair to say that until this was brought together it wasn't possible to do this?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; It was possible, but just really tedious.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Back when the network was small, we did a workshop in 2000, and we had about 100 site-years of data from 30 sites. It was easy to be clunky. But now we have 900 site-years from 400 sites, and you just can't use the old methods. We have to go modern. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; What kinds of collaboration effects are you seeing? You've written that it's a big challenge to motivate scientists to contribute the ancillary data in a standard way. Getting the stuff in front of people like this, in a common presentation with explanations about what all the variables mean, and how to report them, should help get everybody onto the same page.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; I see a couple of things. First, we're starting to hear from individual tower owners asking us questions, and telling us what's wrong. "I'm sorry, my site isn't really at that lat/lon." Or: "My leaf index is really this."
&lt;/p&gt;
&lt;p&gt;
They see their data being used in papers: we're hosting access for about 60 paper-writing teams. As the papers come to fruition, we're actually tracking what sites they're using, so it's possible to go in and find out who's using your data.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; It's motivating. I know my post-doc is so excited when she finds out people are using this data. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; That explains why you have an update feature on the site?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Absolutely. We know there are corrections that need to be made. Treating it as a living, breathing data set, and being able to respond in an organized way to changes...
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; As more eyes look at it, they can help us fix it. Especially our own data. You look at it and don't see the problem, but when someone tries to use it...oops. In fact we found a problem with our solar heat flux recently. We were doing the correct calculations from 2000 to 2003, then we changed algorithms, and the staff changed, and all of a sudden there was a glitch in how the data were being processed. Finally some scientist from UCLA wanted to use the data, and he plotted it up, and found the problem. So now we're correcting that. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; One of the things that happens when you plot data over time is that you can see any errors in time reporting. One site was off by a couple of months. The data looked fine when you plotted just that site. But if you plot it by nearby sites, suddenly you see the problem. That's the kind of processing -- bringing the data into focus -- that we're engaged in right now.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So you've got the data online, and tools for viewing and updating the data, but there's also a conversational infrastructure. You have a blog, there are places for people to add comments and have discussions, and all of that is kept together with the data. Catharine, you've said that the role of data curation in science is emerging, and will be key as we increasingly see these mega collaborations with hundreds or even thousands of people working on the same data. You need an environment in which those conversations can be centralized in the same way the data is centralized.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; There's also almost a traffic-cop role too, just to avoid redundant efforts. There are several obvious ideas, and multiple groups may want to pursue them. In the long run it's a waste of effort if people are doing the same redundant analysis, and only one paper may get published. If we can get these people to talk to each other, and interact, that's critical.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; As Catharine puts it, investing the same effort in publishing data as you would in writing a paper is something that's not yet socialized. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; No, it's not. We see again and again how difficult it is to put the data in a box and tie a bow around it, so people can reuse it. It's very hard, but very important, long-term, for a lot of these environmental problems.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; So Catharine, by marking these data sets and giving them some kind of provenance, is this a way scientists can get credit for the work?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Well, the challenge isn't only enabling that, but also teaching the funding agencies that it's just as important.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; Exactly. I've talked to Timo Hannay about this -- he's the guy who runs the web stuff for Nature Publishing -- and this is a huge interest of his. Science is an enterprise that runs on people getting credit for publishing papers, not data. I gather that often papers are published as a thin gloss on a data set, just to get the data out there. There hasn't been a model for publishing the data itself. The fact that the data from somebody's individual tower can be traced back, and then traced through its use in follow-on papers -- that's huge. Your post-doc can not only get excited about other people using her data, she can get credit for their citations of it.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So, the climate effect of CO2 is obviously a hot topic. What have we actually learned at this point?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; One paper used this network in combination with remote sensing to see how carbon exchange across Europe responded to the drought and heat wave in 2003. So here was this network poised to measure how the whole biosphere responded to this climate assault. 
&lt;/p&gt;
&lt;p&gt;
The network has also been successful with what we call emergent scale processes. One that came out strongly is that plant canopies respond to light more efficiently if the light is diffuse, as opposed to when there are clear skies. That's a process we haven't seen before.
&lt;/p&gt;
&lt;p&gt;
Another thing we found, because we have continuous records, is that if there's a summer rain event, microbes turn on immediately and produce huge amounts of respiration that we never envisioned before. Scientists in the past would miss these extreme events, but by having continuous measurements we can see how the system responds. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; But you wouldn't argue for long-term trends in the 15 or so years of data you've collected?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; If there are long-term trends, they seem more related to ecosystem dynamics. Many of the forests under study were disturbed at the turn of the century, so they're going through that natural cycle of growth, maturity, and decay. Those ecological features lie on top of any potential climate trends.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So it's more about having an infrastructure in place that allows us to have the data in hand, and then make some predictions?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Yes. Now in fact, one of the things we are seeing is a change in the length of the growing season. As things have gotten warmer, the spring comes earlier, and it's really affecting carbon uptake in the deciduous forests. But the big unknown is that if you have an earlier spring you might also get a summer drought, so you have an increase in carbon uptake in the spring and a decrease in summer, and the two factors may cancel out. But with our measurements we can see the mechanisms; we can understand and parse out what's happening and why. Whereas in the past, scientists would cut down trees and get tree rings, taking one integrated snapshot for the whole year. But they wouldn't understand why, because those tree rings were also affected by drought and temperature and ozone and elevated CO2 and other issues.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; It's really a great time to be doing this stuff, because you're at the juxtaposition of social need, scientific need, and the availability of cheap technology.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; And our NSF grant encourages us to do outreach, so this is a great opportunity to do that.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Jim Gray always used to point out that the post-docs are the ones in any collaboration who most embrace new technology and move the entire collaboration forward. Knowing the guys over in Europe, that's certainly true, and you can see it happening with your own post-docs, Dennis.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; So how are these cubes getting built, Catharine? What was the collaboration between you and the scientists?
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; We're lucky to be starting with a data set that is very well processed. As to building the rest, Dennis gave us, gosh, I looked at 300 of his graphs. I also got a similar collection from two of his other colleagues. I went through all the graphs and papers to try to understand how the data is manipulated and displayed. 
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; That's a good idea. I didn't realize you did that.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Oh yeah. [laughs]
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; That would be helpful, because you see the kinds of products we're trying to create from these databases.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Absolutely. I started by classifying the graphs into time-series graphs, scatterplots, and then everything else. Then I waded through how everything was sorted, searched, filtered, trying to figure out how to organize the data to enable that class of graphs.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; So Catharine, there are a bunch of graphs I'd like to replot with this new database.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Well Dennis, you and I should have lunch and we should figure out how to rip out a bunch of graphs. 
&lt;/p&gt;
&lt;p&gt;
So, along the way we realized that scientists will often make 50 graphs, throw away 48, and keep two. The ability to make a lot of graphs rapidly and simply usually requires some kind of scripting, and that's where you start leaving Excel and going into MATLAB or another scientific analysis tool.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Yeah, I'm using MATLAB a lot nowadays, and I'm seeing things I never saw before. I like having the script files because it gives me some history of what I was looking at.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; That's why we decided to connect MATLAB to the cube, so you can browse the reports we make in Excel, or go directly through MATLAB. Again, it's solving that last-mile gap to the scientist's house.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;JU:&lt;/b&gt; Well this has been great, thanks!
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;DB:&lt;/b&gt; Yeah, thanks. Catharine, we should get together and talk about some graphs.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;CvI:&lt;/b&gt; Thanks Jon. And thanks Dennis. Are you in your office? I'll call you later this afternoon.
&lt;/p&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Making-sense-of-C02-data/</comments><link>http://channel9.msdn.com/posts/JonUdell/Making-sense-of-C02-data/</link><pubDate>Thu, 03 Apr 2008 15:26:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fluxnet/fluxnet.wma</guid><evnet:views>227</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489747/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;p&gt;&lt;em&gt;In this podcast, MSR researcher Catharine van Ingen and Berkeley micrometeorologist Dennis Baldocchi talk with Jon Udell about their collaboration on &lt;a href="http://fluxdata.org"&gt;www.fluxdata.org&lt;/a&gt;, a SharePoint portal to a scientific data server. The server contains carbon-dioxide flux data gathered from a worldwide network of sensors, and provides SQL Server data cubes that help scientists collaboratively make sense of the data. &lt;/em&gt;&lt;/p&gt;
&lt;em&gt;&lt;/em&gt;</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fluxnet/fluxnet.mp3" expression="full" duration="2400" fileSize="19524480" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fluxnet/fluxnet.wma" expression="full" duration="2400" fileSize="19750759" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/fluxnet/fluxnet.wma" length="19750759" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Making-sense-of-C02-data/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489747/Trackback.aspx</trackback:ping><category>Collaboration</category><category>data curation</category><category>science</category></item><item><title>Cluster computing for the classroom</title><description>&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;&lt;img width="280" alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hpclabs/kyril_faenov.jpg" /&gt; &lt;br /&gt;
            &lt;b&gt;Kyril Faenov&lt;/b&gt; is the General Manager of the Windows HPC product unit. Before founding the HPC team in 2004, Kyril worked on a broad set of projects across Microsoft, including running the planning process for Windows Server 2008, co-founding a distributed systems project in the office of the CTO, and developing scale-out technology in Windows 2000. Kyril joined Microsoft in 1998 through the acquisition of Valence Research, an Internet server clustering startup he co-founded and grew to profitability by securing MSN, Microsoft.com, and some of the world's other largest web sites as its clients. &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;hr /&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;img width="280" alt="" src="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hpclabs/rich_ciapala.jpg" /&gt; &lt;br /&gt;
            &lt;b&gt;Rich Ciapala&lt;/b&gt; is a program manager in Microsoft HPC++ Labs, an incubation team within the Windows HPC Server product unit. Rich joined Microsoft in 1992 and has held a number of different positions in technical sales, Microsoft Consulting Services, the Windows Customer Advisory team and the Visual Studio product team. &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;hr /&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;
            &lt;p&gt;&lt;b&gt;Links&lt;/b&gt;&lt;/p&gt;
            &lt;div&gt;&lt;a href="http://labs.microsofthpc.net/compfin"&gt;Microsoft HPC++ CompFin Lab&lt;/a&gt;&lt;/div&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;h2&gt;Kyril Faenov and Rich Ciapala discuss a new HPC++ Labs project that enables students to run computation-intensive experiments involving large amounts of financial data. &lt;/h2&gt;
&lt;br /&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; What Rich just demoed, which we'll show in a screencast, is how a financial model can be deployed to a server that acts as a front-end to a compute cluster. It's a nice easy way for students to use a model developed by a professor, select a basket of securities, run a very intensive computation on them against large chunks of data, and get answers back in an Excel spreadsheet. The bottom line is that the students can run an experiment using a level of computing power that was never before so easily accessible. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Yeah, because of the complexity involved in deploying systems like that, acquiring the data, and curating it, a lot of universities don't have this kind of infrastructure in place. So for a number of students who haven't done this before, this will make it available for the first time. For others who have, it will make it quite a bit easier. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Now these are not computer science students who are learning about high performance computing, and about writing programs for parallel machines, these are students who are learning about financial modeling, and this just makes a tool available to them that can accelerate that. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Precisely. Most of our HPC customers are scientists, or engineers, or business analysts, not computer scientists. They're folks who use mathematics, statistics, differential equations ... sometimes not even math directly, but applications that encode these mathematical models to do research, or engineering, or risk modeling, or decision making. To them it's just a tool, and they want to use it in the way they use PCs today, as transparently and straightforwardly as possible. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; What's the situation today for most people? In the case of the covariance model Rich showed in the demo, if it weren't being done like that, how would it be done? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; You can do it in Excel, or MATLAB, or SAS, on the workstation. So you'd acquire the data, and use your preferred tool ... &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; ... and wait a long time ... &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; ... and wait a long time. And if you want to process a significant amount of data -- like a year's worth, for a large number of stocks -- it might not even be possible at all. &lt;/p&gt;
&lt;p&gt;Or you might load it up onto a server, but then you have to figure out how to write an application, how to deploy it to the server, how to submit the data to the model and pull the results back, and how to integrate them into the visual analytic process. &lt;/p&gt;
&lt;p&gt;This multi-step process is exactly what our HPC customers are running into. They're expressing the models and doing the design on the workstation, using any number of tools. They do the analysis of the results, and visualization, on the workstation. But large-scale computation runs somewhere else. It might be in their organization, it might be out on the Internet, but it's a very disjointed process. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; There are clusters out there in academia, and there are people doing these kinds of things, but the point is that it hasn't been woven together yet. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; That's right. In 2004 the U.S. government published an assessment of U.S. competitiveness in high performance computing. The first recommendation was, and I'm quoting: &lt;/p&gt;
&lt;blockquote&gt;Make high performance computing easier to use. Emphasis should be placed on time to solution, the major metric of value. A common software environment that spans desktop to high-end systems will enhance productivity gains. &lt;/blockquote&gt;
&lt;p&gt;That's what we're starting to see in the HPC community. Not just getting the systems running as fast as possible, but figuring out how the workflow, the creative element of the scientific process, can be optimized. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So, Rich and I talked about how the particular model used in his demo falls into a class called &lt;i&gt;parameter sweep&lt;/i&gt;, which he distinguishes from the more distributed and chatty kinds of applications. In this case, you can send a batch of data down to a node, it can think about it for a while and then give back an answer, and there doesn't need to be much communication. Is that the optimal scenario for this architecture? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Actually, it's optimized for a broad range of HPC applications. In fact, the major goal of the first release of the product, Compute Cluster 2003, was MPI-style [message passing interface] applications. There are a lot of these in engineering and in the environmental space. You're modeling some kind of physical process, and you build a mesh or grid that takes a large physical process or body, partitions it, does computations on local areas, but then has to frequently exchange data across the partitions. Think about a car crash simulation. You might partition the hood of the car into a lot of pieces, every one computed separately, but as the deformation is happening the forces need to be exchanged. Or weather modeling, where heat exchange happens across partitions. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; There's a high degree of data interdependence. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Exactly. When you have an interdependent problem, you use MPI for that. We worked with the team at Argonne National Labs that releases the open source reference implementation of MPI, and we've adopted that in our product, optimized the performance and security on Windows, and integrated it into the stack. &lt;/p&gt;
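&lt;p&gt;For readers who haven't written MPI code, here is a toy, single-process Python sketch of the boundary-exchange pattern described above, using 1-D heat diffusion along a rod instead of a car hood. The rod is split into partitions; before each step every partition receives one "ghost" cell from each neighbour, which stands in for the message an MPI program would send between processes. The function names are hypothetical and no MPI library is used; the final check confirms the partitioned run matches the undivided one.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def step(u, alpha=0.25):
    """One explicit diffusion step on a 1-D rod; the two end cells stay fixed.

    alpha = 0.25 keeps this explicit scheme stable (it must not exceed 0.5).
    """
    if len(u) >= 2:
        interior = [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
                    for i in range(1, len(u) - 1)]
        return [u[0]] + interior + [u[-1]]
    return list(u)  # a single cell has no neighbours to diffuse with

def partitioned_step(parts, alpha=0.25):
    """The same step on a rod split into partitions that swap edge values.

    Each partition first receives a ghost cell from each neighbour (the data an
    MPI program would send between processes), updates its own cells, and then
    drops the ghosts. The outermost cells of the whole rod stay fixed.
    """
    new_parts = []
    last = len(parts) - 1
    for k, part in enumerate(parts):
        left = [parts[k - 1][-1]] if k != 0 else []     # ghost from left neighbour
        right = [parts[k + 1][0]] if k != last else []  # ghost from right neighbour
        updated = step(left + part + right, alpha)
        new_parts.append(updated[len(left):len(updated) - len(right)])
    return new_parts

# Hypothetical rod: hot at the left end, cold elsewhere, split into three partitions.
rod = [100.0] + [0.0] * 9
whole, parts = rod, [rod[:3], rod[3:7], rod[7:]]
for _ in range(20):
    whole = step(whole)
    parts = partitioned_step(parts)
print(sum(parts, []) == whole)  # True: the exchange reproduces the undivided run
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Without the ghost-cell exchange, each partition would treat its edges as fixed and the answer would drift from the true solution, which is why this class of problem needs frequent, low-latency communication.&lt;/p&gt;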
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Right, I knew about the MPI layer in the cluster product. But it seems that the system we're looking at here, for professors to enable students to experiment with financial modeling, is targeting the other class of application. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Right. There is a large class of what we call embarrassingly parallel problems, where you have a lot of independent tasks; a lot of statistical analysis falls into that category, and so does media rendering. And that's what we have here, because every pair of instruments that needs to be compared is an independent task. What you need to do is spray those tasks across a cluster. We have a solution that makes that much more approachable. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So in this case, that entails mapping the input parameters to a set of work items. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Correct. &lt;/p&gt;
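&lt;p&gt;A minimal Python sketch of that mapping, with hypothetical function names and made-up returns (it is not the HPC++ Labs framework API): the basket is mapped to one work item per pair of securities, the independent items are sprayed across a pool of workers, and the results are reduced into a single table. A local thread pool stands in for the cluster nodes.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import sqrt
from statistics import fmean

def correlation(x, y):
    """Pearson correlation of two equal-length return series."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def make_work_items(returns):
    """Map the input basket to independent work items: one per pair of tickers."""
    return list(combinations(sorted(returns), 2))

def run_item(item, returns):
    """Compute one work item; no other item's result is needed."""
    a, b = item
    return item, correlation(returns[a], returns[b])

def run_sweep(returns, workers=4):
    # Items share no state, so they can be sprayed across workers without
    # any communication between them.
    items = make_work_items(returns)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda item: run_item(item, returns), items)
        # Final reduction: gather every pair's result into one table.
        return dict(results)

# Hypothetical daily returns for a three-stock basket.
returns = {
    "AAA": [0.010, -0.020, 0.015, 0.005],
    "BBB": [0.012, -0.018, 0.014, 0.004],
    "CCC": [-0.010, 0.020, -0.015, -0.005],
}
for pair, r in sorted(run_sweep(returns).items()):
    print(pair, round(r, 3))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The number of work items grows with the square of the basket size (n stocks give n(n-1)/2 pairs), which is why a large basket over a long history quickly outgrows a single workstation.&lt;/p&gt;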
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; OK. And outside the financial domain, where else will this style be popular? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; We'll see this in a range of disciplines. This particular example uses data from an external source -- in this case, the stock market -- and it's looking for patterns of correlations between different signals. This paradigm is broadly applicable. Think about clinical research, for example, where you have data coming in from hundreds of patients, where the data includes many parameters about their health condition, and you're looking for disease markers or drug reactions -- you're doing correlation analysis among the different signals. &lt;/p&gt;
&lt;p&gt;Or you might have data coming in from sensors deployed in oil and gas pipelines for safety monitoring, or from environmental sensors: anywhere you have instruments producing high volumes of data, where you need to find patterns in the data and optimize the scientific process of developing models that produce insight into it. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Would you say that these embarrassingly parallel problems are low-hanging fruit? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Very much so. And there's another class, Monte Carlo simulation, a method used very effectively across a range of industries to statistically explore different scenarios for risk analysis and predictive modeling. It's used in financial services, like insurance, but things like process management in factories, or logistics chains, can also use it. &lt;/p&gt;
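&lt;p&gt;A minimal Monte Carlo sketch in Python, with hypothetical parameters: it estimates the probability that a portfolio with normally distributed returns loses more than 10% in one period. Because every trial is independent, the trials could be split into batches across cluster nodes; this sketch simply runs them in one process with a seeded random generator so results are reproducible.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import random

def probability_of_loss(mu, sigma, threshold, trials=100_000, seed=42):
    """Monte Carlo estimate of P(loss > threshold) for a normally distributed return.

    mu, sigma: assumed mean and standard deviation of the one-period return.
    threshold: loss level of interest, as a positive fraction (0.10 = 10%).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    # Each trial draws one return; a loss is the negative of the return.
    hits = sum(1 for _ in range(trials) if -rng.gauss(mu, sigma) > threshold)
    return hits / trials

# Hypothetical portfolio: 5% expected return, 20% volatility.
print(probability_of_loss(0.05, 0.20, 0.10))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these inputs the exact answer is the normal probability of a return more than 0.75 standard deviations below the mean, about 0.227, so the estimate should land close to that. The value of simulation is that it still works when the model has no closed-form answer.&lt;/p&gt;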
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So for the current example, give us a sense of what skill set is required of the professor in order to create the model and make it available to students. There's some .NET programming involved, right?a &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Rich, do you want to take this? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;RC:&lt;/b&gt; Well, you pick your .NET language of choice, and your development environment, which may be Visual Studio. We're making the data available in terms of LINQ, so you need some understanding of that, although for the queries typical of these applications it's fairly basic. And in fact, since it's integrated into the language and you get things like syntax completion, it's probably easier than writing SQL. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; There's a framework provided, what does that include? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;RC:&lt;/b&gt; It does two things. First, it forces you to define the interface for your model in such a way that you can easily build, for example, an Excel front-end to send input and retrieve output. Second, it shows you exactly where you need to do the splitting of the tasks into work items, where you do the spraying of work items to the cluster, and where you put the code that does the covariance and correlation calculations. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; The professor focuses on writing the analytics parts, and doesn't have to worry about the fairly complex workflow skeleton that submits the data to the cluster, partitions the work, accesses the results, and then performs the final reduction. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So I can focus on creating the pivot table, or using MATLAB, which is where I'd rather be spending my time. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Yes, in a domain you're expert in. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So, who are the guinea pigs for this system? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;RC:&lt;/b&gt; Our first two are the University of Washington, which did the model we demonstrated, and the University of North Carolina at Charlotte. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Kyril, I know you have big ideas about where this can go. Why don't you paint the picture? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; When we started the HPC team at Microsoft, we realized it's an actively evolving space. But Microsoft is fairly new to it. Without the benefit of 20 or 30 years of experience, we felt we needed to do something that would help us develop expertise and build up an understanding of not just the technology, but also the usage patterns. So we worked with, and funded, 10 universities worldwide, and that's been very helpful. &lt;/p&gt;
&lt;p&gt;We've also created an internal team whose mission is to do incubation. The goal of this team is threefold. First, to prototype and demonstrate the end-to-end solutions that our HPC customers will find beneficial, and what Rich has demonstrated is an example of that. &lt;/p&gt;
&lt;p&gt;Second, to help us explore the trend that we see as HPC becomes more and more data-driven. There's still the world where you run simulations, of car crashes or weather. But a lot of new applications are mining data for insight, and doing it in a computationally intensive way. That changes the formula for how HPC is used. In many cases it's becoming impractical to put clusters in customer locations, if you have to ship terabytes or petabytes of data around. &lt;span class="pullquote"&gt;Data repositories are starting to act like black holes, if you will, that are pulling computation towards them.&lt;/span&gt; &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; I'm sure that's true in the climate area... &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Climate, biology, astronomy, geosciences, everywhere that you start accumulating tremendous data sets. We think there's going to be a way that Microsoft can help customers optimize how these services are built, because there's no established architecture today. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Jim Gray was always talking about how it's becoming necessary to FedEx hard disks around the world because there's no other way to move the data to the computation. But instead you're proposing to move the computation to the data. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; That's right. We want to incubate a few of these high-value data-centric services, and demonstrate the best practices for doing that while providing free access to academic institutions. That'll help us understand what's involved in operating these services, and potentially we might imagine Microsoft running a few of them. &lt;/p&gt;
&lt;p&gt;Then the third goal for the incubation team is to flow the requirements for doing these things into software, so that customers can do this as easily as possible themselves. One of the challenges today is a dichotomy. Very large-scale Internet services are being built -- by Microsoft, Yahoo, Google, and others -- but they're in their own world. Customers can't take a slice of that infrastructure and deploy it in their environments. &lt;/p&gt;
&lt;p&gt;At the same time, we keep on building off-the-shelf software that people install on their infrastructure, and we're just now learning what it takes to run HPC services using that software. So we want to make sure there's a tight coupling between the team that builds the prototypes and runs the services, and the team that implements off-the-shelf software, such that we run our services using the products that we build. And at the same time, we want to make it a turnkey operation for customers to stand up these services themselves. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; That's a key point, so let's underscore it. We're seeing the emergence of a small set of what I call intergalactic clusters, which are one-of-a-kind things, and they are not replicable. They do interesting and powerful things, but you can only do things with them on their terms. &lt;/p&gt;
&lt;p&gt;Your notion is that you want to maintain parity, and ensure that you can always replicate what's happening in the cloud if you need to. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Exactly. For example, we just talked about the gravitational pull of data. Imagine you have an astronomy site that accumulates a petabyte. You can try to put it on one of these intergalactic clusters, but that's maybe not what you want. Maybe the best option is for you to stand up a 1000-node cluster with each node having a terabyte of disk. We want to enable that. We want to be able to tell our customers: Here's how we run these large-scale data-driven HPC applications, and here's how, within a day or two, you can stand up one of these yourself. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; So you even see some potential consumer applications for this, don't you? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Sure. Think about search. We can only find answers to questions that have already been answered. But imagine if your questions require novel insight into data. For example, Microsoft HealthVault is starting to accumulate a lot of health data. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; Right, so what are my cancer survival prospects given the specifics of my case, and in light of a large body of data about other people? &lt;/p&gt;
&lt;p&gt;&lt;b&gt;KF:&lt;/b&gt; Or help me do a predictive analysis on my risk of flood or hurricane damage, not for the region in general, but for my house, given the weather and geographical information that's available, and maybe given a few sensors that report data specifically for my house. &lt;/p&gt;
&lt;p&gt;To enable these applications, you have to create a platform that makes it possible to curate data, and develop applications that run on top of it. What you see in the service we just demonstrated is a first example of that. &lt;/p&gt;
&lt;p&gt;&lt;b&gt;JU:&lt;/b&gt; OK, thanks guys. &lt;/p&gt;&lt;img src="http://channel9.msdn.com/489746/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Cluster-computing-for-the-classroom/</comments><link>http://channel9.msdn.com/posts/JonUdell/Cluster-computing-for-the-classroom/</link><pubDate>Thu, 27 Mar 2008 11:35:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hpclabs/hpc.wma</guid><evnet:views>234</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489746/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;br /&gt;
&lt;p&gt;
Kyril Faenov and Rich Ciapala discuss a new HPC++ Labs project that enables students to run computation-intensive experiments involving large amounts of financial data. 
&lt;/p&gt;</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hpclabs/hpc.mp3" expression="full" duration="1590" fileSize="12675840" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hpclabs/hpc.wma" expression="full" duration="1590" fileSize="12837391" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hpclabs/hpc.wma" length="12837391" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Cluster-computing-for-the-classroom/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489746/Trackback.aspx</trackback:ping><category>Education</category><category>finance</category><category>HPC</category></item><item><title>A demonstration of cluster computing for the classroom</title><description>&lt;img src="http://channel9.msdn.com/Link/23579fcd-d22b-4fd0-b8a0-c467c3aea2e6/" border="0" /&gt;&lt;br /&gt;
&lt;p&gt;
In this screencast, Rich Ciapala demonstrates Microsoft HPC++ CompFin Lab, which integrates Microsoft HPC Server, a central market data database, and Microsoft productivity products to provide university courses with an online service to publish, execute and manage computational finance models. &lt;/p&gt;&lt;img src="http://channel9.msdn.com/489745/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/A-demonstration-of-cluster-computing-for-the-classroom/</comments><link>http://channel9.msdn.com/posts/JonUdell/A-demonstration-of-cluster-computing-for-the-classroom/</link><pubDate>Thu, 27 Mar 2008 11:33:00 GMT</pubDate><guid isPermaLink="false">http://channel9.msdn.com/posts/JonUdell/A-demonstration-of-cluster-computing-for-the-classroom/</guid><evnet:views>68</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489745/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;br /&gt;
&lt;p&gt;In this screencast, Rich Ciapala demonstrates Microsoft HPC++ CompFin Lab, which integrates Microsoft HPC Server, a central market data database, and Microsoft productivity products to provide university courses with an online service to publish, execute and manage computational finance models. &lt;/p&gt;</evnet:previewtext><media:thumbnail url="http://channel9.msdn.com/Link/97e29eaf-144e-4fcb-a946-9b5a538a2933/" height="240" width="320" /><media:thumbnail url="http://channel9.msdn.com/Link/23579fcd-d22b-4fd0-b8a0-c467c3aea2e6/" height="64" width="85" /><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hpclabs/hpc.wmv" expression="full" duration="1128" fileSize="13945423" type="video/x-ms-wmv" medium="video" /><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/hpclabs/hpc.wmv" expression="full" duration="1128" fileSize="13945423" type="video/x-ms-wmv" medium="video" /></media:group><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/A-demonstration-of-cluster-computing-for-the-classroom/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489745/Trackback.aspx</trackback:ping><category>Education</category><category>finance</category><category>HPC</category></item><item><title>Understanding CardSpace</title><description>&lt;h2&gt;In this podcast, Jon Udell chats with Vittorio Bertocci, author of &lt;em&gt;Understanding Windows CardSpace&lt;/em&gt;. The discussion traces the evolution of the identity metasystem, explores the rationale for CardSpace, and considers the unsolved problem of public online identity for individuals. &lt;/h2&gt;
&lt;table&gt;
    
        &lt;tr&gt;
            &lt;td&gt;&lt;img alt="" src="http://www.maseghepensu.it/VittoribBlogImage.jpg" /&gt; &lt;strong&gt;Vittorio Bertocci&lt;/strong&gt; is a senior technical evangelist for Microsoft Corporation. He works with Fortune 100 and major G100 enterprises worldwide, helping them to stay ahead of the curve and take advantage of the latest technologies. He is the primary author of &lt;a href="http://worldcat.org/oclc/172980362"&gt;Understanding Windows CardSpace: An introduction to the concepts and challenges of digital identities&lt;/a&gt;. &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;hr /&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;a href="http://www.amazon.com/Understanding-Windows-CardSpace-Introduction-Independent/dp/0321496841/"&gt;&lt;img alt="" src="http://ecx.images-amazon.com/images/I/51VbaAUs1FL._BO2,204,203,200_PIsitb-dp-500-arrow,45,-64_OU01_AA240_SH20_.jpg" /&gt;&lt;/a&gt; &lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;hr /&gt;
            &lt;p&gt;&lt;strong&gt;Links&lt;/strong&gt;&lt;/p&gt;
            &lt;p&gt;&lt;a href="http://blogs.msdn.com/vbertocci/"&gt;Vibro.NET: Vittorio Bertocci's blog&lt;/a&gt;&lt;/p&gt;
            &lt;p&gt;&lt;a href="http://worldcat.org/oclc/172980362"&gt;Understanding Windows CardSpace&lt;/a&gt;&lt;/p&gt;
            &lt;/td&gt;
        &lt;/tr&gt;
    
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; What I particularly liked about this book is the lengthy introduction that sets the context, not just for CardSpace but for previous iterations -- what problems did they solve, what problems did they not solve, and why does that lead us to the architecture we have now. &lt;/p&gt;
&lt;p&gt;For example, you discuss SSL client certificates. I remember thinking, in 1996 or so, when that capability was present in both Netscape and IE, here we go. No more passwords. Obviously that didn't happen. But why not? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; The SSL client certificate strategy, from a cryptographic perspective, is perfectly sound. But it's a paradigmatic example of how technology alone cannot solve a problem that involves human interaction. &lt;/p&gt;
&lt;p&gt;The certificate is a construct that's made for computer scientists. It says that the subject is the rightful owner of a certain public key, which doesn't really resonate with my mother or my sister. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; But it didn't have to be presented that way. It could have been presented as, here is the managed card -- in modern terminology -- that you will use when you go to the Staples website. &lt;/p&gt;
&lt;p&gt;So maybe it was just too early. Or maybe the nature of that certificate didn't lend itself to the embedding of assertions in an expressive and flexible way. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; Yes. Certificates cannot be managed cards for two reasons. One is practical and could have been easily changed. The metaphor could have been friendlier, as you say. But the other thing is that a certificate is a primary token, your credentials rather than your identity. It is the mechanism for proving that you are the person entitled to that specific key. If the certificate is given to me instead of you, it's the same. There is nothing in it that says it's you. &lt;/p&gt;
&lt;p&gt;Your identity is instead something that is about yourself. When you use a managed card, you are leveraging a relationship that you have with somebody -- your airline, your government. It's true the certificate could be the enabling mechanism for expressing this relationship. But suppose I am a customer of Alitalia, and I have a card in my wallet that, when I show it to the right people, enables me to enjoy certain advantages that are part of my identity as a customer. But my relationship with this airline, the fact that I'm entitled to a certain right, can change. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Yes. So if the right is hardcoded into the certificate, that's fairly static. As opposed to the more dynamic nature of the identity metasystem, in which attributes are exchanged on the fly. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; Exactly. The attributes that make sense in a specific context -- like whether or not you have a certain privilege -- should come down dynamically. Embedding them in the certificate is dangerous. I have this conversation often with governments. They tend to think of transporting online what they already have offline. So if you have a passport, it's basically like a cached token. It's something that says yes, you can travel, yes, you are Italian. But online it's really better to give this information on the fly, for a number of reasons. &lt;/p&gt;
&lt;p&gt;One reason is that you can encrypt the information directly to the relying party. When they gave me my passport, they didn't know that I would go to Iceland, or to the United States. They just gave me a blanket permission to travel. But online, I can present my passport in a context that says I want to go to the US, and then the token that says yes, this person wants to go to the US, can be encrypted directly for the US embassy. Whereas a blanket permission, cached for use by everybody, would have to be accessible to everybody, which is dangerous. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Right. You also do an analysis, in this chapter, of Kerberos, and how it has desirable properties but doesn't scale for the Internet. Can you explain that? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; Kerberos itself is really the basis for many of the interactions that we use. So this idea of having an entity that knows about you, and can make assertions about you, is there in Kerberos. The problem is practical. Kerberos is one specific technology. As such, it's something you can't impose on everybody. It's a system, but if we want to talk to everybody, we need a metasystem. We need to abstract the capabilities of Kerberos in a way that does not force every participant in a transaction to speak Kerberos itself. &lt;/p&gt;
&lt;p&gt;Also, Kerberos has a very authoritative view of the world. It is made for domains where one entity has complete control of everything and knows the keys of everybody. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; The omniscient key distribution center. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; Exactly. And the KDC knows not only about the subject, but also about the relying party. It has all the keys. In our world, that's not the case. When we say user-centric federation, we actually mean that it's the user whose choices instantly create a federation between the identity provider and the relying party. This is possible only if everybody has their own keys. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; And also if the claims that can be expressed are represented by URIs and are independent of any actor in the ecosystem. So if an identity provider and a relying party agree to synchronize on the use of some claim, and someone can provide that claim, conforming to that schema, then you can dynamically bring together a transaction. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; Absolutely. This is probably the main point. It's so important that in the metasystem we even take into account the case where we may not be able to pull that together. So we have the concept of claim transformers. If an airline needs a specific claim that cannot be produced by a known identity provider, but is available in another form, then we have mechanisms for bridging. But the general idea is exactly what you said. We should reach an agreement, at least for specific domains, about common claims. &lt;/p&gt;
&lt;p&gt;This is actually pretty close to the idea of the semantic web. Although in my opinion, claims make it more actionable. The semantic web tries to do everything, but with claims we are in a very specific area. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; So, in the media nowadays, you tend to hear the terms OpenID and CardSpace used almost interchangeably. In one sense that's not inappropriate. There's a single-sign-on aspect where the two overlap, and in fact complement one another. But it would be helpful to spell out the deeper differences. This idea of sets of claims, and claim transformation, is one of the things that distinguishes the metasystem from what's happening so far in OpenID, at least as far as I understand it. The use cases for OpenID are mainly sign-on, and now with version 2.0 there's a move toward attribute exchange. Can you explain how the metasystem differs from what OpenID does now, or is likely to do in the near future? &lt;/p&gt;
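&lt;p&gt;A minimal sketch of the claims idea discussed here, assuming claims are plain Python dictionaries keyed by claim URI. The URIs, the age-over-21 claim, and the function names are hypothetical; this is not how CardSpace or any identity provider implements claim transformation, only an illustration of a relying party requesting claims by URI and a transformer bridging a claim that is available in another form. &lt;/p&gt;
&lt;pre&gt;
```python
# Hypothetical sketch: claims named by URIs, a relying party's required claims,
# and a claim transformer that derives a requested claim from one the identity
# provider can supply. The URIs below are illustrative, not a published schema.
from datetime import date

DATE_OF_BIRTH = "http://example.org/claims/dateofbirth"
AGE_OVER_21 = "http://example.org/claims/age-over-21"


def age_over_21(claims: dict[str, str], today: date) -> bool:
    """Derive AGE_OVER_21 from DATE_OF_BIRTH (an ISO 8601 date string)."""
    born = date.fromisoformat(claims[DATE_OF_BIRTH])
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    age = today.year - born.year - (0 if had_birthday else 1)
    return age >= 21


# Target claim URI -> (source claim URI it needs, function that derives it).
TRANSFORMERS = {AGE_OVER_21: (DATE_OF_BIRTH, age_over_21)}


def build_token(required: list[str], offered: dict[str, str], today: date) -> dict[str, object]:
    """Release only the claims the relying party asked for, bridging via
    TRANSFORMERS when a claim is available only in another form."""
    token: dict[str, object] = {}
    for uri in required:
        if uri in offered:
            token[uri] = offered[uri]
        elif uri in TRANSFORMERS and TRANSFORMERS[uri][0] in offered:
            token[uri] = TRANSFORMERS[uri][1](offered, today)
        else:
            raise KeyError(f"no provider or transformer for claim {uri}")
    return token
```
&lt;/pre&gt;
&lt;p&gt;In this sketch a relying party that asks for AGE_OVER_21 receives a yes/no answer rather than the birth date itself, so the token discloses no more than was requested. &lt;/p&gt;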
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; OK. Now I'm not an OpenID expert, so I hope any naiveties of mine will be forgiven. From what I know, every interaction happens by means of browser redirection. I find this extremely useful, because OpenID is actually a kind of omnidirectional identifier, which is something that sooner or later we have to deal with. Whereas cards are metaphors that help me to do things that are unidirectional. Every time I use a card, it's for a transaction specifically with one relying party. &lt;/p&gt;
&lt;p&gt;The same happens with OpenID, but you have the perception that there's a URI which describes you. This opens the way to future developments which, in my view, we desperately need. What we see happening with Facebook is just a signal that the industry needs to do for omnidirectional identifiers what we are now doing for unidirectional identifiers. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Can you define those terms? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; The idea is that your identity, or identity in general, can have different audiences. An omnidirectional identifier is something you use for being recognized by everybody. So if you go to the Verisign website, using HTTPS, their certificate declares their public identity. &lt;/p&gt;
&lt;p&gt;Then you have unidirectional identities. So if I land on a website that, for business purposes, asks my age, then I obtain a token specifically for that website. We call this unidirectional. The flow goes straight to that website and nobody else. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; And this will map to attribute exchange in OpenID. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; Yes, they're very close. The point is that when you use a card today, or OpenID, you're in a unidirectional context. You're transmitting attributes to one specific relying party. &lt;/p&gt;
&lt;p&gt;But in the case of OpenID, I have my account, vibro.openid.com, and it's a URI, it's my identifier, and it's omnidirectional in the sense that everybody knows it. While in my cards, there's nothing that I tell to everybody. So I think OpenID is a good starting point for thinking about an ecology of omnidirectional identity. How do I handle identity that I want projected everywhere, not just to a specific relying party? &lt;/p&gt;
&lt;p&gt;So for example, Facebook Beacon. In my opinion that's a symptom of our need to think about omnidirectional identity. &lt;/p&gt;
&lt;p&gt;Also, the concept of an identity provider -- in both CardSpace and OpenID -- is for giving you attributes about yourself. I go on a website, I want to buy wine, I am the one who is asking the identity provider to certify me. While in the world of social networks, the requester of an identity may be somebody other than me. If somebody is looking at my profile, it's not me. But the request is still for identity information about me. This is an area that needs thought. As an industry we did an excellent job with unidirectional identity, and the ecosystem for both CardSpace and OpenID is vital. But we haven't yet found the laws for omnidirectional identity. When we do, things like Facebook Beacon won't happen. We need to extend the conversation to include omnidirectional identifiers for users. A website has a public identity. But at this moment, a user's public identity is an imagined phenomenon. You search for yourself and find traces of your identity on the web, or maybe the identity of somebody who has your same name. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Or someone who said something about you. Made a claim about you, in effect. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; Exactly. Also, a Gartner analyst recently wrote on his blog that he believes in the near future we'll need to certify the authenticity not only of people, but also of things like digital content. I believe that the ecology of identity needs to grow to encompass all of these things. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; I've been making this exact point recently. I see the blogosphere moving toward what we have now, at the high end, in scholarly and professional publishing. There, the papers that people publish have digital object identifiers which are being managed over the long haul, so that citations can be reliably managed. And so that claims can be made: this is not just a paper published by me, it was also peer-reviewed by these three other people. You start to build up a fabric of claims where the subject is the digital object, not necessarily the person. &lt;/p&gt;
&lt;p&gt;Is this where you were going with omnidirectional identity -- that I'm broadcasting these kinds of claims? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; Yes. With OpenID you have an omnidirectional identifier, or at least a handle you can use to gain these identifiers. We can do it also with cards, but we don't push it as a metaphor. Nor is OpenID pushing it as a metaphor, it's just a side effect. But I believe it will be useful. &lt;/p&gt;
&lt;p&gt;Anyway, that was a long digression. Now I can get back to your question about how OpenID relates to CardSpace, and how they can work together. OpenID is very handy because it lives in the cloud, and it's easy to access. It doesn't intrinsically require passwords, which is fantastic. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Yes. I have a completely passwordless OpenID account at myopenid.com now, and it's wonderful. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; It's beautiful. If I have both passwords and cards... &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; ...there's still a weak link. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; Sure. If somebody calls me and says, can you please give me your username and password, and I give it, well, then, having the card didn't help me much. With cards only we eliminate one of the key weaknesses, not of OpenID itself, but of any browser-based interaction. &lt;/p&gt;
&lt;p&gt;That said, the fact that you never leave the browser is a limitation. In many situations, like for a blog, it's perfectly OK. But people are not very good at interpreting the clues and understanding if they are on the right page. It's very easy to get redirected to the wrong place. We can put in safety mechanisms, but if the website is the complete master of what goes on in this universe, there will be attack vectors that you cannot avoid. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Sure. This is the principle of consistent user experience, which is one of the seven laws. Point taken. You can't enforce that without a branded, consistent chunk of UI. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; But even if every OpenID provider were to decide that the UI for authenticating is exactly the same, if it's all within the domain of HTML and JavaScript, then whoever initiates the experience can make you believe whatever they want, because they control your only window on reality. &lt;/p&gt;
&lt;p&gt;When you use an identity selector -- not necessarily CardSpace -- your identity interaction happens outside the browser. The browser only asks the selector for a token. &lt;/p&gt;
&lt;p&gt;Furthermore, an identity selector can secure things at the message level. The token you obtain can contain claims, but can also contain keys that you can use for securing messages, using WS-Security. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; This is one of the key distinctions. The protocols for OpenID are very light, and that's attractive. It's easy to get things done, it's quick, there aren't stacks of WS-* specs. That's clearly a reason why it's gaining traction. The identity selector piece is separate from the protocol complexity behind the glass, and we can talk about those things separately. One could imagine the very lightweight protocols of OpenID grafted onto identity selectors -- well, we have that now, I can use CardSpace as a front end to OpenID -- but the protocols being spoken are still very simple. &lt;/p&gt;
&lt;p&gt;On the other hand, your chapter about WS-Trust, WS-Metadata Exchange, WS-Federation, that's the kind of thing that makes people want to lie down and take a nap. &lt;/p&gt;
&lt;p&gt;[Laughter] &lt;/p&gt;
&lt;p&gt;So what about that? How do you delineate the value of the heavier protocols, and how do you compensate for the difficulty of making effective use of them? Where's the sweet spot? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; In terms of the difficulty of making use, I would disagree. Every single time you use a card, behind the scenes you have all the standard negotiation with WS-Trust and WS-Security, and yet you are blissfully ignorant... &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; ...as a user. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; Yes. From the user's point of view it is absolutely clear. &lt;/p&gt;
&lt;p&gt;Now if you think of the complexity of Kerberos, or even TCP/IP itself, with its backoff algorithm when it has to retransmit packets, those things are pretty damn complex, but you don't care. They sink inside the platform. And in this area too, we are sinking into the platform. I'm sure you can remember a time when you had to install TCP/IP, or write applications for a specific monitor. &lt;/p&gt;
&lt;p&gt;So, WS-Trust may be complex. In my opinion, not so, but then, my license plate is WS-STAR. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Really? &lt;/p&gt;
&lt;p&gt;[Laughter] &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; Yes. So I'm biased. But in general, the idea is that those protocols are more complex because they're trying to address a broader range of scenarios. So for example, there is no assumption of HTTP. Everything happens at the message level. So things can work on any present or future transport protocol. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Although in practice... &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; In practice, today, it's HTTP, and in fact we are optimized for HTTP. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; So, I'm a complete agnostic. I see scenarios where REST makes sense, and scenarios where WS-* makes sense. The latter, to me, always comes down to cases where you have declarative policy. It's not just a conversation between a couple of endpoints. There's a set of transactions embedded in a policy fabric, flowing through intermediaries that can make claims transformations, assert policies, require that certain kinds of credentials are used in certain contexts, and audit and monitor and do all those kinds of enterprisey things. It's that class of scenario for which this more advanced functionality is designed. &lt;/p&gt;
&lt;p&gt;I think the problem is that it's easy to say, look, we have all this stuff on the web, and the web just works, therefore this is the right and only and best way to do it. Whereas if you talk to people who are involved in, say, the secure exchange of medical information, where multiple stakeholders assert claims and policies about how that information is going to flow, then you do get to this place where you need stuff that's just harder. It is irreducibly harder to meet those requirements. And I could be wrong, but I don't think anyone is saying that OpenID aims to occupy that ground. &lt;/p&gt;
&lt;p&gt;Then it becomes a question of where you get the support that enables you to do those things. Microsoft is putting together a strong story around the framework, the tools, WCF, so if you want to live in that ecosystem and can operate homogeneously, then it's great. But things are never homogeneous, so that gets to the issue of interop. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; All the scenarios you mention are within the scope of WS-*. But also, now, we want to be able to do more complex things directly from the web. Things like accessing your bank account, or using your financial information to apply for a mortgage, or accessing your medical records. Those are all things that require enterprise-level guarantees, and areas where WS-* can help. &lt;/p&gt;
&lt;p&gt;Part of my job is flying around the world, talking with governments and other big players interested in this new generation of technology. I can tell you that they are very protective of their data, and they need to provide very strong guarantees to their citizens, their patients, their customers. OpenID is an extremely specialized animal. It's great specifically for the web. It's a child of our times. People are tired of remembering many different credentials, and who can blame them. OpenID is a great way of overcoming that problem. &lt;/p&gt;
&lt;p&gt;Then there are scenarios where you need to be able to model existing business relationships. The power of WS-Policy and WS-MetadataExchange lies in the ability to describe a situation that already exists, so that you can leverage online what you already have in place in the offline world. So if I'm a citizen and that fact is expressed in terms of a managed card, then I can use my privileges online automatically. I don't have to renegotiate everything with every relying party online. &lt;/p&gt;
&lt;p&gt;With WS-* you can express these things, and since it's a meta-protocol you have a decoupling layer that enables you to describe your business situation without committing to a specific encryption or authentication technology. And here we come to interop. This is one of the most heartfelt topics in this area, and there is a constant effort to keep the stuff real. If you check Mike Jones' weblog, self-issued.info, he talks a lot about this effort. At every identity-related conference, he helps organize parties in which everybody brings their own technology and we build the Cartesian product of everything talking to everything else. They publish their results to a wiki, and I can tell you it's impressive. That table, which started out pretty much red, is turning green at a steady pace. And every time they hold a new event, new players come to the table. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; So the part that's easily visible to folks now is CardSpace to OpenID. Anyone can set that up, use it, and see what it's like. The part that's not visible to people, but that you see in your travels, visiting governments and businesses, are these more advanced scenarios. At what point will this become more visible? Because until it does, it all feels kind of abstract and remote, doesn't it? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; Absolutely. So, it's really hard to answer. In the last two years, we engaged with every big name you can think of. Everybody's extremely interested, because they can see the disruptive potential. But it's hard to say. What I can tell you, and it's a matter of faith, so you can choose to believe me or not, is that a lot of people are really serious about CardSpace, and are building prototypes and pilots that are internally up and running. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Fair enough. So, we haven't said a lot about the book specifically, but having written one myself, I know the incredible level of effort and commitment that it takes. Your title is &lt;em&gt;Understanding Windows CardSpace&lt;/em&gt;, and the book lives up to its title. After I read it, I did have a better understanding of CardSpace. So, nicely done. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VB:&lt;/strong&gt; Thanks a lot! &lt;/p&gt;&lt;img src="http://channel9.msdn.com/489744/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Understanding-CardSpace/</comments><link>http://channel9.msdn.com/posts/JonUdell/Understanding-CardSpace/</link><pubDate>Thu, 20 Mar 2008 01:00:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/understanding-cardspace/understanding-cardspace.wma</guid><evnet:views>281</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489744/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>In this podcast, Jon Udell chats with Vittorio Bertocci, author of &lt;em&gt;Understanding Windows CardSpace&lt;/em&gt;. The discussion traces the evolution of the identity metasystem, explores the rationale for CardSpace, and considers the unsolved problem of public online identity for individuals.</evnet:previewtext><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/understanding-cardspace/understanding-cardspace.mp3" expression="full" duration="2817" fileSize="22537728" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/understanding-cardspace/understanding-cardspace.wma" expression="full" duration="2817" fileSize="22805431" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/understanding-cardspace/understanding-cardspace.wma" length="22805431" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Understanding-CardSpace/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489744/Trackback.aspx</trackback:ping><category>CardSpace</category><category>Identity</category></item><item><title>Robotics: A new 
approach</title><description>&lt;img src="http://channel9.msdn.com/Link/b9f65fa5-f6b5-44aa-a7aa-fba2e209f55f/" border="0" /&gt;&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;br /&gt;
In this podcast, Jon Udell invites Tandy Trower and Henrik Nielsen to explain why robotics is taking off, and how their new approach to the technology will generalize to a broad range of scenarios.&lt;/em&gt;&lt;/p&gt;
&lt;div class="transcript"&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; So you were just in Japan. What did you see and do? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TT:&lt;/strong&gt; We were at IREX, the international robotics exhibition in Tokyo. All forms of robots were there, heavily dominated by industrial robots. That was the big-ticket item. But we were in a smaller section that focused on this new market, service robots, which are moving into new areas. Industrial robots have done the dangerous, dull, and dirty jobs. Now there's a new market coming, where robots move outside the factories and into the homes. &lt;/p&gt;
&lt;p&gt;It's a dramatic change. Industrial robots are very expensive, they require special operators, they perform repetitive functions, and they're dangerous for humans to interact with. But that market is starting to flatten out. So a lot of the vendors in that area, including one of our best supporting partners, KUKA, one of the top industrial robotic arm manufacturers in the world, are looking for new markets, and are very anxious to engage with us in this new service, or personal, robotics market. &lt;/p&gt;
&lt;p&gt;Bill Gates reflected this in his &lt;a href="http://www.sciam.com/article.cfm?chanID=sa006&amp;amp;colID=1&amp;amp;articleID=9312A198-E7F2-99DF-31DA639D6C4BA567"&gt;January article in Scientific American&lt;/a&gt;, where he likened the personal and service robotics world to the PC world in the 1970s. The personal computer market, in its infancy, looked kind of weird. You had the Commodore PET, which had a strange little keyboard and saved programs to cassette, you had the Apple II. The transition we see in this robotics market now is very similar to what we saw coming out of that era, and even the industrial vendors are starting to look to this new market as a place to go. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; In this case, there's also a particular demographic driver: aging populations create the need for these personal assistants in the home. And in Japan, in particular, there's a special interest in companionable robots. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TT:&lt;/strong&gt; Yes. In Japan and in many Asian countries, there's much more interest in the social aspect of robots. It's partly cultural: they grew up with Astro Boy and the idea that robots were friendly companions. So you're right, one of the biggest motivating factors is this aging of the population. I face this myself. My father-in-law is 84 and lives on his own, and he needs help from his family to be able to live independently. It certainly would be helpful if we had more technology that would allow us to stay in touch with him, remind him to take his medications, and connect him better with his health care providers. These are all things that robots could perform. &lt;/p&gt;
&lt;p&gt;It's also the case that in the Asian countries, because of family and cultural traditions, it's more important to take care of your elders. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; So if the analogy is to the early PC era, then you're providing what is, in a sense, DOS. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TT:&lt;/strong&gt; Exactly. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; I actually think robotics can grow far beyond where the PC started out. Because the PC, until very recently, had a fairly uniform form factor. You could rely on a screen and a keyboard and a mouse, and that dictated what the user interface could be. As soon as you start having what I call more context-aware applications, things that know where you are, what you are doing, what the surroundings are doing -- and not just the local environment -- this causes the computation and the applications to be completely different. They are inherently part of the environment. They have to become much more aware, and the ways you interact with them have to become more aware. You might want to use speech for some things, or touch, or just your physical presence, so it can track you using heat or motion. &lt;/p&gt;
&lt;p&gt;Robotics hardware has come a long way in terms of price, functionality, and flexibility. But service robots have yet to reach a level of usefulness that defines how they might be able to take off. There are some obvious entertainment opportunities, and remote presence opportunities, but beyond that we're only in the beginning phase of figuring out what these applications look like. &lt;/p&gt;
&lt;p&gt;And we actually think it applies not only to robotics but also to how you might start thinking about interaction with information systems in general. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TT:&lt;/strong&gt; And that's been one of the challenges. How do you create applications like this, where you have a lot of things going on? PCs have had it easy. They just sit there, they take the keyboard input, the mouse input, but when they have to go and sense things in our environment, and actually operate in our environment, it takes a much more complex model. How do you deal with all these different sensory inputs that are coming in at the same time? How do you deal with controlling the activation of many different things at the same time? This, we believe, is not just a model for robotics, but is a model for software of the future. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Absolutely, because there no longer is the illusion of god-like control of the machine. In the early PC era, pre-network, you really did make the rules and you really did have that control. But in the network era, and now as the network extends into the physical world, you're an actor on a stage with a number of other actors running around with their own agendas. It becomes a negotiation, a game of interaction. So yes, it absolutely mandates a different model, and that model extends equally to loosely-coupled services that communicate by sending messages over the network. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TT:&lt;/strong&gt; Yes, a model that deals with the inherent complexity of concurrency, and the coordination or orchestration of what's going on. This was the whole reason for choosing the CCR and DSS pieces for robotics. This was actually an advanced programming model designed not for robotics per se, but as a general purpose programming model. We put it into the robotics SDK as a way to test this out, but now we're seeing that people are lifting the hood on the engine inside this SDK and finding other uses for it. We have people who are using it to build trading systems, who are doing large data-set scientific modeling, the folks at MySpace are using it to manage their server farms. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; So let's review, for people who may not have followed the story. The CCR, which is the Concurrency and Coordination Runtime, and DSS, which stands for Decentralized Software Services, are projects that were in the works, and had a relationship to one another, prior to their incorporation into the robotics kit. Is that true? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TT:&lt;/strong&gt; Yes, that's right. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; Yes, absolutely. DSS is built on top of CCR. By way of background, the challenge was to answer the question: What is the programming and application model when it's no longer true that you have a single process running on a single CPU on a single machine? We think that is already no longer true. When you look down you see many cores under you that operate concurrently... &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; And many nodes on the network... &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; That's right, and when you look up you see many nodes on the network, and you want to have your application function in that environment. In fact you need to define what an application is. If you are building a composition of services, you need to deal not only with the concurrency but also with the messages flowing around in the system. It becomes much more autonomous computing. And this is why it fits nicely with robotics. It's about sensing: getting a huge amount of input from the environment in a very asynchronous and loosely-coupled way. &lt;/p&gt;
&lt;p&gt;Everything becomes an autonomous unit. And each can be participating in many different applications at the same time, without even knowing it... &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Or not participating, because some of them went AWOL, but that's OK because you have the redundancy to handle that. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; Exactly. The web has been trying to push toward this model for a long time, and now the appearance of many-core CPUs has started to push toward it too. So the whole idea of an application, which hasn't changed for 30 years, now has to change. And that's the question we tried to answer when we started out with CCR and DSS. They work nicely together. One provides a programming model, the other provides an application model, and together they fit nicely around messaging, as you said. We think it leads you down a path of building very robust, scalable, and flexible applications. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; So in this context how do you define an application? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; It is a composition of a set of loosely-coupled services that function individually, kind of like in a mashup environment. You have a variety of inputs and a different set of outputs that you want to be able to affect; it is the orchestration of messages going in and out. It's the collection -- it is effectively, when you look at it, a graph of services that you start thinking of as your application. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; And a ruleset. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; And a ruleset, yes, exactly. So it's about having a set of services hooked together, and a ruleset for how to orchestrate messages over that set. And it's about partial failure, and redundancy, because you don't have control over all of these services. Some run locally, some run across the network, some run in the cloud. You want to be able to leverage them all, and hook new things in. &lt;/p&gt;
&lt;p&gt;Here's a very practical problem from a robotics point of view. You might have had your robot in the home for a couple of years. It has learned where you go, it knows your calendar, it knows a bunch of things about you. Now you might get another robot. Rather than wait a couple of years for it to get up to speed on what you think matters to it, you might want to be able to hook into the same application context. It's a web of information that you want the new guy to be able to hook into. It's all about the connectedness of applications. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TT:&lt;/strong&gt; And of course this is the way that living systems operate, whether we're talking about the cellular structure of our bodies, or our neural systems, or even full ecosystems. It's all based on the fact that the nodes themselves have a certain importance, but it's the connectivity through the nodes -- the way they communicate with one another -- that provides the inherent power. Our own neural system is a massive network. The individual nodes provide insignificant data, yet they pass these messages along, and through the orchestration of these connections we get the ability to see, or to hear, or to be able to function in our world. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Biomimicry, that's the ticket. Nature's already done all this R and D, why don't we piggyback on what it's already figured out. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TT:&lt;/strong&gt; Exactly. When I first looked at applying the technology that Henrik was working on, that was one of the areas I looked at. Now it turns out that biologically inspired techniques are still in a crude stage, so my second attempt was to apply this to robotics because it's a more practical technology that may eventually evolve toward more biologically inspired technologies. &lt;/p&gt;
&lt;p&gt;Again, this whole model was never designed to be exclusively for robotics. It was designed to be a programming model for the future that would enable a new generation of applications. We've been trying to create them, today, as if they were all on a single neuron. What this technology addresses is the trend that's coming: Intel and AMD are both now shipping 4-core systems, with 8-core systems coming next year. How are we going to manage all this power? And the Internet shows us that we've already moved past the idea of running a single application on a single core on a single machine; that's just obsolete. How do you reduce the overall complexity when your application runs in five different places at the same time? Is it even a solvable problem? Well, it turns out that the CCR and DSS have solved that problem; they do provide that programming model. And that's not just me saying that: we have customers who are embracing them because they are helping solve these complex problems. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; One of the challenges, as we see in the web services space, is that when the application becomes a set of actors on the stage, with a lot of other actors, how do I know that I'm meeting my requirements, how do I test? I think these are all extensions of things we know how to do, but still, it changes the game. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; Oh, it changes dramatically. We hold the basic assumption that bad things happen, and things fail for unknown reasons. In the case of robotics it works beautifully, because the robot falls off the cliff, and it's gone. But you can't just stop. It would be smart to say, well, don't do what that thing did. Try to avoid falling off the cliff. That's where the magic term "loose coupling" comes in. It's often seen as a good thing to do, an important architectural principle, but how to actually do it turns out to be difficult. How can you write an application that can fail partially without the rest of it going down? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; How do you evaluate the performance of an application? We're used to a model where the testable performance is discrete. It did or didn't do this function. But in this world, it tends toward the probabilistic. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; Oh, absolutely. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; It's not whether it vacuumed the room or not, but how well did it do that? And over a series of trials, how did that average out? It's fuzzier. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; Yes. Of course people already know that on the web, when they use search engines. They know they'll get a decent response, but an exact snapshot is just not possible. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; It won't be authoritative or complete. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; It's a snapshot of a moment in time. I think a lot of the applications we deal with will have to think about that, and be organized around that. And that boils down to, well, I have information, how do I orchestrate it, how do I put weights on the different pieces of information? And how do I spread it around so I can build something that doesn't freeze? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TT:&lt;/strong&gt; Related to that, what do you do when one of your program components does freeze up, or crashes? In this world, it's fine. If you lose one of the services in the set, because its state is separated out, you can drop the service or restart it or replace it... &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; And reattach the state to another instance of the service. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TT:&lt;/strong&gt; Exactly. What do you do when you find out that code has failed? Do you reboot the system? Do you remove the whole application? Or do you just surgically go in there and remove or fix one piece? I mean, we lose cells all the time, and they're replaced, and yet we don't have to be rebooted every time a new cell comes in. It just fits into the network, finds its place, replaces the old one, and we continue on. Software needs that kind of resiliency. You need to do that kind of surgical maintenance. &lt;/p&gt;
&lt;p&gt;Back to robotics, the classical model was this. You read your sensors, you decide what to do about that sensory input, and then you effect your actuators. The problem with that was twofold. First it's very brittle. You get one wrong instruction, you bring the whole application down. Second, while you're processing your sensory input or actuator outputs, you're not reading your sensors. So at the time you should be noticing that you're running into the wall, you're telling the wheels to move forward. &lt;/p&gt;
&lt;p&gt;The fact that we talk about this as orchestration is a very apt metaphor. What happens in an orchestra, what does a conductor do? He has a lot of people playing at the same time, and his task is to make sure that it all blends together and sounds beautiful. This is the key; this is the programmer's challenge in the future. How are they going to keep an application flowing that way? It needs a simple model, but one that is scalable from the lowest level of abstraction to the highest level. That's what we believe we have here in the CCR/DSS companionship. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; I was going to ask how you begin to instill this way of doing things into a new generation of programmers, but I think I got the answer in a recent &lt;a href="http://itc.conversationsnetwork.org/shows/detail3467.html"&gt;conversation with Matt MacLaurin&lt;/a&gt;, in the Creative Systems Group. He's developing a thing called Boku, which is both a game and a game development system, but all on these same principles. A kid puts an object into the world, then declares what are the goals or the reactions that it can have. Then you start to get emergent things happening, and you are learning to operate in a world which is much like the one you're describing. You're not controlling this world. You're injecting things into it that participate and interact, and you need to shape those interactions. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; Robotics offers a lot of excitement in terms of education. It ties together a lot of technologies, in terms of science, math, applied technologies like vision and audio, and also computer science. So it's a powerful vehicle for getting attention from students. &lt;/p&gt;
&lt;p&gt;So we had this problem, people said, well, if you want to use it for computer science, then computer science 101 has to be a for loop, or a function call... &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Sorting. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; Sorting, exactly. And we said, well, we think it might be interesting to expose this model of distribution and concurrency directly. We don't think the students will freeze up, they are already aware of the asynchronicity from IM and email. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; This is what Matt is doing, actually. It's beautiful. So, to make this concrete, let's come back to home automation. In the case of HealthVault, currently, any of the home health devices that connect to it will be satellites of the PC. But you're imagining a model where the home is more of a network of...well, in a sense, the entire home is a complex robot. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; My view is that the P in the PC will go away. Because it's about computers in the network, and the connectedness of them, and the fact that you want them to be orchestrated, but you don't really go and sit in front of any of them. When the robot's around you might do some stuff with it, then you go down into your basement, where something else might do something else, but you want the information to be continuous. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; It's not like your cellphone, a thing that's permanently attached to you. When it's in your environment, you can interact with it, but it doesn't have to be there, and you can interact with lots of other things. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; Yes. Of course the cellphone has had a clearly subordinate role to the PC, you dock it and sync it, but these devices are becoming full-fledged network devices. So again you have to have an application and programming model that allows you to build applications that can float around these devices as they come and go. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; And the support software is light enough for these devices? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; You mean in terms of CCR and DSS? Yes. We run today on Windows CE, and I think we can say for the next release we will run -- we are already running now -- on the .NET Micro Framework, which doesn't have any Windows underneath; it's really running a very lightweight managed environment straight on top of the hardware. We can run on very small things. Light switches. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Really? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; Yes. We run a limited version of what we have, but it's the same bits, fundamentally. We had a researcher in MSR implement a very lightweight version of our protocol, on a small device, and have it show up in our environment, without having to do anything else. It could be a light switch, a thermostat, a security alarm, any number of devices that don't do much computation but provide sensory input. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; So if somebody wants to get their feet wet with this, and do a little project that gives them a taste of what it's like, what would you recommend? I mean, they can get the kit, but what's a good example to try? &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; There are a lot of people who'd be excited about going to the store and buying a robot, and that's great. But assuming you couldn't, what you would start with is the simulation environment. It allows you, without having touched a robot at all, to play around with a set of robots that we provide you with simulation models for. You can very easily, in 5 minutes, download the SDK and then get going with a robot that is only in the visual environment. However it's more than that, it is a fully physics-aware environment. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Sensors, actuators... &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HN:&lt;/strong&gt; Yes, so when you bump into other things they will move, and if you made them very heavy, your robot will flip or crash. So you have a very easy way to get started. &lt;/p&gt;
&lt;p&gt;&lt;strong&gt;JU:&lt;/strong&gt; Thanks, Tandy. And thanks, Henrik. &lt;/p&gt;
&lt;/div&gt;
&lt;/blockquote&gt;&lt;img src="http://channel9.msdn.com/489743/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Microsoft-Robotics-A-new-approach/</comments><link>http://channel9.msdn.com/posts/JonUdell/Microsoft-Robotics-A-new-approach/</link><pubDate>Thu, 13 Mar 2008 15:26:00 GMT</pubDate><guid isPermaLink="false">http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/robotics/robotics.wma</guid><evnet:views>290</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/489743/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>Perspectives is a new series of interviews in which Jon Udell discusses a&amp;nbsp;variety of&amp;nbsp;topics&amp;nbsp;with passionate Microsoft innovators in&amp;nbsp;areas as diverse as robotics, digital identity, e-science, and social software. The format will be an audio podcast and a blog, with partial transcription to make it accessible to those who don't listen to podcasts. The home for perspectives will be at &lt;a href="http://perspectives.on10.net"&gt;perspectives.on10.net.&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
In the first installment, Jon&amp;nbsp;invites Tandy Trower and Henrik Nielsen to explain why robotics is taking off, and how their new approach to the technology will generalize to a broad range of scenarios.&amp;nbsp;&lt;br /&gt;
&lt;br /&gt;
&lt;blockquote&gt;&lt;strong&gt;JU: &lt;/strong&gt;So you were just in Japan. What did you see and do? &lt;br /&gt;
&lt;strong&gt;TT: &lt;/strong&gt;We were at IREX, the international robotics exhibition in Tokyo. All forms of robots were there, heavily dominated by industrial robots. That was the big-ticket item. But we were in a smaller section that focused on this new market, service robots, which are moving into new areas. Industrial robots have done the dangerous, dull, and dirty jobs. Now there's a new market coming, where robots move outside the factories and into the homes...&lt;/blockquote&gt;</evnet:previewtext><media:thumbnail url="http://channel9.msdn.com/Link/f90afcf3-258a-4b59-bf3c-b6ba29c08612/" height="240" width="320" /><media:thumbnail url="http://channel9.msdn.com/Link/b9f65fa5-f6b5-44aa-a7aa-fba2e209f55f/" height="64" width="85" /><media:group><media:content url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/robotics/robotics.mp3" expression="full" duration="2194" fileSize="17555328" type="audio/mp3" medium="audio" /><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/robotics/robotics.wma" expression="full" duration="2194" fileSize="17762025" type="audio/x-ms-wma" medium="audio" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/on10/perspectives/robotics/robotics.wma" length="17762025" type="audio/x-ms-wma" /><dc:creator>JonUdell</dc:creator><slash:comments>0</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Microsoft-Robotics-A-new-approach/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/489743/Trackback.aspx</trackback:ping><category>Robotics</category></item><item><title>Embedding a Popfly widget in Facebook</title><description>&lt;p&gt;In this ten-minute screencast (&lt;a href="http://channel9.msdn.com/Media/popfly/popfly-silverlight.html"&gt;Silverlight&lt;/a&gt;, &lt;a href="http://channel9.msdn.com/Media/popfly/popfly-flash.html"&gt;Flash&lt;/a&gt;), &lt;a href="http://blogs.msdn.com/adam_nathan/"&gt;Adam Nathan&lt;/a&gt; shows you how to personalize your 
Facebook page with a Popfly-based photo viewer.
&lt;/p&gt;&lt;img src="http://channel9.msdn.com/258070/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Embedding-a-Popfly-widget-in-Facebook/</comments><link>http://channel9.msdn.com/posts/JonUdell/Embedding-a-Popfly-widget-in-Facebook/</link><pubDate>Mon, 08 Oct 2007 21:00:54 GMT</pubDate><guid isPermaLink="false">http://channel9.msdn.com/posts/JonUdell/Embedding-a-Popfly-widget-in-Facebook/</guid><evnet:views>5317</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/258070/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>&lt;p&gt;In this ten-minute screencast (&lt;a href="/Media/popfly/popfly-silverlight.html"&gt;Silverlight&lt;/a&gt;, &lt;a href="/Media/popfly/popfly-flash.html"&gt;Flash&lt;/a&gt;), &lt;a href="http://blogs.msdn.com/adam_nathan/"&gt;Adam Nathan&lt;/a&gt; shows you how to personalize your Facebook page with a Popfly-based photo viewer.
&lt;/p&gt;</evnet:previewtext><media:thumbnail url="http://channel9.msdn.com/Link/a8498a94-4944-491a-bb21-a4070db79946/" height="240" width="320" /><media:thumbnail url="http://channel9.msdn.com/Link/527ae99d-5aa8-428d-aff0-d0c2e6a3481d/" height="240" width="320" /><media:thumbnail url="http://channel9.msdn.com/Link/8ed6cc0e-1124-418e-b57e-d6c224fcadb1/" height="240" width="320" /><media:thumbnail url="http://channel9.msdn.com/Link/8faeea05-1db9-4e9d-9ae1-60899f75c6e0/" height="64" width="85" /><media:thumbnail url="http://channel9.msdn.com/Link/850117f0-f3b7-4a16-b97c-827740020e2c/" height="64" width="85" /><media:thumbnail url="http://channel9.msdn.com/Link/3265b011-8178-4db4-ab58-8db5695f9e9a/" height="64" width="85" /><media:content url="http://mschnlnine.vo.llnwd.net/d1/ch9/0/7/0/8/5/2/346658.jpg" expression="full" type="image/jpeg" medium="image" /><dc:creator>JonUdell</dc:creator><slash:comments>2</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Embedding-a-Popfly-widget-in-Facebook/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/258070/Trackback.aspx</trackback:ping><category>Popfly</category></item><item><title>Andreas Ulbrich demonstrates the Microsoft Visual Programming Language</title><description>&lt;p&gt;In an &lt;a href="http://channel9.msdn.com/Showpost.aspx?postid=328816shape="&gt;earlier screencast&lt;/a&gt;, Henrik Nielsen illustrates how the Microsoft Robotics Studio, building on top of the &lt;a href="http://channel9.msdn.com/ShowPost.aspx?PostID=143582shape="&gt;CCR&lt;/a&gt; (Concurrency and Coordination Runtime) and &lt;a href="http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1386497&amp;amp;SiteID=1"&gt;DSS&lt;/a&gt; (Decentralized Software Services) technologies, exposes a RESTful service-oriented architecture. &lt;/p&gt;
&lt;p&gt;In this companion screencast, Andreas Ulbrich demonstrates VPL (Visual Programming Language), a diagram-oriented dataflow language. Although it was created for the Robotics Studio, it is -- as you'll see here -- a very general way to visualize, orchestrate, and debug message-driven services that run work in parallel. &lt;/p&gt;&lt;img src="http://channel9.msdn.com/256826/WebViewBug.aspx?EVT=0" height="1" width="1" alt="" /&gt;</description><comments>http://channel9.msdn.com/posts/JonUdell/Andreas-Ulbrich-demonstrates-the-Microsoft-Visual-Programming-Language/</comments><link>http://channel9.msdn.com/posts/JonUdell/Andreas-Ulbrich-demonstrates-the-Microsoft-Visual-Programming-Language/</link><pubDate>Mon, 06 Aug 2007 15:30:00 GMT</pubDate><guid isPermaLink="false">http://channel9.msdn.com/posts/JonUdell/Andreas-Ulbrich-demonstrates-the-Microsoft-Visual-Programming-Language/</guid><evnet:views>15334</evnet:views><evnet:viewtrackingurl>http://channel9.msdn.com/256826/WebViewBug.aspx?EVT=0</evnet:viewtrackingurl><evnet:previewtext>In an earlier screencast, Henrik Nielsen illustrates how the Microsoft Robotics Studio, building on top of the CCR (Concurrency and Coordination Runtime) and DSS (Decentralized Software Services) technologies, exposes a RESTful service-oriented architecture. In this companion screencast, Andreas Ulbrich demonstrates VPL (Visual Programming Language), a diagram-oriented dataflow language. 
Although it was created for the Robotics Studio, it is -- as you'll see here -- a very general way to visualize, orchestrate, and debug message-driven services that run work in parallel.</evnet:previewtext><media:thumbnail url="http://channel9.msdn.com/Link/7666ba6a-b162-4fdd-8947-4c0b5daa4550/" height="240" width="320" /><media:thumbnail url="http://channel9.msdn.com/Link/d27f6fc8-f55a-4cc9-90ed-7846a98fa3fb/" height="240" width="320" /><media:thumbnail url="http://channel9.msdn.com/Link/1cb2b584-3a41-4c4a-902f-ba8351763494/" height="240" width="320" /><media:thumbnail url="http://channel9.msdn.com/Link/772cb2d7-3e19-48a5-a519-6c67150cb9ea/" height="64" width="85" /><media:thumbnail url="http://channel9.msdn.com/Link/60917deb-3cfa-49f1-bcf9-ec6e636ed103/" height="64" width="85" /><media:thumbnail url="http://channel9.msdn.com/Link/399f01f8-081c-4487-8caa-19f3b68eddbc/" height="64" width="85" /><media:group><media:content isDefault="true" url="http://mschnlnine.vo.llnwd.net/d1/ch9/6/2/8/6/5/2/vpl.wmv" expression="full" fileSize="19788215" type="video/x-ms-wmv" medium="video" /><media:content url="http://mschnlnine.vo.llnwd.net/d1/ch9/6/2/8/6/5/2/332336.jpg" expression="full" type="image/jpeg" medium="image" /></media:group><enclosure url="http://mschnlnine.vo.llnwd.net/d1/ch9/6/2/8/6/5/2/vpl.wmv" length="19788215" type="video/x-ms-wmv" /><dc:creator>JonUdell</dc:creator><slash:comments>9</slash:comments><wfw:commentRss>http://channel9.msdn.com/posts/JonUdell/Andreas-Ulbrich-demonstrates-the-Microsoft-Visual-Programming-Language/RSS/</wfw:commentRss><trackback:ping>http://channel9.msdn.com/256826/Trackback.aspx</trackback:ping><category>Robotics</category><category>VPL</category></item></channel></rss>