Jonathan Fay is principal developer of the WorldWide Telescope. In this interview he explains how the project has yielded not only a breakthrough software product, but also a reference model for the acquisition, transformation, and visualization of astronomical data. You'll learn not only how the WorldWide Telescope works, but also why it exists: To fulfill the education mission discussed in a related interview with Curtis Wong and Roy Gould.
JF: As long as I've been doing computers, going back to the early 1980s on TRS-80, graphics, and visualization of data and the earth and space, were interests of mine. I'd gotten a department-store telescope one year for Christmas, and loved looking at stuff through the light-polluted LA skies.
JU: So you were in the same boat as Curtis Wong?
JF: Yeah, you could really only see planets and the moon in any detail. But I was passionate about computers and astronomy. Every time computers got more powerful, I'd look into visualizing the Mandelbrot set and the stars as a litmus test.
In 2001 I was development manager for HomeAdvisor, and we were assimilating a research project called TerraServer. Tom Barclay, a researcher who was working with Jim Gray, said, "Hey, USGS has this DEM -- digital elevation model -- data that they'd like me to load into TerraServer. I wonder if you have ideas about what we could do with it."
I'd been very much into 3D visualization. I have this program called LightWave, which goes back a long time but is now used for things like Serenity and Battlestar Galactica, so I started taking TerraServer images and USGS data and creating hills with texture-mapped images.
Then Tom Barclay told me how NASA was using satellite weather data, watching over many days, and getting rid of the clouds so you could see the surface of the earth. They called it the Blue Marble project. I found and downloaded that data, and also some global digital elevation data, and started creating a hierarchical 3D view of the earth so you could zoom in and browse. Then I worked to bring that into TerraServer, because we had resolution down to a couple of meters.
But this was just a side project, and there wasn't interest in developing it, so I decided to look into visualizing other astronomy data.
JU: This was around the time in 2002 when Jim Gray and Alex Szalay published their paper entitled the World-Wide Telescope?
JF: Right. Jim talked about TerraServer "pointing up" as the next thing. He was already getting himself embedded with astronomers. I didn't see much of that. Tom was babysitting TerraServer while Jim went off into the astronomy end of things, and I was still doing geo, so we weren't collaborating.
After I'd made some demos, a lot of people thought it was cool, but that was all. So I kept that on the back burner, and moved into some other groups. At the same time I was building my observatory. In Seattle, you take pictures when you can. If you can't push a button and have your observatory open up and take images when you get clear skies, by the time you set up you'll be clouded in. I wanted to automate the whole process, including image processing. That introduced me to the whole pipeline of data collection, processing, and subsequent research.
Although I'm an amateur, I had to drill into the world of data and image processing that professional astronomers had to deal with. I was using the same resources.
JU: I'd like to hear more about that. A lot of us are aware that those data and image resources exist, but it's really unclear how to make use of them.
JF: You know, there is a lot available, but most amateur astronomers had no idea it existed, it was very hard to get to, and even the scientists had a hard time getting access to it. Essentially it was locked up in silos.
JU: If you know where to find the gzipped tarball, and then if you can unzip it and figure out how to use it, without any documentation about metadata and formats...
JF: Right. So, I'd heard about this very large database of stellar objects, the US Naval Observatory's USNO-B. It was 100 gigabytes. At that time, there were barely any consumer hard drives that could hold that. Forget transferring it over the network. It's 120 CDs. The only way to transfer the data was to ship hard drives around the country.
JU: Yeah, I remember Jim talking about doing that.
JF: I'm just an amateur, but I feel like I need the data, so I found out that this guy named Dave Monet, in Flagstaff, would let me ship him a hard drive and he'd put the data into a Linux-formatted partition and send it back.
On the one hand, I was shocked to see how easy it was for me to get access to the same data that the professional astronomers were using. And by easy, I mean it was possible.
But on the other hand, I realized you had to be really committed, and know exactly what you're doing.
JU: Right. There were no services wrapped around the data to make it useable by anybody other than a 100% focused and dedicated researcher.
JF: As I started doing more with imaging, I had the concept that I should flip my earth inside out and render the sky. One of my friends, Doug George, created a full-sky survey, in gorgeous color, but the software that went around with it would take ten or 15 seconds every time you moved your view. Nothing resembling interactive or realtime.
And here I had this application that dealt with the same quality and quantity of data instantaneously. So I say hey, I can build an engine to go with your data.
And I told him about a company, called Starry Night Pro, that was using some 3D effects but not actual image data from the sky. He wound up licensing his data to them, but the result they got was closed and self-contained.
JU: What kind of imagery was in it?
JF: What we'd now consider a low-to-medium resolution full-sky survey of the northern and southern hemisphere.
JU: When you say low-to-medium resolution, what could you see if you zoomed in on a galaxy?
JF: If you zoom into M51 in WorldWide Telescope, using the Hubble imagery, it'll be about 4000 pixels tall. And in their survey, it's about 4 pixels tall. You can barely make out that it's a spiral galaxy.
We have the entire sky at one arc-second per pixel, and for objects like M51, thousands of pixels tall. And of course every time you go twice the resolution, it's four times the data.
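As a rough illustration of that scaling (an editorial sketch, not taken from the WorldWide Telescope code), the pixel count of a full-sky image can be estimated from its pixel scale. The function name `full_sky_pixels` and the figure of three bytes per pixel are assumptions made for the example.

```python
import math

def full_sky_pixels(arcsec_per_pixel):
    """Approximate pixel count needed to cover the whole sky at a given scale."""
    # The full sphere is 4*pi steradians, about 41,253 square degrees.
    sky_sq_deg = 4 * math.pi * (180 / math.pi) ** 2
    pixels_per_degree = 3600 / arcsec_per_pixel
    return sky_sq_deg * pixels_per_degree ** 2

one_arcsec = full_sky_pixels(1.0)
half_arcsec = full_sky_pixels(0.5)

# Assumed storage cost: 3 bytes per pixel (uncompressed RGB).
print(f"1.0 arcsec/pixel: {one_arcsec:.3e} pixels, {3 * one_arcsec / 1e12:.2f} TB raw")
print(f"0.5 arcsec/pixel: {half_arcsec / one_arcsec:.0f}x the pixels")
```

At one arc-second per pixel this comes to roughly half a trillion pixels, which is consistent with a data set measured in terabytes, and halving the pixel scale multiplies the count by four.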
They wanted to fit everything on a CD-ROM. For us, we're talking about terabytes, it's not something you distribute. I thought you should install a small application, and the data comes over the network.
JU: And that's how WorldWide Telescope does it?
JF: Right. Everything except the thumbnails comes over the Net. We use the thumbnails to get the wordwheel functionality with search.
JU: The data file's about 3 megabytes?
JF: There's about 12 megabytes of thumbnails, but yes, the catalog is about 3 megabytes.
So, I had this vision for a product, but the economics were wrong to do it as commercial software in the astronomy market. Plus, they'd want to do something aimed entirely at high-end amateurs, not at professional astronomers, or at the general public who are the outreach targets for professional astronomers.
And then Curtis and I got together. I envied his position in research, being able to explore new things that hadn't been done before.
It turned out that Curtis had been exploring how to create an educational environment with rich tools for exploring space, and he'd been collaborating with Jim Gray on TerraServer, and now he was looking for the technology to make it possible.
Here I had this technology, and was looking for somebody who was enthusiastic about having a purpose for it. So it was the peanut butter and chocolate moment. Curtis passionate from the education side, me from the technology side, happening to be in the right company at the right time.
So I made a demo using the Sloan Digital Sky Survey data, and Jim went crazy over it. This was the visualization aspect he'd been looking for. It was the front end that makes the data consumable.
JU: Tell us about the WWT's back end, and how it relates to what Jim's team built.
JF: To get the data out of the silos, Jim was involved in the National Virtual Observatory and the International Virtual Observatory Alliance. If you know how to talk these VO standards, you can exchange data, and you can do queries against other people's data.
JU: So on the one hand, these standards enable you to combine data sets that you fully assimilate. But on the other hand, they enable federated query.
JF: Right. A lot of the astronomers were dealing with data extracted from catalogs. You took image data, and then you got the numerical analysis out of it, and stuck that in the database. The transfer of images wasn't really their domain for this round, they wanted to do the stuff you could put into SQL Server.
So while TerraServer put earth image data into SQL Server, the sky image data was lagging behind. But you could query from a source on the Internet, and then join it to some other data coming from another source. Sometimes it required the data to be marshaled from one machine to another for efficiency, but essentially it meant you didn't have to translate everything into your database.
JU: But I assume that federated query isn't happening in WorldWide Telescope. We're not waiting for requests to go across the network, you've combined the datasets for your purposes.
JF: There are common sets of data that you'll need all the time. It's a relatively small amount, and we download that to your client. The thumbnails, the catalog.
JU: And what's in the catalog?
JF: The Messier objects, the NGC objects, the list of solar system objects...
JU: And coordinates for them...
JF: Yes, and magnitudes, and classifications. For the 10,000 brightest stars. Probably 30,000 objects in all. We'll make that live on your machine so you can zip around in the sky, look at stuff, and say, hey, what's that?
JU: Which is what every planetarium program does, right?
JF: Yes, but that's generally where they stop. They go a bit beyond, by having a bigger download. We do it in 20 megabytes, they may have 250, or a gigabyte, but that's all you'll ever get.
In our case, when you start up and your client contacts the WorldWide Telescope, we give you metadata saying what sources are available: the Hubble collection, the Spitzer collection. The metadata tells you where to go get the imagery. Some of it we'll host in Microsoft's data center, for scale reasons, and to ensure that it's available. But this data can be anywhere: Space Telescope, JPL...
JU: So I'm looking at the list. Which of these many sources are you hosting?
JF: We're hosting a lot of the data we launched with. Partly because we don't yet have a space act agreement with NASA. Even though we've collaborated with a lot of people who are NASA-funded, they're not allowed to acknowledge that collaboration or put anything into a legal document until we have that agreement done. While there are some people we could have just pointed to as data sources, it'd be in violation of internal NASA policies. So we're hosting more than was strictly necessary for the initial release.
But the concept is that you can plug in other sources that we're not even aware of. You just load metadata references into your client, by going to a website for that community or organization, and then you have access to terabytes of their data.
JU: The standards talk about how to represent objects and their metadata. Do they also talk about how you query a source, since they're all going to be huge? What's the query protocol?
JF: At WorldWide Telescope we understand what's called VOTables. There are standard ways to create queries, and standard ways to get results.
There are two ways that can happen. One is that our servers can do the queries, consolidate and cache the results, and we regurgitate the data as needed to our clients. So we do a VO SIA (simple image access) query to Hubble occasionally. When they have new images, we download these 500 megabyte or gigabyte images, which would be a very big download for a client, and we chop them up and create a tiled multi-resolution pyramid that we store on our server. The raw consumer wouldn't have been able to use that data, but by putting our value-add into the pipeline -- Hubble took the image, Space Telescope processed it and put it up on a web service, we do another step of processing to make it visualization-friendly -- now lots of people can see a thumbnail, click on it, zoom in, and the instant that they click and zoom they're already seeing the image. And as they zoom in further, they see all the gorgeous detail, but they don't have to download all the data.
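The tiled multi-resolution pyramid described here can be sketched in a few lines. This is a generic illustration, not the WorldWide Telescope pipeline: the 256-pixel tile size, the halving of resolution at each level, and the function name `pyramid_levels` are all assumptions made for the example.

```python
import math

def pyramid_levels(width, height, tile=256):
    """List (width, height, columns, rows) per level, coarsest level first.

    Each level halves the resolution of the one below it, and every level
    is cut into tile x tile pieces. The coarsest level fits in one tile.
    """
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile)
        rows = math.ceil(h / tile)
        levels.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    return levels[::-1]

levels = pyramid_levels(4096, 4096)
for depth, (w, h, cols, rows) in enumerate(levels):
    print(f"level {depth}: {w}x{h} pixels, {cols}x{rows} tiles")
print("total tiles:", sum(cols * rows for _, _, cols, rows in levels))
```

A client showing a thumbnail only needs the single tile at the coarsest level. As the user zooms, it fetches just the tiles that intersect the current view at the next level down, so the full-resolution image is never downloaded in one piece.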
JU: Is this engine related to the Deep Zoom technology?
JF: We predate Deep Zoom. It has some similarities, but the difference is that Deep Zoom and Seadragon are 2D technologies that use the graphics engine for doing tiled multi-resolution images. We actually have to align all our images in 3D space because from the earth, space looks like a big sphere at almost infinite distance, but there is a curvature to it.
Imagine taking a round room, and trying to put a bunch of bathroom tiles on it, and grout it. The tiles seem to come together and have parallel lines for a while, but eventually it stops working well. Maybe you can take one line around the equator, but as you go up you have fewer tiles, and weird-shaped tiles, and nothing lines up.
That's the problem we have. We're looking at spherical data, so we had to come up with a new spherical transform that preserves the poles. In previous projects, like Virtual Earth or TerraServer or Google Earth, the poles weren't important, because nobody lives there and nobody needs map directions for driving around there.
As far as the earth is concerned, you can cut off everything above and below a certain latitude and nobody would care. But you can't treat the sky like that. And you can't treat the moon or other planets that way either.
So we had to come up with something called TOAST: tessellated octahedral adaptive subdivision transform. It creates a 360-degree wraparound view that's either a planet surface or the infinite sphere of the sky, and lets you represent it using a 3D graphics accelerator, very rapidly and efficiently. So we can have an image pyramid the way Deep Zoom does, and TerraServer before it, but we don't have to give up the poles.
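To give a feel for what an octahedral subdivision of the sphere looks like, here is a minimal sketch. It is a generic construction, not the TOAST code itself: it starts from the eight triangular faces of an octahedron and repeatedly splits each triangle into four, pushing the new vertices out onto the unit sphere. The function names are hypothetical, and the real transform's mapping of triangles to image tiles is not shown.

```python
import math

def normalize(v):
    """Scale a vector to unit length so it lies on the sphere."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def midpoint(a, b):
    """Midpoint of two sphere points, projected back onto the sphere."""
    return normalize(tuple((x + y) / 2 for x, y in zip(a, b)))

def octahedron():
    """Eight triangular faces: four touch the north pole, four the south."""
    north, south = (0, 0, 1), (0, 0, -1)
    equator = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
    faces = []
    for i in range(4):
        a, b = equator[i], equator[(i + 1) % 4]
        faces.append((north, a, b))
        faces.append((south, b, a))
    return faces

def subdivide(faces):
    """Split every triangle into four smaller ones."""
    out = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        out += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return out

def mesh(level):
    """Octahedral mesh after `level` rounds of subdivision."""
    faces = octahedron()
    for _ in range(level):
        faces = subdivide(faces)
    return faces

for level in range(4):
    print(level, len(mesh(level)))  # 8 * 4**level triangles
```

Because the poles are ordinary vertices of the starting octahedron, they stay in the mesh at every level and are covered by regular triangles, unlike a latitude/longitude grid where the cells collapse toward the poles.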
That was something that didn't exist. There was Mercator projection, which is how you're used to seeing the earth mapped onto a flat piece of paper. It's hard, you have to do weird math to make it work at all. Then there's equirectangular projection. But there was nothing that could deal with storing an image in a spherical projection.
JU: So there are multiple full-sky surveys that you can switch between. So for example you can be looking at the Milky Way in the standard view, then switch over to infrared view and see it as an incandescent band.
Is it the VO standards that enable you to weave those views together in a coherent way?
JF: No, that's where TOAST comes in. What astronomers did before is that, because there was no way to visualize the full sky data, they would store all their images as a bunch of individual...
...OK, you have a sphere in the sky. You put a camera on it and take a picture. What shows up on the film is what's called a tangential projection.
Imagine taking a beach ball with all the stars plotted on it, and putting a light in the middle, and putting the beach ball up against a wall touching at one point. The stars will shine out and hit that wall. All of these beams are projecting from the middle, to where they lie on the sphere's surface, to where they hit on the wall. It's a way of taking something round and making it flat.
As long as you're looking at a very small part of the sky, there isn't very much distortion. But when you start looking at a large part of the sky the distortion becomes huge.
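The beach-ball picture corresponds to the standard tangent-plane (gnomonic) projection, which can be written out directly. The sketch below is illustrative only. The function names `tangent_plane` and `stretch` are hypothetical, coordinates are taken in degrees, and the output is in units of the sphere's radius.

```python
import math

def tangent_plane(ra_deg, dec_deg, ra0_deg, dec0_deg):
    """Project a sky position onto the plane touching the sphere at (ra0, dec0)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    ra0, dec0 = math.radians(ra0_deg), math.radians(dec0_deg)
    # Cosine of the angular distance between the point and the tangent point.
    cos_c = (math.sin(dec0) * math.sin(dec)
             + math.cos(dec0) * math.cos(dec) * math.cos(ra - ra0))
    if cos_c <= 0:
        raise ValueError("point is not on the hemisphere facing the plane")
    x = math.cos(dec) * math.sin(ra - ra0) / cos_c
    y = (math.cos(dec0) * math.sin(dec)
         - math.sin(dec0) * math.cos(dec) * math.cos(ra - ra0)) / cos_c
    return x, y

def stretch(angle_deg):
    """Projected distance from the tangent point divided by the true angle."""
    theta = math.radians(angle_deg)
    return math.tan(theta) / theta

for angle in (1, 10, 30, 60, 85):
    print(f"{angle:2d} degrees from center: stretched by {stretch(angle):.3f}x")
```

A point at angular distance theta from the tangent point lands at distance tan(theta) on the plane. Near the center that is almost exactly theta, but it grows without bound as theta approaches 90 degrees, which is why a single tangential image cannot cover a large part of the sky.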
What astronomers did was put these tangential projections into databases, and they even knew how to mosaic them to make bigger chunks. But when it came to anything larger, it broke down. If they made really big mosaics, they had to use projections that couldn't represent the poles, and everything would get more distorted the farther it got from the equator.
So now we have services like NASA SkyView. NASA has over 50 full-sky surveys sitting on servers, and while they participate in the Virtual Observatory, the images themselves are using a private, well-documented standard. So we gave them code for TOAST.
It used to be that when people made a request for a wide area of the sky, they would return multiple images joined into a mosaic. But now we said, we could ask for just a single tile, at a given level of resolution -- one tile that was the whole sky, or one tile that was a tiny piece of the sky -- but everything was laid out in a very specific grid for our projection.
While their software couldn't do it very quickly, it allowed us to go through and get all the tiles from their servers, for all these different studies, and put them up on our high-capacity servers.
So there's an automated path to get from a bunch of individual pictures of the sky to this full-sky mosaic that can be seen seamlessly.
JU: So where's the TOAST transform being applied?
JF: Right now it's being applied, for that data, on their servers.
JU: So you gave them the algorithm, and they're running it for you?
JF: That's correct. And eventually they'll be able to host the data when they have the capacity, so you could point a WorldWide Telescope client there. And even today theoretically you could do it.
JU: They keep the sources as they acquired them, but make the output of this transform available to queries?
JF: They generate the transform on the fly for each query. If they added a cache and then kept it warm, it would be acceptable for interactive use.
JU: When you look at the source list in WorldWide Telescope, those are the surveys you're talking about?
JF: Yeah, ROSAT and WMAP and things like that. Those are the full-sky surveys. So for the first time ever, we've assembled a view of the sky where you can look at everything from radio wave all the way to gamma. All the way from the longest-wavelength lowest-energy part of the electromagnetic spectrum to super-high-energetic particles.
JU: It's completely amazing, and it's wild to be able to cross-fade between them and compare the differences.
JF: We put together a standard for how you can visualize a spherical data set, we've given people the ability to create this data, and we've provided a client that knows how to accept this astronomy data -- both the spherical data and the original tangential images.
So when you have a study from Hubble, they can use the original tangential images the way they came off the camera, and in WorldWide Telescope we figure out the math and do the 3D transforms so that when we align that to the TOAST background from another full-sky survey, all the stars are exactly where they should be and everything lines up.
And because we have the universal coordinate system -- right ascension and declination -- we can put things in the right place in the sky. When you cross-fade you may be looking at apples and oranges, but you're looking at them on the same tree.
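A small sketch of why a shared coordinate system makes that alignment possible: any catalog entry or image corner given as right ascension and declination maps to the same point on the unit sphere, whichever survey it came from. The function names here are hypothetical, and right ascension is assumed to be in hours and declination in degrees.

```python
import math

def radec_to_unit(ra_hours, dec_deg):
    """Convert right ascension (hours) and declination (degrees) to a unit vector."""
    ra = math.radians(ra_hours * 15.0)  # 24 hours of RA span 360 degrees
    dec = math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def separation_deg(a, b):
    """Angle in degrees between two unit vectors on the sky."""
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

p = radec_to_unit(0.0, 0.0)
q = radec_to_unit(6.0, 0.0)
print(separation_deg(p, q))
```

Two layers that agree on these vectors for the same objects will line up when cross-faded, whatever wavelength each one shows.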
JU: Is this going to be a public standard? Can other clients use your services, or other services that support it?
JF: We've offered the algorithms and code to other organizations, like JPL, and we've even told Google that if they're interested in reworking their all-sky surveys to work with this format, we'd help. But they've got such critical mass around their current projections that they don't think they can take that on anytime soon.
JU: There's been some pushback, as you know, about WorldWide Telescope being a Windows-only product. But the project is much broader.
JF: Yes. And part of it is that all the data we support in WorldWide Telescope, and the WTML language we use...
...when people ask me how WorldWide Telescope differs from an astronomy program like Starry Night, I say that it's like a browser, like Internet Explorer or Safari or Firefox, but it's a browser of data in formats that are astronomy-friendly, like VOTables and WTML.
JU: Now WTML isn't the XML syntax you see when you save a tour and look into the file?
JF: Right. That, we're not even documenting. That's the tour XML format. But if you look in your user folder, or add objects to your collections and look in your documents folder, you'll see WTML there. It describes objects, hierarchies, network links, images.
A tour in WTML is metadata that says, this is the tour, what categories it's related to, what objects it visits.
We can also have things that say, there's an article in Sky and Telescope about M51, and it has that object's location in the sky. When you join the Sky and Telescope community in WWT, and you're browsing around and you find M51, you can look down in the context search and see the article, and open it up.
JU: That'll depend on which communities I belong to?
JF: Yes. We always show you the WorldWide Telescope stuff. Then when you log in we show you the union of that and stuff for the community you're currently looking at.
JU: OK, very helpful. Now let's go back to your discussion of projection, and see how it relates to my experience last night. I found the Milky Way, and I wanted to pan west, but it seemed like things wanted to spin around.
JF: There's two ways to look at the sky. First, looking at the full spherical view as if the earth didn't exist. You're earth-centered, but the horizon isn't blocking your view. North is up, south is down, and unless you specifically spin your view, when you move, north will always stay north.
JU: That's the view without the horizon.
JF: Right. With the horizon, the zenith always stays looking up, and as you move around, if you're looking at the zenith, it will always stay at the top. It can never go below the midpoint of the screen.
On a space station where there is no up or down, you'd think you could design anything and people could just float around in 3D space, there'd be no preferred direction. But the reality is that humans get extremely confused. Your brain has a natural desire to have an up and down and left and right, and when you invert those, you don't process things.
So if you were in the View From Here mode, the zenith always stays up. If you're in the other mode, looking at full universe, and you went to the north pole and tried to move beyond, you'd only be able to spin. You would not be able to pull the north pole beyond the middle of your screen, because that's your viewpoint. So then south would start becoming up, and left would be right, and you'd be spinning in the hamster ball.
JU: So if I want to look at the Milky Way, and then swing left to locate the Pleiades...
JF: To simulate looking at the sky, go to View and select the location where you are, and say View From Here. Then it will show you a horizon, and north/south/east/west, and north is straight up. Then it will simulate your eyes. If you're standing up and you look at the horizon, then you look up and up, what happens? When you're looking up, your head is tilted all the way back, touching your back, and you can't tilt any more. To see any further back you'd have to fall over.
So then what do you do? You rotate yourself and look south. That's how your head works, and that's how a telescope with an alt-az [altitude/azimuth] mount works.
We're trying to put on constraints so people don't get lost and upside down and backwards. But unfortunately it's hard to explain what happens when you get to the poles.
JU: Do you provide an unconstrained view?
JF: We do not. We cannot simulate an unconstrained view. The only thing we do allow is that, once you're viewing something, you can rotate the camera's view by hitting Control and then dragging left and right.
It's possible that's what was happening to you. We have a Reset Camera if you want to go back to neutral.
The reason for this feature is that when you're making a tour, you might need to orient your view. M51 goes up and down but your screen goes left and right. If you want to zoom in and frame it, you need to rotate your camera like you would a real camera.
JU: OK, that may have been the confusion.
JF: When you get in that mode, we try to make north-south-east-west make sense based on that, but it will do strange things at the poles. We still try to keep north, minus your rotation, up. But that mode is a little strange. We give that feature so people making tours can frame things better, but it's not something we try to document or recommend that people use for normal browsing.
So, if you care about your position on earth, use View From Here. If you want to ignore your position on earth, use the default mode. Then we don't care where you are, we're going to show you the whole sky, and date and location are ignored, it's just the sky, immutable and unmoving. Well, the planets move around on it.
JU: We'll never get to the bottom of all this, but I think you've given us a good sense of what I was really looking for, which was: What's actually been accomplished here? In terms of taking this raw astronomy data and correlating it in a way that's not just consumable in terms of quantities of data transmitted over the network, but in terms of making sense of objects and relationships.
JF: The vision of getting everybody access to all this astronomy data required systematic changes at every single level. We built on some things that Jim pioneered with NVO, and worked from there, but it was very systematic. How people process the data. The client to access the data. The protocols over the wire. Educating people, providing the context for it.
We put a lot of things together, but we also created a systematic model for how to do everything end to end, top to bottom, left to right. Now there may be other people who use the pieces that we've created, and then change them to use different data sources, different visualizations. Say someone creates a Mac client, or an iPhone client, that's possible. Or a mobile phone version of it, or a web-based version. Over time we or others can replace various components, but as a reference model for solving all the problems in order to get the data into people's homes and into their eyeballs -- you had to solve for all of those problems, otherwise people are still blocked from being able to really explore.
JU: Will this end-to-end pipeline be documented?
JF: Things like TOAST, and WTML, and our communities interface will be documented. There will be documentation, tools, and code coming out over the summer to help people understand more. As for some of the protocols, we'll need to do some work to make sure they're ready for us to recommend as standards.
JU: Excellent. Well, thanks Jonathan!
JF: OK, thank you.