WEBVTT

00:00:00.000 --> 00:00:10.470
[MUSIC]

00:00:10.470 --> 00:00:11.730
>> Hi, I'm Ronny Chatterjee.

00:00:11.730 --> 00:00:15.150
I'm a Senior Product Manager on the
Azure Data team at Microsoft,

00:00:15.150 --> 00:00:16.710
and I'm pleased to join with

00:00:16.710 --> 00:00:18.450
Steven Drucker from
Microsoft Research.

00:00:18.450 --> 00:00:20.400
>> Yes. Hi, so I'm Steven Drucker.

00:00:20.400 --> 00:00:23.550
I'm a researcher and I
manage our visualization

00:00:23.550 --> 00:00:26.700
and interactive data analysis
team in Microsoft Research.

00:00:26.700 --> 00:00:28.620
>> So today we are
going to show some of

00:00:28.620 --> 00:00:31.560
the data visualization
experiences using SandDance.

00:00:31.560 --> 00:00:33.345
So before we get started Steve,

00:00:33.345 --> 00:00:36.690
what was the motivation
for building [inaudible]?

00:00:36.690 --> 00:00:40.685
>> So we were looking at
visualizing large numbers of things

00:00:40.685 --> 00:00:42.620
and one of the things that drives me

00:00:42.620 --> 00:00:45.095
crazy is that it's very
easy to get an aggregate,

00:00:45.095 --> 00:00:46.850
a summary, but the summary

00:00:46.850 --> 00:00:49.010
leaves out a lot of
the individual things.

00:00:49.010 --> 00:00:50.930
So we were looking at
how to actually look at

00:00:50.930 --> 00:00:52.790
the individual data but

00:00:52.790 --> 00:00:54.815
still have it organized
in an aggregate sense.

00:00:54.815 --> 00:00:55.445
>> Awesome.

00:00:55.445 --> 00:00:56.600
>> But that's what really led.

00:00:56.600 --> 00:00:58.070
>> Okay. So let's get started and

00:00:58.070 --> 00:00:59.825
let's see what SandDance provides.

00:00:59.825 --> 00:01:01.040
>> Okay. Well first of all,

00:01:01.040 --> 00:01:04.805
you can go to GitHub and look
at SandDance right there.

00:01:04.805 --> 00:01:08.420
You can launch an
interactive experience here,

00:01:08.420 --> 00:01:10.520
and I'm just going to go to it,

00:01:10.520 --> 00:01:14.750
and we'll just start looking
at data experience in this.

00:01:14.750 --> 00:01:16.000
So right now,

00:01:16.000 --> 00:01:17.870
the first classic dataset that we

00:01:17.870 --> 00:01:20.060
looked at a lot is
the Titanic dataset.

00:01:20.060 --> 00:01:21.920
Here we've got about 2,200

00:01:21.920 --> 00:01:24.335
people on the Titanic,
passengers and crew,

00:01:24.335 --> 00:01:25.730
and you can actually see it.

00:01:25.730 --> 00:01:28.010
This is like what you'd see
from a SQL query where you

00:01:28.010 --> 00:01:30.890
just look at all the numbers,
not organized at all.

00:01:30.890 --> 00:01:33.185
That doesn't give you a
whole lot of information.

00:01:33.185 --> 00:01:34.730
But one of the things
you might want to

00:01:34.730 --> 00:01:36.320
do is you might want to say okay,

00:01:36.320 --> 00:01:39.630
who survived the
Titanic and who didn't?

00:01:39.630 --> 00:01:39.885
>> Wow.

00:01:39.885 --> 00:01:43.095
>> You can see all those things
move into position and get those.

00:01:43.095 --> 00:01:45.330
>> So approximately like 1,500?

00:01:45.330 --> 00:01:48.270
>> Yeah, about 1,500 people died,

00:01:48.270 --> 00:01:50.660
and so it's about double the
number of people that survived.

00:01:50.660 --> 00:01:52.535
So that's like one-third
and two-thirds.

00:01:52.535 --> 00:01:54.020
That's pretty striking.

00:01:54.020 --> 00:01:54.425
>> Yeah.

00:01:54.425 --> 00:01:56.690
>> So one of the
things I like to do is

00:01:56.690 --> 00:01:58.880
let's actually
add a little color.

00:01:58.880 --> 00:02:02.020
So I'm going to color this based
upon who survived the Titanic.

00:02:02.020 --> 00:02:04.280
You can see kind of
natural green and red here,

00:02:04.280 --> 00:02:07.110
but you can choose
whatever palettes you want.

00:02:07.180 --> 00:02:09.410
That's not really
showing anything else,

00:02:09.410 --> 00:02:11.330
but now I can select
just the people that

00:02:11.330 --> 00:02:13.490
survived and I can isolate them.

00:02:13.490 --> 00:02:15.410
I'm going to do this because
I want to see did they follow

00:02:15.410 --> 00:02:17.690
that women and children
first paradigm?

00:02:17.690 --> 00:02:20.150
So now we're just looking at
the people that survived,

00:02:20.150 --> 00:02:22.910
and now I'm going to change,
just pivot, and just say,

00:02:22.910 --> 00:02:26.095
let's look at it by gender.

00:02:26.095 --> 00:02:29.460
So we can pick gender,
and you can say,

00:02:29.460 --> 00:02:32.990
wow, about the same number of
women and men survived.

00:02:32.990 --> 00:02:34.340
I wouldn't have believed that.

00:02:34.340 --> 00:02:35.330
>> I didn't know that.

00:02:35.330 --> 00:02:36.980
>> Yeah, it's striking.

00:02:36.980 --> 00:02:39.410
Did they not follow that?

00:02:39.410 --> 00:02:41.990
Well actually, if we stop

00:02:41.990 --> 00:02:44.865
filtering and we actually see
all the people, you can say,

00:02:44.865 --> 00:02:46.550
yes, the absolute number of men

00:02:46.550 --> 00:02:48.350
and women who survived was about the same,

00:02:48.350 --> 00:02:50.510
but there were a lot more men
on the Titanic than women.

00:02:50.510 --> 00:02:53.210
So by percentages,
men did pretty badly.

00:02:53.210 --> 00:02:54.830
But that's not the whole story,

00:02:54.830 --> 00:02:56.180
we could also just facet this.

00:02:56.180 --> 00:02:58.280
We could actually do
four different plots

00:02:58.280 --> 00:03:00.200
based upon what cabin
class they're in.

00:03:00.200 --> 00:03:01.880
So if we just click on,

00:03:01.880 --> 00:03:05.250
"Cabin class" here, you can
see now we've got first class,

00:03:05.250 --> 00:03:06.840
second class, third class, and crew.

00:03:06.840 --> 00:03:08.850
In first class, not many women died,

00:03:08.850 --> 00:03:10.275
they did really, really well.

00:03:10.275 --> 00:03:12.705
Second class, a little worse.

00:03:12.705 --> 00:03:15.420
But third class, they
did pretty poorly.

00:03:15.420 --> 00:03:18.090
So what it says here is,
on the Titanic,

00:03:18.090 --> 00:03:19.890
it helps to be a rich woman,

00:03:19.890 --> 00:03:22.800
they survived a little bit better.
But what about children?

00:03:22.800 --> 00:03:25.170
So let's change instead
of looking at gender,

00:03:25.170 --> 00:03:27.810
let's simply change this to age.

00:03:27.810 --> 00:03:30.530
Now, we can break this out. Right

00:03:30.530 --> 00:03:32.795
now the age is a
little bit spread out.

00:03:32.795 --> 00:03:34.175
Now we've got 0-10 there.

00:03:34.175 --> 00:03:36.230
You can see at the 0-10,

00:03:36.230 --> 00:03:37.475
you can see in first class,

00:03:37.475 --> 00:03:39.380
only one child died.

00:03:39.380 --> 00:03:42.305
In second class, none
of the children died.

00:03:42.305 --> 00:03:45.290
In third class, more than half died.

00:03:45.290 --> 00:03:48.020
I was curious, we can
actually zoom into this.

00:03:48.020 --> 00:03:51.365
We can actually just zoom in
and say who is this person?

00:03:51.365 --> 00:03:54.270
That's Miss Helen Loraine
Allison, and we can click on her.

00:03:54.270 --> 00:03:56.210
One of the nice things is
this is actually hooked up,

00:03:56.210 --> 00:03:58.130
you can get to the
individual data points,

00:03:58.130 --> 00:04:00.995
you can look her up and
just do a Bing on her.

00:04:00.995 --> 00:04:05.315
It turns out there's something
called the Encyclopedia Titanica,

00:04:05.315 --> 00:04:07.145
which I didn't know there was that,

00:04:07.145 --> 00:04:09.470
but you find out that Loraine Allison

00:04:09.470 --> 00:04:11.750
was the only child in first
and second class to die.

00:04:11.750 --> 00:04:14.420
It turns out that the
parents were separated from

00:04:14.420 --> 00:04:17.420
the child and the younger brother,

00:04:17.420 --> 00:04:19.160
and they vowed not to leave

00:04:19.160 --> 00:04:21.305
the Titanic until the
whole family was together.

00:04:21.305 --> 00:04:22.700
But it turns out that
they had already taken

00:04:22.700 --> 00:04:24.080
the younger brother off earlier.

00:04:24.080 --> 00:04:26.900
So she was the only child who
died in first and second class.

00:04:26.900 --> 00:04:29.345
But 53 of 76 children died.

00:04:29.345 --> 00:04:32.075
It really helps you humanize
that data, understand that.

00:04:32.075 --> 00:04:32.465
>> Yeah.

00:04:32.465 --> 00:04:35.060
>> So that's just kind of one
of the many stories in here.

00:04:35.060 --> 00:04:37.130
There's lots of other stories
that you can get in this.

00:04:37.130 --> 00:04:40.370
So for instance, another
thing you can find is,

00:04:40.370 --> 00:04:44.035
there's a way of looking at this
based upon how much people paid.

00:04:44.035 --> 00:04:47.940
Let me turn off the facet
there, "Facet", "None".

00:04:47.940 --> 00:04:50.630
So here, if we actually change

00:04:50.630 --> 00:04:53.330
this so that we look at
the price people paid for

00:04:53.330 --> 00:04:55.775
the ticket, and then if we

00:04:55.775 --> 00:04:58.760
color this by what cabin
class they end up with,

00:04:58.760 --> 00:05:00.155
you see some interesting patterns.

00:05:00.155 --> 00:05:02.450
So first of all, you can
see it's really big.

00:05:02.450 --> 00:05:05.525
You pay a lot of money,
you get first class.

00:05:05.525 --> 00:05:06.440
>> Right.

00:05:06.440 --> 00:05:07.760
>> Then second class is

00:05:07.760 --> 00:05:10.265
this orange thing and then
third class is this thing.

00:05:10.265 --> 00:05:11.660
Turns out that the
crew is not showing

00:05:11.660 --> 00:05:12.725
up because they didn't pay at all.

00:05:12.725 --> 00:05:13.070
>> Right.

00:05:13.070 --> 00:05:14.300
>> But when I saw this,

00:05:14.300 --> 00:05:17.075
I noticed this weird
anomaly right down there.

00:05:17.075 --> 00:05:18.800
Let's go into that
little corner here.

00:05:18.800 --> 00:05:20.180
If I could just steer over there.

00:05:20.180 --> 00:05:22.700
It's like this person didn't
pay much money at all.

00:05:22.700 --> 00:05:25.190
What's going on there?
Who is this person?

00:05:25.190 --> 00:05:27.620
Who is Mr. Frans Olof Carlsson?

00:05:27.620 --> 00:05:29.750
Again, we could do the same
thing: you just click on

00:05:29.750 --> 00:05:32.825
him and we can get some
information about him.

00:05:32.825 --> 00:05:34.490
If you Bing him, I won't do it,

00:05:34.490 --> 00:05:35.930
I'll just tell you the
punchline right now.

00:05:35.930 --> 00:05:37.310
It turns out he was a skipper on

00:05:37.310 --> 00:05:39.140
the same line that ran the Titanic.

00:05:39.140 --> 00:05:41.405
>> Oh wow, that's very interesting.

00:05:41.405 --> 00:05:43.295
>> There was a strike and

00:05:43.295 --> 00:05:45.560
his boat wasn't running, so
he couldn't be a captain.

00:05:45.560 --> 00:05:47.540
So they shipped him
off on the Titanic.

00:05:47.540 --> 00:05:48.650
He didn't have to pay
anything except for

00:05:48.650 --> 00:05:49.925
the taxes for his ticket.

00:05:49.925 --> 00:05:51.875
It turns out he died on the Titanic.

00:05:51.875 --> 00:05:53.990
So this guy was doubly unlucky.

00:05:53.990 --> 00:05:56.975
He couldn't be a captain
and he couldn't do a thing.

00:05:56.975 --> 00:05:57.490
>> Yeah.

00:05:57.490 --> 00:05:59.220
>> There's one more anomaly here,

00:05:59.220 --> 00:06:00.780
what's this guy here?

00:06:00.780 --> 00:06:03.155
He paid about as much money

00:06:03.155 --> 00:06:06.740
as second class
passengers, but he's blue.

00:06:06.740 --> 00:06:07.130
>> All right.

00:06:07.130 --> 00:06:08.495
>> So he's first class,
so what's going on?

00:06:08.495 --> 00:06:10.310
So if we click on this person,

00:06:10.310 --> 00:06:12.560
Mr. Nourney, and again,

00:06:12.560 --> 00:06:13.730
we can look him up.

00:06:13.730 --> 00:06:16.070
Turns out that Mr.
Nourney was placed,

00:06:16.070 --> 00:06:19.910
again via the old Encyclopedia
Titanica, come back,

00:06:19.910 --> 00:06:27.300
he was placed second class and
he did not like his cabin,

00:06:27.300 --> 00:06:30.290
so he went and complained and
they upgraded him on the ship.

00:06:30.290 --> 00:06:32.360
He just had to pay for
the upgrade charges on the ship,

00:06:32.360 --> 00:06:34.520
which is why it didn't show
up in that original lane.

00:06:34.520 --> 00:06:36.530
It turns out that because he was in

00:06:36.530 --> 00:06:39.680
first class or maybe not
because, but he survived.

00:06:39.680 --> 00:06:41.690
So it just shows you
that sometimes it

00:06:41.690 --> 00:06:44.360
helps to complain and maybe
you end up surviving.

00:06:44.360 --> 00:06:46.760
>> Thank you so much for showing us

00:06:46.760 --> 00:06:50.550
a lot of the experiences
that SandDance provides.

00:06:50.560 --> 00:06:54.830
Tell us more about what
are the products across

00:06:54.830 --> 00:06:57.770
Microsoft which leverage
this beautiful technology

00:06:57.770 --> 00:06:59.445
of SandDance which you built in MSR?

00:06:59.445 --> 00:07:00.630
>> So right now,

00:07:00.630 --> 00:07:02.385
it works in Azure Data Studio,

00:07:02.385 --> 00:07:05.010
and in Power BI it works
as a custom visual.

00:07:05.010 --> 00:07:05.670
>> Very nice.

00:07:05.670 --> 00:07:07.610
>> You can try it
directly on the web with

00:07:07.610 --> 00:07:09.890
your own data and it doesn't
even get uploaded to the Cloud,

00:07:09.890 --> 00:07:11.330
it just runs that data locally.

00:07:11.330 --> 00:07:13.340
So you can just look at your
own data sets right now.

00:07:13.340 --> 00:07:15.230
Of course, if you want
to do more stuff,

00:07:15.230 --> 00:07:17.030
it helps to upload it to the Cloud.

00:07:17.030 --> 00:07:19.820
We're looking at the general
plug-in architecture.

00:07:19.820 --> 00:07:22.910
We do a lot of work in visualizing
machine learning models.

00:07:22.910 --> 00:07:25.250
>> Yeah, and you also built
it in VS Code as well, right?

00:07:25.250 --> 00:07:26.720
>> Yes, exactly.

00:07:26.720 --> 00:07:28.070
It's incorporated into VS Code.

00:07:28.070 --> 00:07:30.320
So again, this whole
new way of doing

00:07:30.320 --> 00:07:34.205
open source extensions to existing
products, thinking pluggable,

00:07:34.205 --> 00:07:37.010
it's really exciting. To
me, it's a new world.

00:07:37.010 --> 00:07:39.875
I've been here at
Microsoft almost 25 years,

00:07:39.875 --> 00:07:43.730
and just the energy that
you're seeing now in

00:07:43.730 --> 00:07:45.800
getting stuff out there
for people to try

00:07:45.800 --> 00:07:47.975
and modify, get pull
requests, and add to it.

00:07:47.975 --> 00:07:49.265
>> It's just amazing.

00:07:49.265 --> 00:07:51.275
So what's next for SandDance?

00:07:51.275 --> 00:07:52.700
>> There are lots of
things. First of all,

00:07:52.700 --> 00:07:54.320
there's a ton of feature requests.

00:07:54.320 --> 00:07:57.140
So there's all sorts of things.

00:07:57.140 --> 00:07:59.240
People will want to add
images to these things,

00:07:59.240 --> 00:08:00.905
different kinds of layouts.

00:08:00.905 --> 00:08:04.280
I do a lot of machine learning and
interpretability visualization.

00:08:04.280 --> 00:08:06.140
So we look at custom
ways of doing that and

00:08:06.140 --> 00:08:08.600
hooking that in, a project called Model

00:08:08.600 --> 00:08:10.460
Tracker that we can visualize that

00:08:10.460 --> 00:08:13.565
same individual and the aggregates.

00:08:13.565 --> 00:08:15.530
We do it in storytelling.

00:08:15.530 --> 00:08:17.450
So there's a bunch of
projects that are under

00:08:17.450 --> 00:08:19.850
just how I told a bunch of
stories about the Titanic,

00:08:19.850 --> 00:08:21.770
but the Titanic's not
necessarily stories

00:08:21.770 --> 00:08:22.990
you want to always be telling about.

00:08:22.990 --> 00:08:26.270
Telling stories about your
data or data that's relevant.

00:08:26.270 --> 00:08:28.160
It just feeds into a lot of

00:08:28.160 --> 00:08:29.840
the other visualization
projects that are

00:08:29.840 --> 00:08:31.830
going on in research
and in the company.

00:08:31.830 --> 00:08:33.440
>> Very nice. So thank you

00:08:33.440 --> 00:08:35.885
everybody for listening
in and joining us today.

00:08:35.885 --> 00:08:38.570
It was a pleasure to have
you Steve today joining us

00:08:38.570 --> 00:08:41.410
in and play around with SandDance,

00:08:41.410 --> 00:08:43.070
and go around on GitHub and

00:08:43.070 --> 00:08:45.095
explore what you can do
on top of your data.

00:08:45.095 --> 00:08:46.550
Thank you so much for
listening in today.

00:08:46.550 --> 00:08:46.910
>> Thanks.

00:08:46.910 --> 00:09:01.630
[MUSIC]

