WEBVTT

00:00:00.000 --> 00:00:10.530
[MUSIC].

00:00:10.530 --> 00:00:12.300
>> Hi, I'm Rony Chatterjee,

00:00:12.300 --> 00:00:15.180
I'm a Senior Product Manager
in the Azure Data team.

00:00:15.180 --> 00:00:17.190
I work on the SQL Server product and

00:00:17.190 --> 00:00:19.605
today I'm excited to
show you what we have

00:00:19.605 --> 00:00:22.230
built as a data
visualization experience

00:00:22.230 --> 00:00:25.410
in Azure Data Studio.
Let's get started.

00:00:25.410 --> 00:00:27.690
In Azure Data Studio,

00:00:27.690 --> 00:00:31.965
you have a product which gives you

00:00:31.965 --> 00:00:33.960
a data visualization as well as an

00:00:33.960 --> 00:00:36.600
experience to work
over data anywhere,

00:00:36.600 --> 00:00:38.970
whether it's data on-premises as well

00:00:38.970 --> 00:00:41.755
as data inside Big Data Clusters.

00:00:41.755 --> 00:00:45.470
So in this edition of Azure
Data Studio, as you can see,

00:00:45.470 --> 00:00:47.870
I have a SQL Database Edge and

00:00:47.870 --> 00:00:51.080
a SQL Server Big Data
Cluster I'm connected into.

00:00:51.080 --> 00:00:56.075
Now, Azure Data Studio is
built on top of extensions.

00:00:56.075 --> 00:00:59.300
So one of the things which
Azure Data Studio provides is

00:00:59.300 --> 00:01:00.830
an ability to install

00:01:00.830 --> 00:01:03.830
any extension which you might
like for your data operation.

00:01:03.830 --> 00:01:06.155
So here, one of the extensions
which we are going to

00:01:06.155 --> 00:01:08.830
install today is called SandDance.

00:01:08.830 --> 00:01:12.035
SandDance is a data
visualization experience.

00:01:12.035 --> 00:01:15.050
It provides unit
visualization for the data

00:01:15.050 --> 00:01:18.365
you would like to explore
and visualize in SandDance.

00:01:18.365 --> 00:01:21.515
So here I'm installing
the SandDance extension,

00:01:21.515 --> 00:01:22.955
and as you can see,

00:01:22.955 --> 00:01:25.535
the SandDance extension
was just installed.

00:01:25.535 --> 00:01:28.775
So let's go back to
where the data is.

00:01:28.775 --> 00:01:32.720
So I have data inside the
SQL Server Big Data Cluster.

00:01:32.720 --> 00:01:35.585
The SQL Server Big Data
Cluster comes with

00:01:35.585 --> 00:01:38.330
HDFS within the Big Data Cluster

00:01:38.330 --> 00:01:40.745
where we can store
high volumes of data.

00:01:40.745 --> 00:01:44.510
One such data which I have
stored in HDFS for instance

00:01:44.510 --> 00:01:48.695
of SQL Server Big Data
Cluster is a demovote.tsv.

00:01:48.695 --> 00:01:51.755
So now let's take a look at
what this data looks like.

00:01:51.755 --> 00:01:54.230
So if you do a quick
preview of the data,

00:01:54.230 --> 00:01:58.040
the dataset opens from HDFS,

00:01:58.040 --> 00:01:59.150
and as you can see,

00:01:59.150 --> 00:02:01.115
this is nothing but voting data.

00:02:01.115 --> 00:02:03.230
It has the voting data
during the Obama,

00:02:03.230 --> 00:02:04.625
Romney time frame of

00:02:04.625 --> 00:02:07.160
elections which happened
across the United States.

00:02:07.160 --> 00:02:11.540
Now, if I have to process this
particular data set and give it to

00:02:11.540 --> 00:02:15.925
my data scientist to actually
make sense of the data,

00:02:15.925 --> 00:02:18.010
he would have to actually
build some reports,

00:02:18.010 --> 00:02:19.265
and whether he uses

00:02:19.265 --> 00:02:22.330
a data visualization library
[inaudible] to view it,

00:02:22.330 --> 00:02:24.845
or he uses some code

00:02:24.845 --> 00:02:27.695
to actually visualize
what the data looks like.

00:02:27.695 --> 00:02:29.509
But in Azure Data Studio,

00:02:29.509 --> 00:02:33.170
we have a right-click
option of "View in SandDance."

00:02:33.170 --> 00:02:36.070
Let's take a look at what
this data looks like.

00:02:36.070 --> 00:02:39.705
Now, you can see that
SandDance is loading up,

00:02:39.705 --> 00:02:42.525
and SandDance actually
auto figured out that

00:02:42.525 --> 00:02:45.480
this data has latitude,

00:02:45.480 --> 00:02:47.765
longitude, and scatter plot would be

00:02:47.765 --> 00:02:51.140
the best chart to choose for
plotting this particular data.

00:02:51.140 --> 00:02:55.610
So we have a recommended chart
type built inside SandDance.

00:02:55.610 --> 00:02:57.395
So now this data looks good.

00:02:57.395 --> 00:02:59.300
I can see the map
of the United States,

00:02:59.300 --> 00:03:01.040
as expected because it's voting data,

00:03:01.040 --> 00:03:03.590
and I have the x-axis and
the y-axis as well.

00:03:03.590 --> 00:03:07.640
Now, still I don't know if there is

00:03:07.640 --> 00:03:10.220
more information in
the data and I could

00:03:10.220 --> 00:03:13.760
obviously filter in terms of
the x-axis and the y-axis,

00:03:13.760 --> 00:03:16.760
but wouldn't it be great if I
could actually plot this in

00:03:16.760 --> 00:03:19.085
a three-dimensional space to actually

00:03:19.085 --> 00:03:21.935
see which one might be
the best classifier.

00:03:21.935 --> 00:03:26.085
So let's make this graph
a three-dimensional one.

00:03:26.085 --> 00:03:28.620
So if I click on "SandDance" and if I

00:03:28.620 --> 00:03:31.575
click the "Three-dimensional
one," now I have the z-axis.

00:03:31.575 --> 00:03:35.840
Now in the z-axis I can
choose income and see

00:03:35.840 --> 00:03:38.015
across the United States what

00:03:38.015 --> 00:03:40.715
the income demographics
of the people are.

00:03:40.715 --> 00:03:42.740
You can see that
Washington is doing good,

00:03:42.740 --> 00:03:44.855
Delaware is doing a little better,

00:03:44.855 --> 00:03:47.960
and then New York and
Miami are great as well.

00:03:47.960 --> 00:03:49.760
But let's take a look at,

00:03:49.760 --> 00:03:51.725
in terms of the median home value,

00:03:51.725 --> 00:03:53.180
how are the prices looking?

00:03:53.180 --> 00:03:56.840
You can see that the graph
automatically adjusted itself.

00:03:56.840 --> 00:04:02.045
You also have various
color schemes you can use.

00:04:02.045 --> 00:04:04.580
So I could actually see, income,

00:04:04.580 --> 00:04:06.230
and you can see that I can choose

00:04:06.230 --> 00:04:09.380
different color schemes to
actually color the graph.

00:04:09.380 --> 00:04:13.325
I could also easily
change the bin count.

00:04:13.325 --> 00:04:17.030
So you can see that as
I change the bin count,

00:04:17.030 --> 00:04:20.380
the graph dynamically
just changes completely.

00:04:20.380 --> 00:04:22.130
Now this is good.

00:04:22.130 --> 00:04:24.865
I can also visualize the data,

00:04:24.865 --> 00:04:27.850
zoom in, zoom out, but now,

00:04:27.850 --> 00:04:32.440
I would like to see if the data
set is giving me some way of

00:04:32.440 --> 00:04:34.165
where I could predict

00:04:34.165 --> 00:04:37.445
some anomalies if they
exist inside the data.

00:04:37.445 --> 00:04:40.880
So now, let's start by doing
some search on top of the data.

00:04:40.880 --> 00:04:46.455
So I'm going to search
where income is less than,

00:04:46.455 --> 00:04:52.829
say $40,000, and also
add another expression,

00:04:52.829 --> 00:04:59.675
and say, median home value
is greater than $800,000.

00:04:59.675 --> 00:05:02.205
That seems to be a good check.

00:05:02.205 --> 00:05:04.230
Let's see if we can find
some data set which

00:05:04.230 --> 00:05:06.975
actually satisfies this need.

00:05:06.975 --> 00:05:10.170
Now, if I actually do
a search and select,

00:05:10.170 --> 00:05:12.260
you can see that
SandDance has a couple of

00:05:12.260 --> 00:05:14.660
data points which it
actually picked up.

00:05:14.660 --> 00:05:17.210
Those are shown in the graph.

00:05:17.210 --> 00:05:20.080
I could easily isolate
these data points,

00:05:20.080 --> 00:05:21.860
and now I just have

00:05:21.860 --> 00:05:24.275
a subset of these four
data points to consider.

00:05:24.275 --> 00:05:26.570
You can see that there's a person in

00:05:26.570 --> 00:05:29.825
San Miguel County who
has bought a house of

00:05:29.825 --> 00:05:35.710
$812,500 with an income of $39,070.

00:05:35.710 --> 00:05:39.830
So SandDance actually gives
you unit visualization of

00:05:39.830 --> 00:05:43.985
the data you are trying to explore
and make sense of the data.

00:05:43.985 --> 00:05:46.220
You could also do other things

00:05:46.220 --> 00:05:49.440
like where you could take
a snapshot of the picture,

00:05:49.440 --> 00:05:51.150
you want to create a snapshot view,

00:05:51.150 --> 00:05:53.919
you could actually create a snapshot,

00:05:56.720 --> 00:05:59.690
and it creates a snapshot view.

00:05:59.690 --> 00:06:02.630
Now you can embed this
snapshot view in a document

00:06:02.630 --> 00:06:04.160
which you probably want
to send over to

00:06:04.160 --> 00:06:05.945
someone else to take
a look at as well.

00:06:05.945 --> 00:06:08.330
So this is the way we are providing

00:06:08.330 --> 00:06:10.400
data visualization in the context of

00:06:10.400 --> 00:06:12.635
the data you are operating in.

00:06:12.635 --> 00:06:16.880
Now, also one of the
things we have added

00:06:16.880 --> 00:06:22.925
inside SandDance was the ability
to visualize the query results.

00:06:22.925 --> 00:06:26.060
So here I have a
database I have created,

00:06:26.060 --> 00:06:30.020
and in this particular database
I have a sensor data table.

00:06:30.020 --> 00:06:32.840
So now if I do a Select Top 1000

00:06:32.840 --> 00:06:35.705
from that particular
table and I could

00:06:35.705 --> 00:06:38.135
also click on this chart

00:06:38.135 --> 00:06:41.839
here and load this data in
the SandDance visualizer,

00:06:41.839 --> 00:06:44.495
though this is just the sensor
data and it is not

00:06:44.495 --> 00:06:47.210
as interesting as the
voting data is, but still,

00:06:47.210 --> 00:06:50.360
this gives you a quick view of what

00:06:50.360 --> 00:06:54.655
the data visualization in the
context of the query could look like.

00:06:54.655 --> 00:06:57.025
Now in this particular video,

00:06:57.025 --> 00:07:00.035
you have seen how we are
using SandDance to actually

00:07:00.035 --> 00:07:03.665
quickly visualize the data and find

00:07:03.665 --> 00:07:05.705
meaningful trends and understand

00:07:05.705 --> 00:07:07.190
what trends are happening in

00:07:07.190 --> 00:07:09.140
the data so that it can help you in

00:07:09.140 --> 00:07:11.435
building advanced machine
learning algorithms.

00:07:11.435 --> 00:07:13.100
Thank you so much for
listening in today.

00:07:13.100 --> 00:07:27.910
[MUSIC].

