WEBVTT

00:00:09.680 --> 00:00:10.780
>> [MUSIC]

00:00:10.780 --> 00:00:12.750
>> Hi. I'm Vicki Harp with
the SQL Server product team.

00:00:12.750 --> 00:00:15.870
I'm here today to show you
Notebooks in Azure Data Studio.

00:00:15.870 --> 00:00:18.825
So Notebooks are a concept in

00:00:18.825 --> 00:00:20.660
data science that has been

00:00:20.660 --> 00:00:22.940
used to do a lot of
data visualization,

00:00:22.940 --> 00:00:25.010
data exploration and data work,

00:00:25.010 --> 00:00:26.855
primarily in the Python language.

00:00:26.855 --> 00:00:28.760
When people are talking
about Notebooks,

00:00:28.760 --> 00:00:30.485
a lot of times we're talking
about Jupyter Notebooks.

00:00:30.485 --> 00:00:32.660
So the implementation of Notebooks

00:00:32.660 --> 00:00:34.730
that we have in Azure Data Studio is

00:00:34.730 --> 00:00:37.070
a Jupyter Notebook with a
custom front-end that's

00:00:37.070 --> 00:00:40.430
better tailored to fit within the
Azure Data Studio experience.

00:00:40.430 --> 00:00:45.450
So first, I'm going to show
you a simple Python Notebook.

00:00:45.450 --> 00:00:46.760
This is a Notebook that uses the

00:00:46.760 --> 00:00:49.250
Python language, and
you can see we've got

00:00:49.250 --> 00:00:51.620
a section up here which is

00:00:51.620 --> 00:00:54.395
human-readable text and a
section down here that has code.

00:00:54.395 --> 00:00:56.149
So if I hit run on this,

00:00:56.149 --> 00:00:58.010
this particular code calls out to

00:00:58.010 --> 00:01:00.755
the Internet and pulls down a
random photograph of a dog.

00:01:00.755 --> 00:01:03.710
To show you around the
Notebook experience,

00:01:03.710 --> 00:01:05.390
we have a picker here,

00:01:05.390 --> 00:01:06.770
we can choose the language.

00:01:06.770 --> 00:01:08.630
So in this case, I'm
running Python 3,

00:01:08.630 --> 00:01:10.475
and I'm running it
on my local machine.

00:01:10.475 --> 00:01:13.460
I could instead move over to
the SQL language and then

00:01:13.460 --> 00:01:16.430
pick which of my SQL Server
machines I want to attach to.

00:01:16.430 --> 00:01:18.605
I've got my SQL Servers listed here.

00:01:18.605 --> 00:01:21.830
So to take it a little bit further,

00:01:21.830 --> 00:01:24.230
I'll show you an example
of how you might use this.

00:01:24.230 --> 00:01:28.025
So here I've got a SQL Server
2019 Big Data Cluster which

00:01:28.025 --> 00:01:32.555
is SQL Server running in Kubernetes
with both SQL and Spark.

00:01:32.555 --> 00:01:34.490
So here in my HDFS section,

00:01:34.490 --> 00:01:38.040
I've got this directory
full of CSV files.

00:01:38.040 --> 00:01:40.890
If I take a look at that, preview it,

00:01:40.890 --> 00:01:42.260
you can see that it's a lot of

00:01:42.260 --> 00:01:44.780
information about dog
licenses.

00:01:44.780 --> 00:01:46.970
So this is actually the data dump of

00:01:46.970 --> 00:01:48.320
Allegheny County, Pennsylvania

00:01:48.320 --> 00:01:51.230
dog licenses for the
last several years.

00:01:51.230 --> 00:01:55.775
So using the Spark
language, connecting

00:01:55.775 --> 00:02:00.440
to Python with Spark on the
SQL Server Big Data Cluster,

00:02:00.440 --> 00:02:02.045
I'm able to run

00:02:02.045 --> 00:02:05.780
code to read those files
directly out of CSV.

00:02:05.780 --> 00:02:07.760
Then I can do some analysis.

00:02:07.760 --> 00:02:09.890
So in this case, I'm going to
get the count of the rows.

00:02:09.890 --> 00:02:15.645
I'm going to get the distinct
list of just the names,

00:02:15.645 --> 00:02:17.215
so we have 25,000 names.

00:02:17.215 --> 00:02:20.825
Then I'm going to pull a
random name out of that hat.

00:02:20.825 --> 00:02:22.550
So we have a name here.

00:02:22.550 --> 00:02:25.850
Now, if I were using Spark,

00:02:25.850 --> 00:02:29.120
if I were very used to using
Python, that would be great.

00:02:29.120 --> 00:02:30.770
But if I'm more of a SQL person,

00:02:30.770 --> 00:02:33.320
then maybe I want to connect
to that same dataset

00:02:33.320 --> 00:02:36.040
in the cluster
using the SQL language.

00:02:36.040 --> 00:02:41.420
Here, I've created an external
table over that file format.

00:02:41.420 --> 00:02:43.870
This is something that you
can do using a wizard.

00:02:43.870 --> 00:02:45.250
I'm going to go here,

00:02:45.250 --> 00:02:46.930
create external table from CSV files.

00:02:46.930 --> 00:02:50.315
Then, that would create this
external table that I'm using.

00:02:50.315 --> 00:02:52.415
But in this case, I've
done it using code.

00:02:52.415 --> 00:02:56.500
If I hit run cells,

00:02:56.500 --> 00:02:59.765
you can see that I'm also using SQL

00:02:59.765 --> 00:03:03.625
to access the same data that
I just accessed using Spark.

00:03:03.625 --> 00:03:07.410
Now, these files are
saved as .ipynb files,

00:03:07.410 --> 00:03:10.970
which I can then share with my
colleagues and I can run again.

00:03:10.970 --> 00:03:14.465
So in the case that I
wanted to run this today,

00:03:14.465 --> 00:03:15.995
send it over to my colleague,

00:03:15.995 --> 00:03:18.950
they could open it
on their instance and

00:03:18.950 --> 00:03:20.480
attach it to their own cluster or

00:03:20.480 --> 00:03:22.535
to my same cluster and run it again.

00:03:22.535 --> 00:03:24.800
We're also using Notebooks
in a few other ways.

00:03:24.800 --> 00:03:27.485
Here, we have something
called Jupyter Books.

00:03:27.485 --> 00:03:30.710
This is basically a list of Notebooks

00:03:30.710 --> 00:03:34.130
that have been compiled together
into a chapter-and-page format.

00:03:34.130 --> 00:03:37.775
So here we have a SQL
Server ML services course,

00:03:37.775 --> 00:03:40.985
which is shown through Notebooks.

00:03:40.985 --> 00:03:43.070
So we've got all of the
documentation present.

00:03:43.070 --> 00:03:46.445
We've got the code samples
here and then you can,

00:03:46.445 --> 00:03:47.780
at the end of each chapter,

00:03:47.780 --> 00:03:49.730
either hit the next
button to go to the next

00:03:49.730 --> 00:03:52.400
one or use this chapter
browser to pick it.

00:03:52.400 --> 00:03:54.965
We've also got the
supportability book

00:03:54.965 --> 00:03:57.255
for SQL Server 2019
Big Data Clusters.

00:03:57.255 --> 00:03:59.210
If I bring up the
command palette and pick

00:03:59.210 --> 00:04:01.160
this Jupyter Books 2019 guide,

00:04:01.160 --> 00:04:03.035
that's what I used to open this up.

00:04:03.035 --> 00:04:05.960
In here, we have all of the
information that you might

00:04:05.960 --> 00:04:08.885
need in order to support
your new Big Data Cluster,

00:04:08.885 --> 00:04:11.585
a lot of the Kubernetes commands,
the cluster information.

00:04:11.585 --> 00:04:13.565
This is something that you can run,

00:04:13.565 --> 00:04:14.735
you can send to someone else,

00:04:14.735 --> 00:04:17.065
and you can save and
do what you want with.

00:04:17.065 --> 00:04:21.140
So we are very excited about
Notebooks in Azure Data Studio.

00:04:21.140 --> 00:04:23.210
We really think that this
is something that will be

00:04:23.210 --> 00:04:26.210
very useful to our community.

00:04:26.210 --> 00:04:28.040
We're really interested
in finding out how

00:04:28.040 --> 00:04:29.765
you're using it and
what you want from us.

00:04:29.765 --> 00:04:32.800
So if you are interested in
learning more, go to GitHub,

00:04:32.800 --> 00:04:34.610
look for Azure Data Studio and you

00:04:34.610 --> 00:04:37.120
can open up feature
requests and enter bugs.

00:04:37.120 --> 00:04:39.470
That's also where you
can download both the

00:04:39.470 --> 00:04:51.991
stable and the insiders edition.
Thank you so much for your time.

00:04:51.991 --> 00:04:57.040
>> [MUSIC]

