WEBVTT

00:00:00.000 --> 00:00:03.000
>> SQL Server 2019 Big
Data Clusters provides

00:00:03.000 --> 00:00:06.585
compute pools to offload
distributed query processing.

00:00:06.585 --> 00:00:10.350
UC is here to tell us all about
this today on Data Exposed.

00:00:10.350 --> 00:00:21.060
[MUSIC]

00:00:21.060 --> 00:00:25.215
>> Hi. Welcome to another episode
of Data Exposed. I'm Jeroen.

00:00:25.215 --> 00:00:27.810
Today, I'm joined by UC
to talk about compute

00:00:27.810 --> 00:00:30.690
pools in SQL Server
2019 Big Data Clusters.

00:00:30.690 --> 00:00:33.000
Hi, UC. Thanks for
joining the show again.

00:00:33.000 --> 00:00:34.155
>> Sure.

00:00:34.155 --> 00:00:36.060
>> Compute pools?

00:00:36.060 --> 00:00:36.615
>> Yeah.

00:00:36.615 --> 00:00:37.815
>> What are they?

00:00:37.815 --> 00:00:40.980
>> Compute pools. They are

00:00:40.980 --> 00:00:44.430
basically SQL Server instances
in a big data cluster,

00:00:44.430 --> 00:00:48.725
which can be used to offload your
distributed query processing.

00:00:48.725 --> 00:00:50.310
So in this picture,

00:00:50.310 --> 00:00:54.870
we see the many components in
a SQL Server Big Data Cluster.

00:00:54.870 --> 00:00:58.570
Today, we're going to look at
this compute pool over here.

00:00:58.570 --> 00:01:01.710
So what is it? It's
basically a set of

00:01:01.710 --> 00:01:03.825
SQL Server instances which are

00:01:03.825 --> 00:01:06.685
automatically brochured
inside a big data cluster,

00:01:06.685 --> 00:01:10.475
and they are available for
doing a distributed queries.

00:01:10.475 --> 00:01:11.405
>> Okay.

00:01:11.405 --> 00:01:14.030
>> This is similar to the PolyBase

00:01:14.030 --> 00:01:17.585
Scale-out Groups in SQL Server 2016.

00:01:17.585 --> 00:01:21.490
This capability now provides you

00:01:21.490 --> 00:01:25.174
out-of-the-box set of SQL instances,

00:01:25.174 --> 00:01:27.890
which can do most of the
distributed work for you.

00:01:27.890 --> 00:01:28.930
>> Okay.

00:01:28.930 --> 00:01:32.540
>> Queries can either use
the compute pool or not use

00:01:32.540 --> 00:01:35.540
the compute pool depending
on the type of query.

00:01:35.540 --> 00:01:38.570
>> What scenario would I
choose for a compute pool?

00:01:38.570 --> 00:01:40.720
>> Yeah. Great
question. So let's see.

00:01:40.720 --> 00:01:44.270
One of the common scenario is
say you have two directories in

00:01:44.270 --> 00:01:45.950
HDFS with hundreds and thousands of

00:01:45.950 --> 00:01:48.355
files and you want to join them.

00:01:48.355 --> 00:01:50.000
So in that scenario,

00:01:50.000 --> 00:01:53.390
you don't want to get all the
data over to your SQL Server.

00:01:53.390 --> 00:01:53.720
>> No.

00:01:53.720 --> 00:01:55.760
>> Which is running your application.

00:01:55.760 --> 00:01:57.785
So that's where the
compute pool helps.

00:01:57.785 --> 00:02:02.270
So it can offload most of
the work over to the HDFS

00:02:02.270 --> 00:02:03.680
and then later pull

00:02:03.680 --> 00:02:07.490
the necessary data to the compute
pool and do the join there.

00:02:07.490 --> 00:02:09.920
So this basically offloads them all,

00:02:09.920 --> 00:02:13.520
the computing world to different
SQL Server machines which can be

00:02:13.520 --> 00:02:17.545
on different nodes in
that big data cluster,

00:02:17.545 --> 00:02:19.895
and use those resources.

00:02:19.895 --> 00:02:21.590
Then the other scenarios,

00:02:21.590 --> 00:02:23.570
you are joining data from

00:02:23.570 --> 00:02:26.780
different data sources which
are partitioned differently.

00:02:26.780 --> 00:02:31.760
So there you have to unify that
partitioning at some point,

00:02:31.760 --> 00:02:33.530
and that's where the
compute pool helps.

00:02:33.530 --> 00:02:34.145
>> Okay.

00:02:34.145 --> 00:02:36.710
>> So if one table is distributed by

00:02:36.710 --> 00:02:40.465
Customer ID and another is
distributed by Order ID,

00:02:40.465 --> 00:02:43.400
and you're still
joining by Customer ID,

00:02:43.400 --> 00:02:46.590
it can do that
reconciliation for you.

00:02:46.590 --> 00:02:47.400
>> Okay.

00:02:47.400 --> 00:02:50.070
>> So that's some of the scenarios.

00:02:50.070 --> 00:02:54.259
You can also do things like
exporting data into HDFS,

00:02:54.259 --> 00:02:56.930
and that's some other place
where the compute pool can help.

00:02:56.930 --> 00:02:59.090
>> Okay. So the compute
pool will help me to

00:02:59.090 --> 00:03:01.550
parallelize, scale-out
my [inaudible].

00:03:01.550 --> 00:03:02.185
>> Yeah.

00:03:02.185 --> 00:03:05.430
>> Both reading from HDFS
and writing to HDFS at all?

00:03:05.430 --> 00:03:06.030
>> Yes.

00:03:06.030 --> 00:03:07.350
>> Cool. How does that work?

00:03:07.350 --> 00:03:09.300
I mean, can you show us a
little bit of how that works?

00:03:09.300 --> 00:03:12.605
>> Yeah. Sure. Let's go here.

00:03:12.605 --> 00:03:16.885
I am actually connected to a
SQL Server Big Data Cluster,

00:03:16.885 --> 00:03:19.655
and specifically the Mastered
instance is shown here.

00:03:19.655 --> 00:03:22.280
So we now have a new DMV,

00:03:22.280 --> 00:03:24.775
which is called compute pools.

00:03:24.775 --> 00:03:25.545
>> Okay.

00:03:25.545 --> 00:03:28.610
>> Basically, it shows
the compute pools which

00:03:28.610 --> 00:03:31.955
are provisioned and available
in the Big Data Cluster.

00:03:31.955 --> 00:03:35.960
By default, there's only one and
we show that information here.

00:03:35.960 --> 00:03:38.110
Then you can also see

00:03:38.110 --> 00:03:42.465
how many nodes are actually
there in the compute pool.

00:03:42.465 --> 00:03:44.740
This query actually shows,

00:03:44.740 --> 00:03:47.525
apart from this particular
SQL Server instance,

00:03:47.525 --> 00:03:49.100
I have two compute

00:03:49.100 --> 00:03:52.730
pool instances as shown by
these highlighted rows, right?

00:03:52.730 --> 00:03:53.405
>> Yeah.

00:03:53.405 --> 00:03:57.815
>> There are other DMVs which
you can use to basically find

00:03:57.815 --> 00:04:03.195
information about the compute
pool like how's the CPU activity,

00:04:03.195 --> 00:04:05.745
how much memory has been allocated,

00:04:05.745 --> 00:04:09.900
whether it's even available for
the query and so on, right?

00:04:09.900 --> 00:04:10.200
>> Right.

00:04:10.200 --> 00:04:12.470
>> These are information
which a DBA can

00:04:12.470 --> 00:04:15.095
use to troubleshoot compute pool.

00:04:15.095 --> 00:04:16.145
>> Sure.

00:04:16.145 --> 00:04:20.480
>> Additionally, you can
run a complex query in

00:04:20.480 --> 00:04:25.955
SQL Server which can actually
go and use the compute pool.

00:04:25.955 --> 00:04:26.270
>> Okay.

00:04:26.270 --> 00:04:27.565
>> So in this example,

00:04:27.565 --> 00:04:32.869
I'm joining a local table in SQL
Server with some data in HDFS,

00:04:32.869 --> 00:04:37.070
and I also have a table in
Oracle, which I'm querying.

00:04:37.070 --> 00:04:40.265
So you can basically run a query and

00:04:40.265 --> 00:04:42.290
the query optimizer
automatically figures

00:04:42.290 --> 00:04:44.570
out how to use the compute pool.

00:04:44.570 --> 00:04:47.630
In this case, it's going to
use the computer pool for

00:04:47.630 --> 00:04:50.930
your HDFS table and

00:04:50.930 --> 00:04:54.490
the rest of the data is
all joined and returned.

00:04:54.490 --> 00:04:57.030
That's an example
where a compute pool

00:04:57.030 --> 00:05:00.060
transparently works to
get the results for you.

00:05:00.060 --> 00:05:01.755
>> Cool. That's looks really good.

00:05:01.755 --> 00:05:04.220
Basically, I can write this query.

00:05:04.220 --> 00:05:07.040
I now can trust the
compute pool will do step

00:05:07.040 --> 00:05:10.010
in where it makes sense to
optimize the performance, correct?

00:05:10.010 --> 00:05:10.535
>> Yes.

00:05:10.535 --> 00:05:13.115
>> Awesome. Well, thanks
a lot for sharing.

00:05:13.115 --> 00:05:14.015
>> Sure.

00:05:14.015 --> 00:05:15.500
>> I hope this was useful.

00:05:15.500 --> 00:05:20.150
Please like or subscribe
to the video and comment.

00:05:20.150 --> 00:05:22.340
Hope to see you next time.
Thanks for watching.

00:05:22.340 --> 00:05:36.910
[MUSIC]

