WEBVTT

00:00:02.180 --> 00:00:07.505
>> Welcome back to Developer's
Introduction to Data Science.

00:00:07.505 --> 00:00:10.390
Now, let's learn more
about how you can pick

00:00:10.390 --> 00:00:11.680
the best model for

00:00:11.680 --> 00:00:15.100
your data science scenario with
automated machine learning.

00:00:15.100 --> 00:00:17.800
Understanding the
predictive power of a set

00:00:17.800 --> 00:00:20.420
of features with respect
to a dependent variable

00:00:20.420 --> 00:00:21.970
is a very tricky problem,

00:00:21.970 --> 00:00:23.770
in that there is no universal metric

00:00:23.770 --> 00:00:26.155
which can tell you how to do that.

00:00:26.155 --> 00:00:28.120
So the answer to the question,

00:00:28.120 --> 00:00:31.180
"Which algorithm should I
use?" is always "It depends."

00:00:31.180 --> 00:00:33.400
It depends on the size, the quality,

00:00:33.400 --> 00:00:35.170
and the nature of your data.

00:00:35.170 --> 00:00:39.205
It depends on what you want to
do with that specific answer.

00:00:39.205 --> 00:00:41.650
Finally, the answer to the question,

00:00:41.650 --> 00:00:45.119
"Which parameters should I use?"
is also very challenging.

00:00:45.119 --> 00:00:47.090
As you know, hyperparameters are

00:00:47.090 --> 00:00:50.435
higher level parameters
that cannot be learned

00:00:50.435 --> 00:00:53.030
directly from the data using

00:00:53.030 --> 00:00:56.215
gradient descent or other
optimization algorithms.

00:00:56.215 --> 00:00:59.600
They describe the structural
information about a model that

00:00:59.600 --> 00:01:03.380
must be decided before
fitting the model parameters.

00:01:03.380 --> 00:01:06.125
Setting model parameters
and searching for

00:01:06.125 --> 00:01:08.860
optimal parameter values based on

00:01:08.860 --> 00:01:13.065
learning and experience can
also be very time-consuming.

00:01:13.065 --> 00:01:16.085
Different estimators
are better suited

00:01:16.085 --> 00:01:19.160
for different types of data
and also different problems.

00:01:19.160 --> 00:01:22.610
So often I would say that
the hardest part of solving

00:01:22.610 --> 00:01:24.170
a machine learning problem can be

00:01:24.170 --> 00:01:27.175
finding the right
estimator for your job.

00:01:27.175 --> 00:01:31.100
That's why I very often use
automated machine learning,

00:01:31.100 --> 00:01:34.070
which is the process of
automating the time-consuming

00:01:34.070 --> 00:01:37.295
and iterative tasks of machine
learning model development.

00:01:37.295 --> 00:01:41.510
Automated machine learning
takes uncertainty into account,

00:01:41.510 --> 00:01:43.940
incorporating a
probabilistic model to

00:01:43.940 --> 00:01:47.090
determine the best
pipeline to try next.

00:01:47.090 --> 00:01:51.350
This approach allows
automated machine learning to

00:01:51.350 --> 00:01:55.925
explore the most promising
possibilities without wasting time.

00:01:55.925 --> 00:01:58.080
Now, let's take a closer look at

00:01:58.080 --> 00:02:01.790
the different automated
machine learning capabilities.

00:02:01.790 --> 00:02:04.670
First of all, with automated
machine learning you need to

00:02:04.670 --> 00:02:08.255
identify the machine learning
problem that you want to solve.

00:02:08.255 --> 00:02:11.914
This can be classification,
forecasting, or regression.

00:02:11.914 --> 00:02:14.060
Then you have to specify the source

00:02:14.060 --> 00:02:16.600
and the format of the
labeled training data.

00:02:16.600 --> 00:02:20.420
This can be NumPy
arrays or a pandas DataFrame.

00:02:20.420 --> 00:02:22.640
Finally, you need to configure

00:02:22.640 --> 00:02:25.055
the compute target
for model training,

00:02:25.055 --> 00:02:27.064
such as your local computer,

00:02:27.064 --> 00:02:30.380
Azure Machine Learning
compute, a remote VM,

00:02:30.380 --> 00:02:32.825
or Azure Databricks, for example.

00:02:32.825 --> 00:02:35.240
During training, Azure
Machine Learning

00:02:35.240 --> 00:02:37.700
creates a number of

00:02:37.700 --> 00:02:42.635
parallel pipelines that try
different algorithms and parameters.

00:02:42.635 --> 00:02:45.140
It will stop only once it hits

00:02:45.140 --> 00:02:49.345
the exit criteria that you
define in the experiment.

00:02:49.345 --> 00:02:54.330
The AutoMLConfig class represents
a configuration for submitting

00:02:54.330 --> 00:02:56.500
an automated machine learning

00:02:56.500 --> 00:02:59.310
experiment in Azure Machine Learning.

00:02:59.310 --> 00:03:02.710
This configuration object
contains and persists

00:03:02.710 --> 00:03:06.245
the parameters for configuring
the experiment run,

00:03:06.245 --> 00:03:10.145
as well as the training data that
needs to be used at run time.

00:03:10.145 --> 00:03:14.070
To learn more, please see
aka.ms/AutoMLConfig-Class.

