WEBVTT

00:00:02.870 --> 00:00:07.590
>> Welcome back to Developer's
Introduction to Data Science.

00:00:07.590 --> 00:00:09.450
Probably many of you are already

00:00:09.450 --> 00:00:12.120
familiar with traditional
programming logic.

00:00:12.120 --> 00:00:13.740
So let's try to understand

00:00:13.740 --> 00:00:15.360
machine learning by leveraging

00:00:15.360 --> 00:00:17.220
traditional programming logic.

00:00:17.220 --> 00:00:18.640
Here is an image

00:00:18.640 --> 00:00:20.670
that shows you the difference in how

00:00:20.670 --> 00:00:24.580
a traditional program is created
versus a machine learning model.

00:00:24.580 --> 00:00:28.290
Traditional programming uses
statically programmed logic

00:00:28.290 --> 00:00:30.405
to get specific outputs.

00:00:30.405 --> 00:00:32.780
Static results are generated based on

00:00:32.780 --> 00:00:35.075
the program logic in the algorithm.

00:00:35.075 --> 00:00:37.130
In machine learning,
and specifically

00:00:37.130 --> 00:00:39.290
in supervised machine learning,

00:00:39.290 --> 00:00:41.810
you have your data which are

00:00:41.810 --> 00:00:44.090
your features and then you have

00:00:44.090 --> 00:00:47.005
the outputs, which are your labels.

00:00:47.005 --> 00:00:50.720
That goes through the
algorithm computation

00:00:50.720 --> 00:00:52.820
and the model is created,

00:00:52.820 --> 00:00:55.460
which at a basic
level is a function,

00:00:55.460 --> 00:00:58.580
a mathematical function
created with the data.

00:00:58.580 --> 00:01:02.375
This is called training your
machine learning model.

00:01:02.375 --> 00:01:06.710
In machine learning, there
are two different categories,

00:01:06.710 --> 00:01:09.470
one is called supervised learning

00:01:09.470 --> 00:01:12.635
and the other one is
unsupervised learning.

00:01:12.635 --> 00:01:14.870
Supervised means that we are giving

00:01:14.870 --> 00:01:17.165
examples of both the data

00:01:17.165 --> 00:01:19.280
used to predict and the answer.

00:01:19.280 --> 00:01:22.070
For example, is it a dog or is it a cat?

00:01:22.070 --> 00:01:25.130
We will be giving the
model tagged images of

00:01:25.130 --> 00:01:28.605
a feature category for
dogs and for cats.

00:01:28.605 --> 00:01:32.445
We would be telling it
what a cat and a dog are.

00:01:32.445 --> 00:01:36.600
Then the model would learn
how to tag the data provided.

00:01:36.600 --> 00:01:40.760
Unsupervised learning is when we
give it an untagged dataset to

00:01:40.760 --> 00:01:42.635
learn from without giving it

00:01:42.635 --> 00:01:45.590
the answer of what
we want it to learn.

00:01:45.590 --> 00:01:49.295
As you can see, under
supervised machine learning,

00:01:49.295 --> 00:01:52.460
there are two methods:
regression and classification.

00:01:52.460 --> 00:01:55.235
While under unsupervised
machine learning,

00:01:55.235 --> 00:01:57.845
there are three different
methods: clustering,

00:01:57.845 --> 00:02:00.905
anomaly detection,
and recommendation.

00:02:00.905 --> 00:02:02.810
Now that we have a bit of

00:02:02.810 --> 00:02:05.210
understanding of what
machine learning is,

00:02:05.210 --> 00:02:08.825
let's look at the model
building process at a high level.

00:02:08.825 --> 00:02:11.540
As you can see, there
are four main stages.

00:02:11.540 --> 00:02:14.090
These are: prepare
your data, train your model,

00:02:14.090 --> 00:02:17.600
test your model, and
operationalize your model.

00:02:17.600 --> 00:02:18.875
Prepare your data:

00:02:18.875 --> 00:02:21.305
this is the stage where you need to find,

00:02:21.305 --> 00:02:22.985
select, and create data,

00:02:22.985 --> 00:02:27.620
apply preprocessing techniques,
and also fill gaps in your data.

00:02:27.620 --> 00:02:33.455
Train your model means that
you have to give your model

00:02:33.455 --> 00:02:36.590
specific data so that your
model is going to learn

00:02:36.590 --> 00:02:40.175
from this data to predict
something in the future.

00:02:40.175 --> 00:02:43.010
This is a very iterative task

00:02:43.010 --> 00:02:45.990
because you may need to
change the data and the model

00:02:45.990 --> 00:02:48.560
until you think that you have

00:02:48.560 --> 00:02:52.190
good enough accuracy
for the production stage.

00:02:52.190 --> 00:02:54.020
Test your model: now that you have

00:02:54.020 --> 00:02:57.200
a model that you think is
going to perform very well,

00:02:57.200 --> 00:02:59.800
you can test it
with new data.

00:02:59.800 --> 00:03:01.980
Finally, operationalize your model.

00:03:01.980 --> 00:03:05.030
Now it's time to operationalize
your model so that you

00:03:05.030 --> 00:03:09.360
can consume it from
different applications.

