WEBVTT

00:00:03.200 --> 00:00:08.100
>> Welcome back to Developer's
Introduction to Data Science.

00:00:08.100 --> 00:00:12.510
A common question that
many customers, developers,

00:00:12.510 --> 00:00:15.030
but also data scientists ask me is,

00:00:15.030 --> 00:00:18.690
which machine learning
algorithm should I use?

00:00:18.690 --> 00:00:24.465
I always answer, it depends
on many different factors.

00:00:24.465 --> 00:00:27.300
But most importantly, the
algorithm that you select depends

00:00:27.300 --> 00:00:30.750
on one aspect of your
data science scenario.

00:00:30.750 --> 00:00:32.880
What do you want to
do with your data?

00:00:32.880 --> 00:00:36.515
This is the most important question
that you should ask yourself.

00:00:36.515 --> 00:00:39.200
Specifically, another
important question is,

00:00:39.200 --> 00:00:41.870
what is the business
question that you want to

00:00:41.870 --> 00:00:45.305
answer by learning
from your past data?

00:00:45.305 --> 00:00:48.679
Machine learning has many
different algorithms,

00:00:48.679 --> 00:00:51.740
and each different algorithm
can help you achieve

00:00:51.740 --> 00:00:55.200
a different goal and answer
a different question.

00:00:55.200 --> 00:00:57.615
Let's see some of these.

00:00:57.615 --> 00:01:01.755
The first one is predicting
between different categories.

00:01:01.755 --> 00:01:04.940
Here we have two different
types of methods,

00:01:04.940 --> 00:01:07.160
and we have the two-class
classification,

00:01:07.160 --> 00:01:08.960
which is great at answering

00:01:08.960 --> 00:01:12.315
questions that have
only two choices,

00:01:12.315 --> 00:01:15.150
like yes or no, true or false.

00:01:15.150 --> 00:01:18.545
Then we have the
multiclass classification,

00:01:18.545 --> 00:01:21.110
which is great at answering
complex questions

00:01:21.110 --> 00:01:24.355
with multiple possible options.

00:01:24.355 --> 00:01:27.560
You can also use machine
learning on your data to

00:01:27.560 --> 00:01:30.405
discover patterns in your data.

00:01:30.405 --> 00:01:32.450
Here we have three different types of

00:01:32.450 --> 00:01:34.805
methods, such as recommenders,

00:01:34.805 --> 00:01:37.400
that are great at predicting what

00:01:37.400 --> 00:01:40.220
someone will be interested
in, in the future.

00:01:40.220 --> 00:01:46.085
Clustering is great at separating
similar data points into groups.

00:01:46.085 --> 00:01:50.030
Finally, anomaly detection,
which is great at

00:01:50.030 --> 00:01:55.450
identifying and predicting
rare or unusual data points.

00:01:55.450 --> 00:01:58.715
Another thing that you can do
with your data is actually

00:01:58.715 --> 00:02:02.405
understanding what an
image represents,

00:02:02.405 --> 00:02:05.045
and also understanding
natural language.

00:02:05.045 --> 00:02:08.960
Here we have methods such
as image classification,

00:02:08.960 --> 00:02:13.130
which is able to identify
images with neural networks.

00:02:13.130 --> 00:02:16.325
Also text analytics, that is able to

00:02:16.325 --> 00:02:20.660
derive high-quality
information from text.

00:02:20.660 --> 00:02:23.150
Finally, if you need to predict

00:02:23.150 --> 00:02:26.285
the results based on the
relationship between the values,

00:02:26.285 --> 00:02:29.930
you can use different
types of regression methods.

00:02:29.930 --> 00:02:34.310
With regression, you're
generally making forecasts by

00:02:34.310 --> 00:02:39.580
estimating the relationship
between your values.

00:02:39.580 --> 00:02:44.715
For this specific use case
that Sara is trying to solve,

00:02:44.715 --> 00:02:46.980
I think that we need to predict

00:02:46.980 --> 00:02:50.395
how many bikes will be
rented in the next hour.

00:02:50.395 --> 00:02:53.660
I really think that
prediction, and most importantly

00:02:53.660 --> 00:02:58.460
regression, is the right method
that we should use because we

00:02:58.460 --> 00:03:00.620
really want to predict an outcome

00:03:00.620 --> 00:03:04.290
based on the relationship
between the values.

