WEBVTT

00:00:00.260 --> 00:00:03.735
>> The data science life
cycle made a lot of sense.

00:00:03.735 --> 00:00:05.400
Now I know that I want to define

00:00:05.400 --> 00:00:08.235
my problem before I do anything else.

00:00:08.235 --> 00:00:11.310
So my app is a service-based app.

00:00:11.310 --> 00:00:14.205
This means that it depends
on not only usage data,

00:00:14.205 --> 00:00:17.670
but also external factors
like weather and date.

00:00:17.670 --> 00:00:20.640
For example, how people
rent bikes might

00:00:20.640 --> 00:00:24.150
look very different on a
holiday compared to a workday.

00:00:24.150 --> 00:00:25.740
I know that I want to use

00:00:25.740 --> 00:00:27.945
the anonymized data that
I collected from my app,

00:00:27.945 --> 00:00:31.800
as well as the external factors
like date and weather to make

00:00:31.800 --> 00:00:34.020
better informed decisions about how

00:00:34.020 --> 00:00:37.065
and where I place my
bikes around the city.

00:00:37.065 --> 00:00:39.480
I know that this
initial analysis that

00:00:39.480 --> 00:00:41.895
we do today is just the first step.

00:00:41.895 --> 00:00:43.940
I'm going to take
what I learn and I'm

00:00:43.940 --> 00:00:46.100
going to improve what data I

00:00:46.100 --> 00:00:48.290
collect and what questions I

00:00:48.290 --> 00:00:51.920
ask so I can be more
successful in my app.

00:00:51.920 --> 00:00:54.455
So what does success look like to me?

00:00:54.455 --> 00:00:58.105
Well, for me it means more
people renting bikes.

00:00:58.105 --> 00:01:00.560
Now I could spend a lot more money on

00:01:00.560 --> 00:01:02.180
marketing, but another way to do

00:01:02.180 --> 00:01:04.115
that is to make sure
that my inventory,

00:01:04.115 --> 00:01:07.525
my bikes, are in the right
place at the right time.

00:01:07.525 --> 00:01:09.800
Now that's a pretty complex matrix,

00:01:09.800 --> 00:01:13.340
so let's dial it down a
little bit more and focus on

00:01:13.340 --> 00:01:18.020
how many bikes I might need in a
certain area in the next hour,

00:01:18.020 --> 00:01:21.160
or a day, or a few minutes even.

00:01:21.160 --> 00:01:23.565
So that's going to be my question.

00:01:23.565 --> 00:01:25.760
I want to predict how
many bikes will be

00:01:25.760 --> 00:01:28.595
rented in the next hour
in a certain place.

00:01:28.595 --> 00:01:30.710
Then maybe I can move my inventory to

00:01:30.710 --> 00:01:33.760
that area and more
bikes will be rented.

00:01:33.760 --> 00:01:35.970
So it looks like I've got

00:01:35.970 --> 00:01:39.015
a pretty good grasp of my
business understanding.

00:01:39.015 --> 00:01:41.600
I don't really want to mess
with my data quite yet.

00:01:41.600 --> 00:01:43.610
I want to get an initial analysis

00:01:43.610 --> 00:01:45.830
and understanding of
what's going on first.

00:01:45.830 --> 00:01:48.340
So the next step is
going to be modeling.

00:01:48.340 --> 00:01:50.360
Then I do want to try to deploy

00:01:50.360 --> 00:01:52.280
this as a web service
because I want to try it

00:01:52.280 --> 00:01:54.260
out in the wild and see if I can make

00:01:54.260 --> 00:01:57.260
any predictions and
improve this over time.

00:01:57.260 --> 00:01:59.120
So let's check back in with

00:01:59.120 --> 00:02:01.010
Francesca so that we can
learn a little bit more

00:02:01.010 --> 00:02:03.230
about machine learning
models and maybe find

00:02:03.230 --> 00:02:06.570
out which one we should
use for our problem.

