WEBVTT

00:00:00.000 --> 00:00:02.610
>> It's finally time to
play around with our data.

00:00:02.610 --> 00:00:04.830
I've got a local CSV file with all of

00:00:04.830 --> 00:00:08.235
my data and I need to make sure
that it gets up into the cloud.

00:00:08.235 --> 00:00:10.740
Additionally, I need
to prepare my data.

00:00:10.740 --> 00:00:12.750
I know that what I want to do is

00:00:12.750 --> 00:00:15.865
predict how many bikes will
be rented on a certain day.

00:00:15.865 --> 00:00:19.770
Which means I need to prepare my
data for a time series forecast.

00:00:19.770 --> 00:00:22.790
As always, there is
additional documentation on

00:00:22.790 --> 00:00:26.350
the screen and then a description
down below. Let's get started.

00:00:26.350 --> 00:00:30.750
First, I need to make sure
my dataset is accessible.

00:00:31.600 --> 00:00:33.980
I'm going to add it
to the same folder

00:00:33.980 --> 00:00:36.420
that my Jupyter notebook is in.

00:00:36.880 --> 00:00:40.174
I'm going to upload this dataset

00:00:40.174 --> 00:00:42.955
to my Machine Learning
datastore up on Azure.

00:00:42.955 --> 00:00:45.355
I'll grab my CSV file,

00:00:45.355 --> 00:00:47.825
upload it to the dataset folder,

00:00:47.825 --> 00:00:50.900
overwrite any data
that exists already,

00:00:50.900 --> 00:00:53.000
and I want to show the progress of

00:00:53.000 --> 00:00:56.045
this process within
Visual Studio Code.

00:00:56.045 --> 00:00:58.640
Great, that looks like it worked.
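
NOTE
A hedged sketch of the upload step described above, not necessarily the code
shown on screen. It assumes the datastore object comes from azureml-core (for
example, from Workspace.get_default_datastore()) and offers upload_files().
The helper name upload_csv and the file name "./bikes.csv" are hypothetical.
```python
def upload_csv(datastore, csv_path, target_path="dataset/"):
    """Upload one local CSV file to the datastore (hypothetical helper)."""
    # overwrite=True replaces any copy already in the target folder;
    # show_progress=True reports upload progress in the notebook output.
    return datastore.upload_files(
        files=[csv_path],
        target_path=target_path,
        overwrite=True,
        show_progress=True,
    )
```
Possible usage, assuming ws is an azureml.core.Workspace:
upload_csv(ws.get_default_datastore(), "./bikes.csv")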

00:00:58.640 --> 00:01:03.010
Next, I need to create
a dataset object,

00:01:03.010 --> 00:01:06.910
which is a class in
azureml-core, and I need to

00:01:06.910 --> 00:01:08.935
format my date in a
way that'll make it

00:01:08.935 --> 00:01:11.755
easier for me to sort my data.

00:01:11.755 --> 00:01:15.050
I need to import some classes.

00:01:15.250 --> 00:01:18.220
Now I need to figure out
which column I'm going

00:01:18.220 --> 00:01:21.295
to use to predict which other column.

00:01:21.295 --> 00:01:22.855
Before I decide this,

00:01:22.855 --> 00:01:25.210
I'm going to open up the CSV file

00:01:25.210 --> 00:01:27.970
and take a look at
the data that I have.

00:01:27.970 --> 00:01:30.685
I noticed that in here I've got date,

00:01:30.685 --> 00:01:34.540
season, year, month,
which day of the week,

00:01:34.540 --> 00:01:37.870
what the weather is, the
temperature, the humidity,

00:01:37.870 --> 00:01:41.160
the wind speed, and
I also have count,

00:01:41.160 --> 00:01:43.805
which is the number of bikes
that have been rented.

00:01:43.805 --> 00:01:49.650
I think what I'll do is I'll
use date to predict count.

00:01:50.000 --> 00:01:52.710
Let's define that over here.

00:01:52.710 --> 00:01:55.010
Now I've got my time column name,

00:01:55.010 --> 00:01:58.520
which will be used to predict
my target column name.

00:01:58.520 --> 00:02:00.890
Next, I'm going to create

00:02:00.890 --> 00:02:03.650
a local variable called
dataset that'll keep

00:02:03.650 --> 00:02:05.570
track of my data and I'm going to

00:02:05.570 --> 00:02:08.360
convert that dataset
into a Pandas DataFrame.

00:02:08.360 --> 00:02:11.060
Notice that I'm calling take(5),

00:02:11.060 --> 00:02:14.630
which will print out the first
five rows of this DataFrame.
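
NOTE
A minimal sketch of the dataset steps described above, assuming the
azureml-core Dataset class and a CSV already uploaded to the "dataset/"
folder. The column headers "date" and "count" follow the narration and may
differ from the real CSV headers; the helper name load_preview is hypothetical.
```python
TIME_COLUMN_NAME = "date"     # assumed header of the date column
TARGET_COLUMN_NAME = "count"  # assumed header of the rentals column (used later for forecasting)
def load_preview(datastore, csv_name, rows=5):
    """Return the first rows of the uploaded CSV as a pandas DataFrame."""
    # Imported here so this sketch loads even where azureml-core is not installed.
    from azureml.core import Dataset
    # Build a tabular dataset from the uploaded file and mark the time column.
    dataset = Dataset.Tabular.from_delimited_files(
        path=[(datastore, "dataset/" + csv_name)]
    ).with_timestamp_columns(TIME_COLUMN_NAME)
    # take(rows) keeps only the first rows before converting to pandas.
    return dataset.take(rows).to_pandas_dataframe()
```
Possible usage, assuming datastore is the workspace datastore used for the
upload: load_preview(datastore, "bikes.csv")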

00:02:14.630 --> 00:02:17.150
We can verify that
these rows are correct.

00:02:17.150 --> 00:02:19.920
It's January 1st, 2011,

00:02:19.920 --> 00:02:23.605
the temperature was 0.344167

00:02:23.605 --> 00:02:24.920
and we can take a look at

00:02:24.920 --> 00:02:28.200
our CSV file and see that
it was still January 1st,

00:02:28.200 --> 00:02:36.440
2011, and that the
temp was 0.344167.

00:02:36.440 --> 00:02:39.620
That's looking right. Now we
have a link between our data

00:02:39.620 --> 00:02:43.890
stored in Azure and our local
Visual Studio Code environment.

