Part I: An end-to-end Big Data solution using HDInsight and Hive AND Predicting delay using HDInsight and Azure Machine Learning
This is the first part of a two-part session. The second part will show how to use the data for predicting airline delays. By now you have probably heard about Big Data a thousand times or more, so why a new session about Big Data, you might ask?
Well, first of all, because this session presents an end-to-end Big Data solution, which a lot of people are asking for so they can see how it fits into their existing BI / data warehouse environment and investment. Secondly, because we are going to use a couple of years of airline on-time performance data published by the Bureau of Transportation Statistics. This session will show you how to:
- Download and prepare several years of demo data and store it in a dedicated Azure Blob Storage account.
- Create a storage account and container for the HDInsight cluster.
- Create a SQL database server instance and a SQL database for the Hive metastore. We will also set up the necessary firewall rules so that you can access the server directly.
- Provision the HDInsight cluster and configure it for access to the sample data.
- Create a partitioned Hive table over the sample data.
- Explore the results.
Attend this session to learn how to use HDInsight and Azure Storage, create a database, and finally explore the data and results. After this session, you will be able to use the powerful server setup that Azure provides; we will show you how easy it is to create new servers and storage for later use.
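
To give a feel for the partitioned Hive table step, here is a minimal sketch in Python that only builds the HiveQL text for an external table partitioned by year and month, plus one statement that registers a partition. It is an illustration under assumptions, not the script used in the session: the table name `ontime`, the column names, the container `flightdata`, and the storage account `examplestorage` are all hypothetical, and the data is assumed to be comma-delimited text stored in one folder per year and month.

```python
def build_table_ddl(table, columns, partition_columns, location):
    """Return HiveQL for an external, partitioned, comma-delimited table."""
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    parts = ", ".join(f"`{name}` {hive_type}" for name, hive_type in partition_columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n  {cols}\n)\n"
        f"PARTITIONED BY ({parts})\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


def build_add_partition_ddl(table, partition_values, location):
    """Return HiveQL that registers one partition stored at `location`."""
    spec = ", ".join(
        f"`{name}`={value}" if isinstance(value, int) else f"`{name}`='{value}'"
        for name, value in partition_values.items()
    )
    return (
        f"ALTER TABLE `{table}` ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{location}'"
    )


# Hypothetical schema and storage location, for illustration only.
columns = [
    ("carrier", "STRING"),
    ("origin", "STRING"),
    ("dest", "STRING"),
    ("dep_delay", "INT"),
    ("arr_delay", "INT"),
]
partitions = [("year", "INT"), ("month", "INT")]
base = "wasb://flightdata@examplestorage.blob.core.windows.net/ontime"

print(build_table_ddl("ontime", columns, partitions, base))
print(build_add_partition_ddl("ontime", {"year": 2013, "month": 1}, f"{base}/2013/01"))
```

Partitioning by year and month lets Hive skip whole folders when a query filters on those columns, which is why the sample data is laid out per period in this sketch.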