Building a Kickass Data Science Pipeline with Azure Batch, Ubuntu, and Microsoft R Server

Description

A typical data science pipeline involves feature engineering, model building (often with cross-validation and hyperparameter optimization), and then production deployment of those models. There are many ways to build such a pipeline, but the cloud plays a key role in almost all of them. In this session we'll use a range of open source tools on Microsoft Azure to build out our pipeline, including Microsoft R Server, Azure Batch, Azure Functions, Ubuntu, and a sprinkling of Python. The example is based on a real-world production system that has been ported to a sample dataset, and you'll be able to take all of the code and scripts away to use yourself.

Day: 3

Level: 300

Session Type: Breakout

Code: M364

Room: Marlborough Room (SKYCITY)
