Making a team of data engineers and data scientists productive is a challenging task. The size of the data in "big data" problems is the first major obstacle to productivity. Apache Spark provides a strong foundation for solving this problem by offering an interactive compute engine, but it is not sufficient by itself. We review how a set of open source tools, including Jupyter and Livy, can be combined with the advanced resource management and elasticity of the Azure cloud to provide a comprehensive interactive platform for big data.
Code: BRK3226
Room: C108 - C109