New Services in the Cortana Analytics Suite: Spark, Azure Data Lake and Data Catalog
This session introduces the newest services in the Cortana Analytics family: Apache Spark™, Azure Data Lake and Azure Data Catalog.

Apache Spark is a powerful open source processing engine built around speed, ease of use and sophisticated analytics. It supports interactive queries (SQL), advanced analytics (e.g. machine learning) and streaming over large datasets. We will show you how to use Apache Spark as part of Cortana Analytics.

Azure Data Lake is a hyper-scale data repository designed for big data analytics workloads. It provides a single place to store any type of data in its native format. We will show how Azure Data Lake's HDFS compatibility lets it act as a Hadoop file system for all Hadoop workloads, including Azure HDInsight, Hortonworks and Cloudera. We will also focus on the key capabilities that make Azure Data Lake an ideal choice for storing, accessing and sharing data across a wide range of analytics applications.

Azure Data Catalog is an enterprise-wide metadata catalog that enables self-service data source discovery. It is a fully managed service that stores, describes and indexes any registered data source in your organization, and provides information on how to access it. We will present an overview of Data Catalog and show how using it to register, enrich, discover, understand and consume data sources can close the gap between those seeking information and those creating it.