Apache Spark is an open-source processing framework for running large-scale data analytics applications. Built on an in-memory compute engine, Spark is known for high-performance querying of big data. It uses a parallel data processing framework that persists data in memory and on disk when needed. This allows Spark to deliver both speedups of up to 100x and a common execution model for tasks such as ETL, batch processing, and interactive queries on data in HDFS. The Azure cloud makes Apache Spark easy and cost-effective to deploy, with no hardware to buy, no software to configure, a full notebook experience for authoring compelling narratives, and integration with third-party BI tools.
Code: DAT323
Room: Room 5 & 6