Announcing Apache Spark on Azure HDInsight



Scott talks to Asad Khan about the addition of Apache Spark on Azure HDInsight. Apache Spark is a unified, open-source, parallel data processing framework for Big Data analytics. Spark brings together batch processing, real-time processing, stream analytics, machine learning, and interactive SQL, and Azure makes Spark "Software as a Service." It's a great time for you to jump into the world of Big Data on Azure.






The Discussion

  • Great stuff.

    Is the maximum number of nodes limited to 32?



  • The number of nodes is not limited to 32. You can use any number of nodes as long as you have sufficient Azure quota.




  • @Asadk: How can I find out what my quota is? Great stream.

  • Great introduction. I was wondering why Asad flipped over to Jupyter for the machine learning instead of using %pyspark in Zeppelin. After creating a Spark cluster, I discovered that MS removed a number of the default Zeppelin interpreters such as PySpark, Angular, etc.! That creates an odd situation of flipping between notebooks rather than staying in Zeppelin if you want to use MLlib or other Spark features with Python instead of Scala.

  • @Sneaksys: If you go to any HDInsight cluster in the Azure portal and click 'Dashboard' (top tabs), you will see the quota summary.

  • @DrEldies: I totally agree with your comment about flipping between notebooks. Unfortunately, there is no single notebook that works really well across all the scenarios Spark supports. Zeppelin is good at a few things and Jupyter is good at others. For example, PySpark support in Zeppelin is very new and has a bunch of issues. We are working with the open source community to have a single notebook that works really well across all scenarios.

  • Could I get the code to create the tables and the code for both the Scala and the Python ML?
