Unlock petabyte-scale datasets in Azure with aggregations in Power BI

The Discussion

  • kbaig

    Hi Guys,

Thanks for the great demo and a great feature. Queries that are not cached are processed by Spark, as mentioned, but can you share more details on how a 23-node Spark cluster fits into this ecosystem?

    Thanks

  • wadecb

@kbaig: thanks for the feedback. The Spark cluster is optional. From the Power BI side it works the same way whether it's HDI Spark, Azure SQL Data Warehouse, Databricks, or various other sources in Azure that support DirectQuery. The setup and optimization of these systems depends on the system itself and follows standard query performance tuning for that system; there is nothing special about setting up or query-optimizing these systems that is different when using aggregations.

  • Danaraj

    This is super awesome, Christian!

Just curious if there is a plan for aggregations in Analysis Services Tabular?

  • Ezra Gabay

Christian! You are brilliant. We just need to figure out how to travel to Mars and back, combining all of NASA's data. I can set up that appointment if need be, as I know a few smart people there. All the best! I will be using this for a few of our companies.
    Ezra Gabay

  • wadecb

@Danaraj: we are currently focusing on going in the other direction: bringing Analysis Services scalability, manageability, ALM, debugging, etc. to Power BI.

  • wadecb

    @Ezra Gabay: Thank you so much Ezra! Glad to be of service! We're all about pushing the boundaries :)

  • Danaraj

@wadecb: That's interesting; looking forward to what's coming next. Thanks Christian, amazing work.

  • aljj

re: Spark query. In order for this query to complete in reasonable time over big data, the data has to be partitioned. But there are limited ways you can partition data in Spark (not more than 100 partitions).

So can you explain a bit how the data is partitioned/bucketed?

  • wadecb

@aljj: it is stored in Parquet and coalesced into 200 random chunks of rows.
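The aggregation behavior wadecb describes earlier in the thread (queries covered by the cached aggregation are answered in memory, everything else falls through to the DirectQuery source such as Spark or SQL DW) can be sketched as a simple routing check. The column names and the coverage rule below are illustrative assumptions, not Power BI's actual matching logic:

```python
# Hypothetical sketch of aggregation hit/miss routing: a query is answered
# from the in-memory aggregation table when its grouping columns are all
# covered; otherwise it is pushed down to the DirectQuery source.
AGG_COLUMNS = {"year", "region"}  # columns the cached aggregation covers (assumed)

def route_query(group_by_columns):
    """Return which engine answers the query."""
    if set(group_by_columns) <= AGG_COLUMNS:
        return "in-memory aggregation"  # hit: no roundtrip to the source
    return "DirectQuery source"         # miss: sent to Spark / SQL DW / Databricks

print(route_query(["year"]))             # answered from the aggregation
print(route_query(["year", "product"]))  # falls through to DirectQuery
```

This is why the source system's own query tuning still matters: every cache miss becomes a live query against it.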
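wadecb's answer about the storage layout (Parquet, coalesced into 200 chunks) can be illustrated in plain Python. The chunk count comes from the answer; the round-robin assignment below is only an assumption standing in for Spark's actual coalesce behavior:

```python
# Illustrative sketch: rows distributed into a fixed number of chunks,
# mirroring a dataset coalesced into 200 Parquet files. The modulo
# assignment approximates "random" chunking and is not Spark's algorithm.
NUM_CHUNKS = 200

def assign_chunk(row_id: int) -> int:
    # Round-robin: row 0 -> chunk 0, row 1 -> chunk 1, ..., row 200 -> chunk 0.
    return row_id % NUM_CHUNKS

chunks = {}
for row_id in range(1000):
    chunks.setdefault(assign_chunk(row_id), []).append(row_id)

print(len(chunks))     # 200 chunks
print(len(chunks[0]))  # 5 rows per chunk for 1000 rows
```

Keeping the chunk count fixed and moderate bounds the per-query file overhead while still letting Spark scan chunks in parallel.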
