In this episode of Data Exposed, Scott welcomes Casey Karst, a Program Manager in the SQL Server group, to talk about PolyBase scale-out groups in SQL Server 2016. Building on an earlier show with Sahaj Saini, which introduced PolyBase in SQL Server 2016, Casey explains the concept of scaling out PolyBase and what it makes possible.
At the 1:00 mark, Casey jumps right in and uses a single slide to explain why scale-out groups are being introduced. When a large amount of data is spread across a large Hadoop cluster, a single-node SQL Server instance can become a bottleneck. A PolyBase scale-out group addresses this by distributing a query and its results across multiple SQL Server instances. Casey walks through the architecture of the scale-out group (a head node and compute nodes), how to set it up, and how the nodes work together to supply the necessary compute resources.
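To make the head node / compute node idea concrete, here is a toy Python model of the scatter-gather pattern a scale-out group relies on. It is only an illustration under simplifying assumptions: "splits" are plain lists standing in for chunks of an external Hadoop file, each "compute node" is a worker thread, and the function names are hypothetical. It does not reflect how PolyBase actually schedules work.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model only: hypothetical names, no real PolyBase or Hadoop involved.

def assign_splits(splits, num_nodes):
    """Round-robin the file splits across the compute nodes."""
    assignments = [[] for _ in range(num_nodes)]
    for i, split in enumerate(splits):
        assignments[i % num_nodes].append(split)
    return assignments

def compute_node_scan(assigned_splits, predicate):
    """One 'compute node': scan its splits and return a partial count."""
    return sum(1 for split in assigned_splits for row in split if predicate(row))

def head_node_query(splits, num_nodes, predicate):
    """The 'head node': fan the scan out, then combine the partial counts."""
    assignments = assign_splits(splits, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(lambda a: compute_node_scan(a, predicate), assignments))
    return sum(partials)

if __name__ == "__main__":
    # 8 splits of 100 rows each (rows 0..799); count the even rows on 4 nodes.
    splits = [list(range(i * 100, (i + 1) * 100)) for i in range(8)]
    print(head_node_query(splits, 4, lambda r: r % 2 == 0))  # 400
```

Adding nodes changes how the work is divided, not the answer. In this toy, as in the real feature, the point is that no single instance has to scan all of the external data by itself.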
At the 4:55 mark, it's demo time. Casey shows how the scale-out group distributes and handles queries, what happens under the covers, and the steps a query goes through when it executes. He also demos an external pushdown query, which converts a T-SQL query into a MapReduce job and pushes that job to Hadoop. The Hadoop cluster then does the computation, so a much smaller result set is sent back to SQL Server. This scenario is a great fit when dealing with very large amounts of data. Casey also shows some of the cool new scale-out DMVs and how to use them for query-processing insights, such as finding which step of a query took the longest, getting information about Map jobs, tracking the execution progress of a query, and more! Great video!
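Why pushdown shrinks the result set can be shown with another toy Python model. It assumes a simple filter query and treats each "Hadoop partition" as a plain list. The pushdown path applies the predicate inside a per-partition map-style step before anything is shipped. The function names are hypothetical, and this is not how PolyBase generates or runs its MapReduce jobs.

```python
# Toy comparison of rows shipped back to SQL Server with and without pushdown.
# Hypothetical names; nothing here talks to a real Hadoop cluster.

def query_without_pushdown(partitions, predicate):
    """Import every row into SQL Server, then apply the WHERE clause there."""
    shipped = [row for part in partitions for row in part]
    result = [row for row in shipped if predicate(row)]
    return result, len(shipped)

def query_with_pushdown(partitions, predicate):
    """Filter inside each partition (map-only step); ship only matching rows."""
    shipped = [row for part in partitions for row in part if predicate(row)]
    return shipped, len(shipped)

if __name__ == "__main__":
    # 3 partitions of 1,000 rows each (rows 0..2999).
    partitions = [list(range(i * 1000, (i + 1) * 1000)) for i in range(3)]
    selective = lambda r: r >= 2990
    _, rows_without = query_without_pushdown(partitions, selective)
    _, rows_with = query_with_pushdown(partitions, selective)
    print(rows_without, rows_with)  # 3000 10
```

Both paths return the same rows. The difference is how much data crosses the wire, and the gap grows with the size of the external data and the selectivity of the query.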