Polybase: Hadoop Integration in SQL Server PDW V2
With the increasing role of Hadoop in capturing and processing raw data, non-relational data in particular, integrating Hadoop into existing SQL Server products has become essential to a complete data platform. The Polybase project introduces a set of features for SQL Server Parallel Data Warehouse (PDW) V2 that allow querying data in Hadoop in a seamless and fully parallelized fashion. In particular, it will be possible (a) to query data in HDFS on the fly using T-SQL statements, (b) to import data from HDFS into relational PDW tables (either distributed or replicated) for persistent storage, and (c) to export data residing in PDW into HDFS. This talk presents a deep dive into the Polybase features shipped in SQL Server PDW V2 and provides a comprehensive overview of additional Polybase features planned for the next appliance updates.