The following is a guest post by Saptak Sen, Senior Product Manager at Hortonworks. Prior to this, Sen was the Senior Product Manager for High Performance Computing and Technical Computing at Microsoft.
We're excited to announce the general availability of Hortonworks Sandbox for Hortonworks Data Platform 2.2 on Azure.
Hortonworks Sandbox is already a very popular environment in which developers, data scientists, and administrators can learn and experiment with the latest innovations in the Hortonworks Data Platform.
The hundreds of innovations span Hadoop, Kafka, Storm, Hive, Pig, YARN, Ambari, Falcon, Ranger, and the other components that make up HDP. Now you can deploy this environment for your learning and experimentation in a few clicks on Microsoft Azure.
Follow the guide to Getting Started with Hortonworks Sandbox with HDP 2.2 on Azure to set up your own dev-ops environment on the cloud in a few clicks.
We also provide step-by-step tutorials to help you get a jump-start on using HDP to implement a Modern Data Architecture at your organization.
These tutorials will walk you through the latest in data governance, improved data access, security, and streaming data. Here are a few to get you started:
Enterprise Hive and Pig with even better compatibility, scalability, and performance
HDP 2.2 delivers phase 1 of the Stinger.next initiative, a broad, open, community-based effort to improve speed, scale, and SQL semantics. Microsoft is a key contributor in the Stinger.next initiative.
In this release, Hive supports ACID transactions to provide atomicity, consistency, isolation, and durability. This helps with streaming and baseline update scenarios for Hive such as modifying dimension tables or other fact tables. The cost-based optimizer for Hive uses statistics to generate several execution plans and then chooses the most efficient one in terms of the system resources required to complete the operation. This delivers a major performance improvement for Hive.
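To make the idea of cost-based plan selection concrete, here is a minimal sketch in Python. It is not Hive's optimizer: the table names, row counts, the single fixed join selectivity, and the cost formula (total intermediate rows) are all assumptions chosen for illustration. It only shows the general pattern of using statistics to score several candidate plans and keeping the cheapest.

```python
from itertools import permutations

# Hypothetical table statistics (row counts). A real optimizer would read
# statistics like these from its metadata store.
ROW_COUNTS = {"sales": 1_000_000, "stores": 500, "dates": 3_650}

# Hypothetical selectivity: the fraction of the cross product each join keeps.
SELECTIVITY = 0.001


def plan_cost(order, row_counts, selectivity=SELECTIVITY):
    """Estimate the cost of a left-deep join order as total intermediate rows."""
    rows = row_counts[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Estimated size of the result after joining in the next table.
        rows = rows * row_counts[table] * selectivity
        cost += rows
    return cost


def choose_plan(row_counts):
    """Score every join order and return (cheapest_order, estimated_cost)."""
    scored = ((order, plan_cost(order, row_counts)) for order in permutations(row_counts))
    return min(scored, key=lambda pair: pair[1])


best_order, best_cost = choose_plan(ROW_COUNTS)
```

Under these assumed numbers, the cheapest plans join the two small tables first and bring in the large `sales` table last, because that keeps the first intermediate result small.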
Automated cloud backup for Microsoft Azure with Apache Falcon
Data architects require Hadoop to act like other systems in the data center, and business continuity through replication across on-premises and cloud-based storage targets is a critical requirement. In HDP 2.2, we extend the capabilities of Apache Falcon to establish an automated policy for cloud backup to Microsoft Azure. This is the first step in a broader vision to enable extensive heterogeneous deployment models for Hadoop spanning cloud and on-premises environments. Try out the tutorial: Incremental Backup of Data from HDP to Azure using Falcon for Disaster Recovery and Burst capacity.
Extensive improvements to manage and monitor Hadoop
Managing and monitoring a cluster continues to be a high priority for organizations adopting Hadoop. We have dramatically improved Ambari to be the single pane of glass to deploy, manage, and monitor a modern enterprise data infrastructure. Our completely open approach via Apache Ambari is unique, and we are excited to have Pivotal and HP jump on board to support Ambari along with other leaders in the data center such as Microsoft and Teradata. See Ambari in action with Deploying, managing and configuring HDP with Ambari 1.7.
Kafka and Storm for processing the Internet of Things
Included in HDP 2.2, Apache Kafka has quickly become the standard high-scale, fault-tolerant, publish-subscribe messaging system for Hadoop. It is often used with Storm and Spark so that you can stream events into Hadoop in real time, and its potential for Internet of Things use cases is tremendous. Try the tutorials below to see what's possible:
- Real-time event streams with Apache Kafka
- Real-time Data Ingestion in HBase & Hive using Storm Bolt
- Processing streaming data in Hadoop with Apache Storm
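If the publish-subscribe model is new to you, the following toy sketch in Python may help before you start the tutorials. It is an in-memory stand-in, not a Kafka client: the `ToyBroker` class, its method names, and the topic and group names are hypothetical. It illustrates only the core idea that producers append events to a topic's log and each consumer group reads from its own offset, so several consumers can process the same stream independently.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a publish-subscribe log (hypothetical, not Kafka)."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic name -> append-only list of events
        self._offsets = defaultdict(int)  # (group, topic) -> next index to read

    def publish(self, topic, event):
        """Append an event to the topic's log and return its offset."""
        log = self._topics[topic]
        log.append(event)
        return len(log) - 1

    def poll(self, group, topic, max_events=10):
        """Return unread events for this group and advance the group's offset."""
        start = self._offsets[(group, topic)]
        batch = self._topics[topic][start:start + max_events]
        self._offsets[(group, topic)] = start + len(batch)
        return batch


broker = ToyBroker()
for reading in (21.5, 21.7, 22.1):
    broker.publish("sensor-temps", reading)

# Two hypothetical consumer groups read the same events independently.
storm_batch = broker.poll("storm-topology", "sensor-temps", max_events=2)
archive_batch = broker.poll("hdfs-archiver", "sensor-temps")
storm_rest = broker.poll("storm-topology", "sensor-temps")
```

Here the first group receives the stream in two batches while the second group receives all three readings at once. A real deployment adds partitioning, replication, and durable storage, which this sketch leaves out entirely.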
Comprehensive end-to-end security for Enterprise Hadoop
Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster. It provides central security policy administration across the core enterprise security requirements of authorization, accounting, and data protection. See it in action with Manage Security Policy for Hive & HBase with Knox & Ranger.
This is just the tip of the iceberg in terms of what you can do with HDP 2.2. For more tutorials and to dive deeper into some of these new capabilities, dig in:
- Announcing Apache Hadoop 2.6.0
- Announcing Apache Hive 0.14
- Announcing Apache Pig 0.14.0
- Tutorials for Hortonworks Sandbox
- Hadoop for Hybrid Cloud Whitepaper
- Learn more about HDP on Azure