Guest Post: Jump-Start Big Data with Hortonworks Sandbox on Azure

The following is a guest post by Saptak Sen, Senior Product Manager at Hortonworks. Prior to this, Sen was the Senior Product Manager for High Performance Computing and Technical Computing at Microsoft.

We're excited to announce the general availability of Hortonworks Sandbox for Hortonworks Data Platform 2.2 on Azure.

Hortonworks Sandbox is already a very popular environment in which developers, data scientists, and administrators can learn and experiment with the latest innovations in the Hortonworks Data Platform.

The hundreds of innovations span Hadoop, Kafka, Storm, Hive, Pig, YARN, Ambari, Falcon, Ranger, and the other components that make up HDP. Now you can deploy this environment for your learning and experimentation in a few clicks on Microsoft Azure.

Follow the guide to Getting Started with Hortonworks Sandbox with HDP 2.2 on Azure to set up your own dev-ops environment in the cloud in a few clicks.

We also provide step-by-step tutorials to give you a jump-start on using HDP to implement a Modern Data Architecture at your organization.

These tutorials will walk you through the latest in data governance, improved data access, security, and streaming data. Here are a few to get you started:

Enterprise Hive and Pig with even better compatibility, scalability, and performance

HDP 2.2 delivers phase 1 of the initiative, a broad, open, community-based effort to improve speed, scale, and SQL semantics. Microsoft is a key contributor to the initiative.

In this release, Hive supports ACID transactions, which provide atomicity, consistency, isolation, and durability. This helps with streaming and baseline update scenarios for Hive, such as modifying dimension tables or other fact tables. The cost-based optimizer for Hive uses statistics to generate several execution plans and then chooses the most efficient one based on the system resources required to complete the operation. This delivers a major performance increase for Hive.
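To make the idea of cost-based optimization concrete, here is a minimal Python sketch. It is not Hive's optimizer: the table names, row counts, selectivities, and the cost formula (the sum of intermediate result sizes) are all hypothetical, chosen only to show how statistics can be used to compare candidate plans and pick the cheapest.

```python
from itertools import permutations

# Hypothetical table statistics (row counts), invented for illustration.
ROW_COUNTS = {"sales": 1_000_000, "customers": 10_000, "regions": 50}

# Hypothetical join selectivities: the fraction of the cross product
# of two tables that survives the join predicate between them.
SELECTIVITY = {
    frozenset({"sales", "customers"}): 1 / 10_000,
    frozenset({"sales", "regions"}): 1 / 50,
    frozenset({"customers", "regions"}): 1 / 50,
}


def plan_cost(order):
    """Estimate the cost of joining tables left to right in the given order.

    The cost is the sum of the estimated sizes of all intermediate results,
    a deliberately simplified stand-in for real resource estimates.
    """
    joined = {order[0]}
    rows = float(ROW_COUNTS[order[0]])
    cost = 0.0
    for table in order[1:]:
        # Combine the selectivity of every predicate that links the new
        # table to a table already in the intermediate result.
        sel = 1.0
        for other in joined:
            sel *= SELECTIVITY.get(frozenset({other, table}), 1.0)
        rows = rows * ROW_COUNTS[table] * sel
        cost += rows
        joined.add(table)
    return cost


def choose_plan(tables):
    """Enumerate every join order and return the one with the lowest estimated cost."""
    return min(permutations(tables), key=plan_cost)
```

Calling `choose_plan(ROW_COUNTS)` returns a tuple of table names in the cheapest order under these made-up statistics. With these numbers, any order that joins the two small tables first is estimated to be far cheaper than one that starts from the large `sales` table, which is the kind of decision statistics make possible.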

Like Hive, Pig can now also take advantage of the fantastically fast Tez engine. Check these out with Faster Pig with Tez and Interactive Query with Hive and Tez.

Automated cloud backup for Microsoft Azure with Apache Falcon

Data architects require Hadoop to act like other systems in the data center, and business continuity through replication across on-premises and cloud-based storage targets is a critical requirement. In HDP 2.2, we extend the capabilities of Apache Falcon to establish an automated policy for cloud backup to Microsoft Azure. This is the first step in a broader vision to enable extensive heterogeneous deployment models for Hadoop, spanning cloud-based and on-premises environments. Try out the tutorial: Incremental Backup of Data from HDP to Azure using Falcon for Disaster Recovery and Burst capacity.

Extensive improvements to manage and monitor Hadoop

Managing and monitoring a cluster continues to be a high priority for organizations adopting Hadoop. We have dramatically improved Ambari to be the single pane of glass to deploy, manage, and monitor a modern enterprise data infrastructure. Our completely open approach via Apache Ambari is unique, and we are excited to have Pivotal and HP jump on board to support Ambari alongside other data center leaders such as Microsoft and Teradata. See Ambari in action with Deploying, managing and configuring HDP with Ambari 1.7.

Kafka and Storm for processing the Internet of Things

Included in HDP 2.2, Apache Kafka has quickly become the standard high-scale, fault-tolerant, publish-subscribe messaging system for Hadoop. It is often used with Storm and Spark to stream events into Hadoop in real time, and its potential for Internet of Things use cases is tremendous. Try the tutorials below to see what's possible:

Comprehensive end-to-end security for Enterprise Hadoop

Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster. It provides central security policy administration across the core enterprise security requirements of authorization, accounting, and data protection. See it in action with Manage Security Policy for Hive & HBase with Knox & Ranger.

This is just the tip of the iceberg in terms of what you can do with HDP 2.2. For more tutorials and to dive deeper into some of these new capabilities, dig in:
