The following is a guest post by Pratik Verma, Founder and Chief Product Officer at BlueTalon, which keeps enterprises in control of their data.
Azure HDInsight (HDI) makes it easy to quickly and cost-effectively process vast amounts of data in the cloud. Often, the data used in these analyses, such as customer information, transaction history, and other proprietary data, are sensitive from a business perspective and may even be subject to regulatory compliance. BlueTalon keeps enterprises in control of their data by allowing them to give users access to the data they need, not a byte more.
In this blog, we share how organizations can enable more business users to benefit from data in HDInsight by mitigating risks associated with putting sensitive data in the cloud through precise authorization and dynamic masking provided by BlueTalon.
BlueTalon provides capabilities for data-centric security:
- Audits of user activity using a context-rich trail of queries users run that hit sensitive fields.
- Precise control over data that is specific for each user identity or business role and specific for the data resource at the file, folder, table, column, row, cell, or partial cell level.
- Secure use of business data in policy decisions for real-world requirements, while maintaining complex access scenarios and relationships between users and data.
Typical use of HDInsight with BlueTalon Data-Centric Security
In a typical use of Azure HDInsight, organizations move data from on-premises or other cloud environments to Azure cloud storage for use in HDInsight. Using modeling, wrangling and transformation tools available as part of HDInsight, the raw data goes through a curation process to make it usable for business users. These business users operate on top of curated data using ad hoc query tools like Excel and Tableau to draw insight from it. With BlueTalon, organizations can enable hundreds of business users to benefit from the data in HDInsight while ensuring these users get data only per their entitlement and ensuring that the data curation pipelines are easily managed.
There are two types of users of BlueTalon software:
- Security administrators, who are assigned the task of managing security for an HDI cluster, and
- Business users, who visualize or explore data in HDInsight to draw insights.
Security administrators use the BlueTalon Policy UI to create rules about what actions business users can take with the data in HDI. Business users explore data in HDI through Hive with tools like Tableau, Excel or Azure tools; the policy is applied without any change in their user experience. Security administrators use the BlueTalon Audit UI to review which users attempted to access which data, when they did so, and what they were actually able to access.
BlueTalon provides security administrators the following levers of control for precisely defining entitlement of business users:
- Field protection: Fields can be denied without breaking the application. As an example, a blank value compatible with the id field is returned instead of revealing the id values as they are stored on disk.
- Record protection: The result set can be filtered to return a subset of the data, even when the field used in the filter criteria is not in the result set. In this example, the user is able to see only the two records with the East Coast zip codes, compared to 10 records on disk.
- Cell protection: A specific field value for a specific record can be protected. In this example, the user is able to see the birth date value for "Joyce McDonald" but not "Kelly Adams." Here as well, the date field is compatible with the format expected by the application.
- Partial cell protection: Even portions of a cell may be protected. In this example, the user is able to see the last four digits of a Social Security number, rather than the number being hidden entirely.
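To make the four levers concrete, here is a minimal Python sketch that applies each one to a couple of in-memory rows. It illustrates the ideas only and is not BlueTalon's implementation: the function names, the `BLANK_DATE` placeholder, and every sample value are assumptions made for this example.

```python
from datetime import date

# Illustrative rows only: the names appear in the post, every value is made up.
ROWS = [
    {"id": 101, "name": "Joyce McDonald", "zip": "02139",
     "birth_date": date(1980, 1, 15), "ssn": "123-45-6789"},
    {"id": 102, "name": "Kelly Adams", "zip": "94105",
     "birth_date": date(1975, 6, 30), "ssn": "987-65-4321"},
]

# Hypothetical placeholder that still parses as a date in the client tool.
BLANK_DATE = date(1900, 1, 1)


def protect_field(rows, field, blank):
    """Field protection: every row gets a type-compatible blank for `field`."""
    return [{**row, field: blank} for row in rows]


def protect_records(rows, allowed, columns):
    """Record protection: filter rows first, then project the columns.

    The filter may use a field (such as zip) that is not in `columns`.
    """
    return [{c: row[c] for c in columns} for row in rows if allowed(row)]


def protect_cell(rows, field, blank, allowed):
    """Cell protection: blank `field` only in rows that fail `allowed`."""
    return [row if allowed(row) else {**row, field: blank} for row in rows]


def mask_tail(value, visible=4, mask_char="X"):
    """Mask all but the last `visible` letters/digits, keeping separators."""
    total = sum(ch.isalnum() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isalnum():
            seen += 1
            out.append(ch if seen > total - visible else mask_char)
        else:
            out.append(ch)
    return "".join(out)


def protect_partial(rows, field, visible=4):
    """Partial cell protection: reveal only the tail of `field`."""
    return [{**row, field: mask_tail(row[field], visible)} for row in rows]
```

For instance, `protect_records(ROWS, lambda r: r["zip"].startswith("0"), ["name"])` filters on the zip code but returns only the name column, which mirrors the record-protection case where the filter field is not part of the result set.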
BlueTalon in the Azure Marketplace
BlueTalon Data-Centric Security software is available as an offering in the Azure Marketplace as a Virtual Machine or a Solution Template. For a single persistent HDI cluster, use the Solution Template. To enforce consistent security across multiple persistent or transient HDI clusters and relational databases, use the Virtual Machine. The Solution Template with the default parameters deploys BlueTalon on an application VM along with an HDI cluster in the configuration shown in the figure below.
To deploy the Virtual Machine from the Azure portal
To deploy the Solution Template from the Azure portal
How BlueTalon in the Azure Marketplace works
The BlueTalon data-centric security solution has three main components: a UI to create rules and visualize real-time audit, a Policy Engine to make fast run-time authorization decisions, and a collection of Enforcement Points that transparently enforce the decisions made by the Policy Engine. Optionally, if you have an external Active Directory (AD) or OpenLDAP that manages business user accounts, you can connect BlueTalon to that directory to authenticate business users. Full AD integration for all services inside HDInsight is coming soon!
For applications accessing data via Hive, the BlueTalon Hive enforcement point transparently proxies HiveServer2 at the network level and provides policy-protected data. The BlueTalon Policy Engine makes sophisticated, fine-grained policy decisions based on user and content criteria in-memory at run-time by re-engineering SQL requests for Hive. With the query modification technique, BlueTalon is able to ensure that end users get the same data, whether raw data is coming from local HDFS or WASB, and that only policy-compliant data is pulled from storage by Hive. With the enforcement point for HDFS, BlueTalon ensures that end users can't get around its security by going to HDFS to obtain data not accessible via Hive. Integration with Azure Data Lake Storage (ADLS) and Windows Azure Storage Blob (WASB) is coming soon!
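The general idea behind query modification can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not BlueTalon's actual Policy Engine: the `rewrite_select` helper, the `customers` table, its columns, and the policy expressions are all hypothetical, and a real engine would parse the incoming SQL rather than assemble strings.

```python
def rewrite_select(table, columns, masks=None, row_filter=None):
    """Build a policy-compliant SELECT for a simple single-table query.

    table:      table the user asked for (hypothetical name in the example)
    columns:    columns the user requested
    masks:      {column: SQL expression} substituted for protected columns
    row_filter: SQL predicate restricting which rows the user may see
    """
    masks = masks or {}
    select_list = [
        f"{masks[col]} AS {col}" if col in masks else col
        for col in columns
    ]
    sql = f"SELECT {', '.join(select_list)} FROM {table}"
    if row_filter:
        sql += f" WHERE {row_filter}"
    return sql


# Hypothetical policy: blank the id, show only the last four SSN digits,
# and restrict the rows by zip code (a column the user did not request).
policy_masks = {
    "id": "CAST(NULL AS INT)",
    "ssn": "concat('XXX-XX-', substr(ssn, -4))",
}
rewritten = rewrite_select(
    "customers", ["id", "name", "ssn"],
    masks=policy_masks, row_filter="zip LIKE '0%'",
)
print(rewritten)
```

Because the masking expressions and the row filter travel inside the statement that Hive executes, the protected values never leave the cluster in the result set, which is the property the paragraph above describes.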
Using BlueTalon with HDInsight
Once the BlueTalon software is deployed, security administrators can use the following ssh command to tunnel local port 8111 to port 8111 on the BlueTalon VM.
ssh -L 8111:<bluetalon vm private ip address>:8111 <username>@<bluetalon vm public ip address>
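If the UI does not load, a quick way to confirm that the tunnel is actually forwarding is to test whether the local port accepts TCP connections. The helper below is a small sketch using only the Python standard library; `tunnel_is_open` is a name made up for this example, and it only checks that something is listening, not that the service behind it is healthy.

```python
import socket


def tunnel_is_open(host="localhost", port=8111, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "open" if tunnel_is_open() else "closed"
    print(f"Local tunnel port 8111 is {state}")
```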
While the ssh connection is open, the security administrator can connect to the BlueTalon Policy UI from a Chrome browser at the following URL
The default username and password for logging into this UI are "btadminuser" and "P@ssw0rd" respectively.
Using the BlueTalon Policy UI, the security administrator can add resources, user identities and policies. Contact email@example.com if you are unable to access the BlueTalon Policy UI.
Here's an example of what a business user would see in Excel if they pulled data on SalesOrders from HDI with the BlueTalon policy applied. All records are allowed, but the SalesOrder.Location data is masked.
Contact firstname.lastname@example.org for a live demo or a BYOL license.