Securing sensitive data is just the beginning. The real value lies in securely storing and utilizing that data with a privacy-first approach, all within a simpler, more cost-efficient infrastructure. We’re thrilled to introduce Bluemetrix SecureToken, a multi-layered protection solution designed to maximise the value of your sensitive data while ensuring robust security within the Cloudera environment. This innovative feature is now available for trial to all existing Cloudera customers.
The journey to fully harnessing artificial intelligence (AI) begins with data. As AI continues to revolutionise industries, the demand for vast amounts of data grows.
From large language models (LLMs) to predictive analytics, cloud scalability provides the infrastructure required to process, analyse, and store the immense amounts of data necessary to fuel modern AI innovations.
However, large, regulated, data-centric industries still face challenges in protecting PII in the cloud, de-identifying it for ML/analytics, and doing so at scale. They must ensure that only authorised applications, systems, or personnel can access this secure data, without requiring every data engineer to become a security expert.
Bluemetrix SecureToken – Built-In Vaultless Tokenization Solution for CDP
Clearly, a modern tokenization solution is required to address these needs – one that overcomes the technical constraints that could potentially arise. This is why we are introducing Bluemetrix SecureToken – a vaultless tokenization solution native to Cloudera Data Platform (CDP) where the data never has to leave the CDP environment for tokenization.
Bluemetrix SecureToken is a Spark-based, NIST-compliant data protection technology designed to secure sensitive data at scale. Traditional tokenization methods can be computationally intensive and impose platform trade-offs, as tokenization usually needs to be performed in a different environment from the one in which the data is stored. By contrast, this native vaultless tokenization solution offers dual-layer protection, allowing for seamless tokenization and detokenization within the Cloudera Lakehouse environment.
Bluemetrix SecureToken delivers three major security benefits:
Secure In Place: Tokenization is natively integrated into the existing Cloudera Data Platform and ETL/ELT tools via Java UDFs. This approach eliminates the need for the risky transfer of data and reduces operational bottlenecks and reliance on costly third-party solutions, while keeping the underlying data secure in its data repository.
Secure at Scale: As your data scales, so does your tokenization power. Built upon the CDP Spark cluster, it scales effortlessly, allowing you to process large amounts of sensitive data as your requirements grow. Sensitive data remains protected at rest, in transit, and during processing, with effective resource allocation control.
Secure with Ease: By preserving the type, format and structure of the original data, Bluemetrix SecureToken allows all your sensitive data types to be tokenized while ensuring the tokenized data will work with your existing analytics programs and AI/GenAI data models. Combined with Cloudera’s Ranger and KMS, which facilitate the creation and customization of governance policies through Atlas, the solution simplifies data security and governance.
The primary features of SecureToken include the following:
Built-in tokenization methods (Java UDFs) that are native to Spark and CDP
Compatibility with all ETL/ELT tools for easy integration into existing data pipelines
Scalable architecture capable of handling datasets of any size
Compliance with NIST (FF1, FIPS 140-3 compatible) standards, meeting industry-recognised benchmarks
Centralised key management and authorisation with Cloudera Ranger/KMS for enhanced control
A diverse range of pre-built routines available that simplify use case implementation
Ability to tokenize Structured and Semi-Structured data
Simplify Data Migration to the Cloud with SecureToken – Effortlessly Tokenize Data in Both On-Premises and Cloud Environments
As many companies embark on their journey to a hybrid cloud implementation, securing sensitive data before transitioning to the cloud is often the limiting factor in the migration process.
SecureToken solves this problem by enabling you to tokenize across environments using the same key. This, in turn, means that users in each environment are shown different views of the data: privileged users can view the original data, while unprivileged users can only view the tokenized (secured) data.
Bluemetrix SecureToken provides flexibility by offering tokenization simultaneously in three distinct locations:
on-premises
in-memory during the data transfer to the cloud
directly in the cloud
This ensures that sensitive data is protected throughout the entire migration process and that all data stored in the cloud is secure.
Additionally, our integration with Ranger KMS allows authorized users to detokenize the data within the cloud environment using the same key as the on-premises setup, which means the SecureToken solution protects your data both on-prem and in the cloud at the same time.
This seamless approach ensures robust data security and operational continuity throughout your hybrid cloud journey.
How Bluemetrix SecureToken Works for Cloudera Data Platform (CDP)
Here’s an example of Bluemetrix SecureToken tokenizing personally identifiable information (PII) and other data types while preserving analytic usability.
Bluemetrix SecureToken provides flexibility with built-in routines for different data types, which can allow for partial preservation of data if required. Creating and specifying custom routines is also possible to satisfy any data-specific requirements. In the above example, the country code prefix for phone numbers is preserved, as well as some IBAN information including the country code.
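To make the idea of partial preservation concrete, here is a minimal Python sketch of vaultless, format-preserving tokenization that keeps a leading prefix (such as a phone country code) untouched. It is not SecureToken's implementation and it is not NIST FF1: it is a toy keyed Feistel network over decimal digits, built on HMAC-SHA256, for illustration only. The function names, the `keep_prefix` parameter, and the hard-coded demo key are all assumptions; in the product, keys are managed through Ranger KMS.

```python
import hashlib
import hmac

DIGITS = "0123456789"
ROUNDS = 10  # even, so both halves end in their original positions


def _prf(key: bytes, round_no: int, half: str, width: int) -> int:
    """Keyed pseudo-random number below 10**width (toy round function)."""
    digest = hmac.new(key, f"{round_no}:{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (10 ** width)


def _feistel(digits: str, key: bytes, reverse: bool = False) -> str:
    """Run a Feistel network over a string of at least two decimal digits."""
    if len(digits) < 2 or any(ch not in DIGITS for ch in digits):
        raise ValueError("need at least two decimal digits")
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    rounds = range(ROUNDS - 1, -1, -1) if reverse else range(ROUNDS)
    for i in rounds:
        width = u if i % 2 == 0 else v
        if reverse:
            # Undo one round: the current left half was the old right half.
            prev_a = (int(b) - _prf(key, i, a, width)) % (10 ** width)
            a, b = str(prev_a).zfill(width), a
        else:
            c = (int(a) + _prf(key, i, b, width)) % (10 ** width)
            a, b = b, str(c).zfill(width)
    return a + b


def _transform(value: str, key: bytes, keep_prefix: int, reverse: bool) -> str:
    """Replace digits after the first `keep_prefix` characters, keeping layout."""
    positions = [i for i, ch in enumerate(value) if i >= keep_prefix and ch in DIGITS]
    new_digits = _feistel("".join(value[i] for i in positions), key, reverse)
    chars = list(value)
    for pos, digit in zip(positions, new_digits):
        chars[pos] = digit
    return "".join(chars)


def tokenize(value: str, key: bytes, keep_prefix: int = 0) -> str:
    return _transform(value, key, keep_prefix, reverse=False)


def detokenize(token: str, key: bytes, keep_prefix: int = 0) -> str:
    return _transform(token, key, keep_prefix, reverse=True)


key = b"demo-key-not-for-production"  # assumption: a real key would come from a KMS
phone = "+353 87 123 4567"
token = tokenize(phone, key, keep_prefix=4)
print(token)                                   # same layout, "+353" preserved
print(detokenize(token, key, keep_prefix=4))   # original value again
```

Because the token is derived from the key rather than looked up in a vault, any environment holding the same key can reverse it, and the token keeps the length, spacing, and character classes of the original so downstream schemas and validation rules continue to work.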
Using Ranger masking policies enforced via Atlas tags, users may see the original data or the tokenized data, depending on their permissions. These Ranger policies allow complete control over detokenized data since the default behaviour is no access.
The above shows how columns can dynamically be detokenized for users at query time, depending on their permissions. Combining this functionality with built-in Ranger masking rules can allow further fine-grained access control, such as providing additional masking of even the tokenized data stored on disk.
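The default-deny behaviour described here can be modelled in a few lines of Python. This is a conceptual sketch only: it does not use Ranger's or Atlas's APIs or policy format, and `TagPolicy`, `read_value`, and `toy_detokenize` are hypothetical names. The detokenizer is a deliberately trivial placeholder (string reversal) standing in for a real keyed detokenization routine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class TagPolicy:
    """Hypothetical tag-based policy: which groups may see original values."""
    allowed_groups: Dict[str, Set[str]] = field(default_factory=dict)

    def can_detokenize(self, tag: str, user_groups: Set[str]) -> bool:
        # Default is no access: unknown tags and unlisted groups are denied.
        return bool(self.allowed_groups.get(tag, set()) & user_groups)


def read_value(
    stored_token: str,
    tag: str,
    user_groups: Set[str],
    policy: TagPolicy,
    detokenize: Callable[[str], str],
) -> str:
    """Return the original value only if the policy allows it, else the token."""
    if policy.can_detokenize(tag, user_groups):
        return detokenize(stored_token)
    return stored_token


def toy_detokenize(token: str) -> str:
    # Placeholder only: a real routine would reverse a keyed tokenization.
    return token[::-1]


policy = TagPolicy({"PII": {"compliance"}})
stored = "7654321"  # tokenized form as it sits on disk in this toy example

print(read_value(stored, "PII", {"compliance"}, policy, toy_detokenize))  # 1234567
print(read_value(stored, "PII", {"analysts"}, policy, toy_detokenize))    # 7654321
```

The point of the design is that the data on disk is always the tokenized form; detokenization happens at read time and only when a policy explicitly grants it.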
Future-Proof Your AI and Cloud Journey with Secure Data Protection
With the continued rapid deployment of GenAI projects among major enterprises and the accelerated migration of data to the cloud, the pressure on all organizations to secure and protect PII and sensitive data has never been greater, whether to meet internal compliance requirements or external regulatory requirements.
There has never been a better time for enterprises to adopt tokenization, whether to streamline data management processes or to prepare data for AI and generative AI projects. It is now possible to satisfy all your tokenization needs in your Cloudera environment using Bluemetrix SecureToken, ensuring that your PII and sensitive data never leave CDP in an unsecured form while leveraging the market-leading security tools that CDP provides.
Free 60-day Trial of Bluemetrix SecureToken
You can now access a free 60-day trial of SecureToken by downloading the Accelerators for ML Project (AMP) from Cloudera Marketplace or by registering with us to begin your trial here.
Leonardo Dias
Head of Professional Services, Bluemetrix
Manick Mehra
Solution Engineer, Cloudera