Expanding Our Data Discovery Leadership With Machine Learning Classification Tools

Identification capabilities bring speed and scale to applying governance policies

Sam Curcuruto
Sr. Product Marketing Manager, Data Discovery, OneTrust
May 4, 2023

Eliminate manual effort
Automatically apply data policies
Applications of ML models
Privacy by design

Sensitive data lives everywhere in the organization, including databases, systems, documents, and apps. However, not all data stores are the same, which creates classification challenges for some automated solutions. OneTrust Data Discovery uses advanced machine learning (ML) and artificial intelligence (AI) to identify documents that cannot be classified using traditional pattern-matching approaches. Once a document's type is determined from its content and context, organizations can automatically apply the right governance policies to ensure data is used responsibly.

Eliminate Manual Effort and Classify Data Using Content and Context

OneTrust Data Discovery goes beyond traditional pattern matching to intelligently scan a document and identify its type, such as a resume, passport, financial statement, or medical record. Machine learning helps save time by classifying data at scale, which minimizes manual intervention and increases accuracy.
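To see why pattern matching alone can fall short, consider the minimal sketch below. It is an illustration only, not OneTrust's scanner: the two regular expressions, the `pattern_hits` helper, and the sample texts are all assumptions made for this example. Both sample documents produce the same pattern hits, even though one reads like a resume and the other like an invoice.

```python
import re

# Hypothetical patterns for illustration; production scanners use far more robust rules.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def pattern_hits(text):
    """Return the sorted names of the patterns that match anywhere in the text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))


# Made-up sample documents.
resume = "Jane Doe. Experience: five years as a data analyst. Contact: jane@example.com"
invoice = "Invoice 1042. Send remittance questions to billing@example.com"

# Both documents contain an email address and nothing else the patterns know about,
# so pattern matching cannot tell the resume apart from the invoice.
print(pattern_hits(resume))
print(pattern_hits(invoice))
```

The patterns answer "what kinds of data are in this file?" but not "what kind of document is this?", which is the gap that content-and-context classification is meant to close.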

Automatically Apply Retention, Deletion, and Data Protection Policies

Once data is classified, security teams can ensure it is protected and handled according to its classification and the applicable regulatory requirements. With our improved classification and document identification, we can apply policies at the data level, such as ‘files containing PII,’ and at the document level, such as ‘resumes’ or ‘financial reports.’

These improved classifications enable the application and enforcement of policies like retention, deletion, or quarantine. We can also apply access policies to different data or document types, for example ensuring that sensitive files or data are not shared with open access.
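One way to picture policies keyed to both levels is the sketch below. It is a hypothetical illustration, not the OneTrust policy engine: the `Classification` type, the policy tables, the action names, and the retention periods are all placeholders invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    document_type: str                 # document-level label, e.g. "resume"
    data_tags: frozenset = frozenset()  # data-level labels, e.g. {"pii"}


# Hypothetical policy tables; action names and retention periods are placeholders.
DOCUMENT_POLICIES = {
    "resume": ["retain_days:365"],
    "financial_report": ["retain_days:2555"],
}
DATA_POLICIES = {
    "pii": ["block_open_access"],
}


def policies_for(classification):
    """Collect document-level actions first, then data-level actions in tag order."""
    actions = list(DOCUMENT_POLICIES.get(classification.document_type, []))
    for tag in sorted(classification.data_tags):
        actions.extend(DATA_POLICIES.get(tag, []))
    return actions


scanned = Classification("resume", frozenset({"pii"}))
print(policies_for(scanned))
```

Keeping the two tables separate means a file can pick up an access restriction because of the data it contains, independently of the retention rule that follows from its document type.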

Applications of ML Models

OneTrust Data Discovery employs a number of intelligent technologies and new techniques to help our customers better discover, control, and activate their data at scale.

We use AI, natural language processing (NLP), and ML technology to automate document classification and categorize documents based on their content. This matters most in industries like legal, healthcare, and finance, which have large volumes of documents to process. The algorithms learn from labeled data sets to recognize patterns and characteristics in text, so they can classify documents accurately and efficiently.
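The idea of learning from labeled data can be sketched with a small multinomial naive Bayes classifier. This is a generic textbook technique chosen for illustration, not a description of the models OneTrust uses; the class name, the labels, and the six training snippets are made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training docs
        self.vocab = set()

    def fit(self, labeled_docs):
        for text, label in labeled_docs:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            # Log prior plus smoothed log likelihood of each token under this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labeled data set, invented for this sketch.
TRAINING = [
    ("work experience education skills references employment history", "resume"),
    ("skills summary objective education degree university experience", "resume"),
    ("balance sheet revenue net income quarterly earnings statement", "financial_statement"),
    ("total assets liabilities cash flow revenue fiscal year", "financial_statement"),
    ("patient diagnosis prescription treatment physician visit", "medical_record"),
    ("patient allergies medication dosage diagnosis history", "medical_record"),
]

clf = NaiveBayesClassifier().fit(TRAINING)
print(clf.predict("education and employment experience at a university"))
```

A production classifier would train on far more data and richer features, but the shape is the same: labeled examples go in, and the model scores new text against what it learned about each document type.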

Named entities are a classic area where many solutions struggle. Consider the word “Savannah,” which could be a person’s name or a city in the U.S. state of Georgia. To help classify data appropriately, we have tuned spaCy's Named Entity Recognition (NER) model, a machine learning model that identifies and extracts named entities (people, organizations, locations) from unstructured text. It can identify named entities in different languages, making it valuable for global customers.
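For readers who want to experiment with NER output, the sketch below groups entities by label using the attributes spaCy exposes on a processed document (`doc.ents`, `ent.text`, `ent.label_`). It uses a stock spaCy pipeline rather than OneTrust's tuned model, and the helper names and the stand-in document are assumptions made for this example. The stand-in lets the grouping helper run without spaCy or a model installed.

```python
from collections import defaultdict
from types import SimpleNamespace


def entities_by_label(doc):
    """Group entity texts by label for any object exposing spaCy's Doc.ents interface."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


def load_pipeline(model_name="en_core_web_sm"):
    """Load a spaCy pipeline; requires spaCy and the named model to be installed."""
    import spacy  # imported lazily so entities_by_label works without spaCy
    return spacy.load(model_name)


# Stand-in for a processed document, with hand-assigned labels, so the helper
# can be tried without a model. These labels are NOT real model output.
fake_doc = SimpleNamespace(ents=[
    SimpleNamespace(text="Savannah", label_="PERSON"),
    SimpleNamespace(text="Georgia", label_="GPE"),
    SimpleNamespace(text="Savannah", label_="GPE"),
])
print(entities_by_label(fake_doc))
```

With spaCy and a model installed, `entities_by_label(load_pipeline()("Savannah moved from Atlanta to Savannah."))` runs the same grouping on real predictions. Whether each "Savannah" comes back as a person or a place depends on the model and the surrounding context, which is exactly the ambiguity that tuning aims to reduce.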

We have also developed new ways to use Optical Character Recognition (OCR) machine learning models to extract characters from images, including printed or handwritten text, and convert them into machine-readable text. Thanks to the speed of our scanning technology, classification of PDFs and JPGs can be completed at scale.

Privacy by Design Is Built Into Our AI and ML Strategy

OneTrust has been using machine learning and AI for more than a year, and our models have been trained and used by privacy professionals. Our strategy has always been to use these and new technologies to better uncover, classify, protect, and encourage the responsible use of data across all enterprises.

We have built and deployed our technology with privacy by design, so that each customer’s model is their own, tailored to and trained on their own unique data and environment. Those models are never shared with anyone else.

Let us show you how it works — request a demo today.
