Expanding our data discovery leadership with machine learning classification tools

Identification capabilities bring speed and scale to applying governance policies

Sam Curcuruto
Sr. Product Marketing Manager, Data Discovery, OneTrust
May 4, 2023

Sensitive data lives everywhere in the organization, including databases, systems, documents, and apps. However, not all data stores are the same, which creates classification challenges for some automated solutions. OneTrust Data Discovery uses advanced machine learning (ML) and artificial intelligence (AI) to identify documents that cannot be classified using traditional pattern-matching approaches. Once a document's type is determined from its content and context, organizations can automatically apply the right governance policies to ensure data is used responsibly.

Eliminate manual effort and classify data using content and context

OneTrust Data Discovery goes beyond traditional pattern matching to intelligently scan a document and identify its type, such as a resume, passport, financial statement, or medical record. Machine learning helps save time by classifying data at scale, which minimizes manual intervention and increases accuracy.

Automatically apply retention, deletion, and data protection policies

Once data is classified, security teams can ensure it is protected and handled according to its classification and the applicable regulatory requirements. Using our improved classification and document identification, we can apply policies at the data level, such as ‘files containing PII,’ and at the document level, such as ‘resumes’ or ‘financial reports.’

These improved classifications enable the application and enforcement of policies like retention, deletion, or quarantine. We can also apply access policies to different data or document types, for example ensuring that sensitive files or data are not shared with open access.
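To make the idea of data-level and document-level policies concrete, here is a minimal sketch of how classification labels could be mapped to policy actions. The label names, action names, and the `ScannedFile` and `policies_for` helpers are hypothetical and chosen for illustration; they are not the OneTrust policy schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical policy table: labels and actions are illustrative only.
POLICIES = {
    "contains_pii": ["block_open_access"],
    "resume": ["apply_retention_schedule", "delete_when_expired"],
    "financial_report": ["apply_retention_schedule", "quarantine"],
}


@dataclass
class ScannedFile:
    path: str
    data_labels: Set[str] = field(default_factory=set)  # data level, e.g. "contains_pii"
    document_type: Optional[str] = None                 # document level, e.g. "resume"


def policies_for(file: ScannedFile) -> List[str]:
    """Collect the actions triggered by a file's data-level and document-level labels."""
    labels = sorted(file.data_labels)
    if file.document_type:
        labels.append(file.document_type)
    actions: List[str] = []
    for label in labels:
        for action in POLICIES.get(label, []):
            if action not in actions:  # keep first-seen order, drop duplicates
                actions.append(action)
    return actions


cv = ScannedFile("hr/candidate_cv.pdf", {"contains_pii"}, "resume")
print(policies_for(cv))
```

In this sketch a resume that also contains PII picks up both the access restriction and the retention actions, which mirrors how one file can fall under a data-level and a document-level policy at the same time.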

Applications of ML models

OneTrust Data Discovery employs a number of intelligent technologies and new techniques to help our customers better discover, control, and activate their data at scale.

We use AI, natural language processing (NLP), and ML technology to automate document classification and categorize documents based on content, because industries like legal, healthcare, and finance have large volumes of documents to process. The algorithms learn from labeled data sets to recognize patterns and characteristics in text to classify documents accurately and efficiently.
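As a rough illustration of learning from labeled data, the sketch below trains a tiny multinomial naive Bayes classifier on a handful of made-up examples. It is a textbook technique chosen for brevity, not a description of the models OneTrust uses, and the training snippets, labels, and class name are assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesDocClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, labeled_docs):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()
        for text, label in labeled_docs:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up labeled examples; a real training set would be far larger.
training = [
    ("work experience education skills references", "resume"),
    ("professional summary skills employment history", "resume"),
    ("balance sheet revenue net income quarterly", "financial_statement"),
    ("cash flow assets liabilities revenue", "financial_statement"),
    ("patient diagnosis prescription treatment", "medical_record"),
    ("patient history allergies medication dosage", "medical_record"),
]
model = NaiveBayesDocClassifier().fit(training)
print(model.predict("Education and skills: ten years of work experience"))
```

The classifier never sees a fixed pattern such as a passport number format; it labels a document from the vocabulary it shares with previously labeled examples.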

Named entities are a classic area where many solutions struggle. Consider the word “Savannah,” which could be a person’s name or a city in the U.S. state of Georgia. To help classify data appropriately, we have tuned spaCy's Named Entity Recognition (NER) model, a machine learning model that identifies and extracts named entities (people, organizations, locations) from unstructured text. It can identify named entities in different languages, making it valuable for global customers.
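The “Savannah” ambiguity can be illustrated with a toy heuristic that looks at the words around a name. This is a hand-written stand-in meant only to show why context matters; it is not spaCy's API or its statistical model, which learns such signals from training data. The cue lists and the `guess_entity_type` function are hypothetical.

```python
import re

# Hypothetical cue words; a trained NER model learns such signals from data.
PERSON_CUES = {"ms", "mr", "mrs", "dr", "said", "she", "he", "her", "his", "hired", "emailed"}
PLACE_CUES = {"in", "to", "from", "near", "city", "georgia", "visited", "downtown", "moved"}


def guess_entity_type(sentence, entity, window=3):
    """Label `entity` as PERSON or LOCATION from up to `window` words on each side."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    target = entity.lower()
    if target not in tokens:
        return "NOT_FOUND"
    i = tokens.index(target)  # first occurrence only, for simplicity
    context = tokens[max(0, i - window): i] + tokens[i + 1: i + 1 + window]
    person = sum(tok in PERSON_CUES for tok in context)
    place = sum(tok in PLACE_CUES for tok in context)
    if person == place:
        return "AMBIGUOUS"
    return "PERSON" if person > place else "LOCATION"


print(guess_entity_type("Dr. Savannah Lee emailed her report.", "Savannah"))
print(guess_entity_type("We moved to Savannah, Georgia last year.", "Savannah"))
```

The same string gets a different label depending on its neighbors, which pattern matching on the word alone cannot do.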

We have also developed new ways to use Optical Character Recognition (OCR) machine learning models to extract characters from images, including printed or handwritten text, and convert them to machine-readable text. Thanks to the speed of our scanning technology, classification of PDFs and JPGs can be completed at scale.

Privacy by design is built into our AI and ML strategy

OneTrust has been using machine learning and AI for more than a year, and our models have been trained and used by privacy professionals. Our strategy has always been to use these and new technologies to better uncover, classify, and protect data, and to encourage its responsible use across all enterprises.

We have built and deployed our technology with privacy by design, so that each customer’s model is their own, tailored to and trained on their own unique data and environment. Those models are never shared with anyone else.

Request a demo today and let us show you how it works.

