What is data discovery?

From data access to data sprawl, businesses are dealing with unprecedented amounts of digital information that needs to be monitored, managed, and secured. Here's your guide.

Jason Koestenblatt, Team Lead, Content Marketing, OneTrust
March 10, 2023


It’s everywhere. It’s everything.

And your business is creating, capturing, processing, and controlling tons of it.

Some 2.5 quintillion bytes of data are being created each day, according to a report by G2 in early 2023. Think of a number with 18 zeroes following it! That report also showed that an internet user – that’s you – generates around 1.7 megabytes of data per second.

Chart of data points outlining key information on data bytes, breaches, and data privacy laws

The data point that may be most telling, however, is the estimate that humanity will generate three times more data in 2023 than it did in 2019. Data creation is growing faster than you can say zettabyte (as in 97 zettabytes, the estimated volume of data created in 2022).

That’s exciting, as the digital world now sees data as currency. But with great power comes great responsibility, and every data point in your company’s ecosystem can become a security and legal liability.

Enter data discovery: the act of de-risking that data, along with the governance your organization needs to properly harness (and safeguard) it. Each company and, at a deeper level, each security team is going to have pain points when it comes to its data classification structure. However, knowing what data you have, where it is, and what it is – and isn’t – used for in the business can be a great head start when defining next steps in discovering, controlling, and activating that data.



More data, more problems

Acknowledging the problems that a massive amount of data poses to your organization is step one in proper classification.

Problem 1: Lack of visibility into a growing dataset

Your organization collects and generates a massive amount of data across different systems in a variety of forms. Before you can establish and enforce policies to promote usability, secure data, and maintain compliance, you must understand what data you have and WHY you have it.

Problem 2: Need to reconcile data risk and reward

Data carries both risk and reward. Because of this relationship, you’re always on the hunt for technology that helps your business understand the data it has, the risks that data poses, the external compliance requirements that apply to it, and the internal initiatives and expectations related to it.

Problem 3: Time to market

Your business needs to be able to find sensitive data, highlight where it lies, and quickly remediate in the event of a security incident.

The average volume of data held by an enterprise grew by 42% last year. One of the biggest challenges stemming from this explosion of data is insider access. Does your company know how to monitor and manage this type of data sprawl?

What is data access governance?

The key objective of data access governance is to gain visibility into risk and enforce data access policies. Data access management has evolved into an independent initiative that requires an autonomous strategy, budget, and implementation schedule. Data access governance covers many crucial areas, including data security; protecting PII; providing access to critical data assets; and managing permissions.

What is dark data?

Dark data is the information assets an organization collects, processes, and stores during regular business activities, but generally fails to use for other purposes, such as analytics, business relationships, and direct monetization.

Who is a data citizen?

A data citizen is an employee who is given access to an organization’s proprietary information. Use of the word “citizen” is meant to emphasize the idea that an employee’s right to access corporate data also comes with responsibilities.

What is a data estate?

A data estate is simply the infrastructure that helps companies systematically manage all the corporate data they own.

What is data minimization?

Data minimization is the principle that data should be collected, processed, and retained only to the extent that it is essential for purposes that were clearly stated in advance, in support of data privacy.

What is Data Security Posture Management?

Data Security Posture Management (DSPM) is an emerging market focused on reducing risk and improving the security around an organization’s most valuable asset – its data.

What is data sprawl?

Data sprawl is the growth in the volume and variety of digital information (data) created, collected, stored, shared, and analyzed by businesses, primarily at the enterprise level. On average, organizations have four to six platforms to manage data.

What is ROT data?

Redundant, obsolete, or trivial (ROT) data is the digital information a business keeps even though it has no business or legal value: for example, a duplicated piece of information or a data point that doesn’t help the company in any positive way.

To cull and manage ROT data, your business needs a data retention and deletion strategy.
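As a rough illustration of what a retention check could look like, here is a minimal Python sketch that flags redundant records (identical content) and obsolete records (unused for longer than a retention window). The record fields (`id`, `content`, `last_used`), the `find_rot` helper, and the three-year window are all assumptions made for this example, not part of any specific product or policy.

```python
import hashlib
from datetime import date, timedelta

# Hypothetical retention window; a real value comes from your retention policy.
RETENTION = timedelta(days=365 * 3)

def find_rot(records, today):
    """Return (redundant_ids, obsolete_ids) for a list of record dicts.

    Each record is assumed to have 'id', 'content', and 'last_used' (a date).
    """
    seen = {}        # content hash -> id of the first record with that content
    redundant = []   # later copies of content we have already seen
    obsolete = []    # records unused for longer than the retention window
    for rec in records:
        digest = hashlib.sha256(rec["content"].encode("utf-8")).hexdigest()
        if digest in seen:
            redundant.append(rec["id"])
        else:
            seen[digest] = rec["id"]
        if today - rec["last_used"] > RETENTION:
            obsolete.append(rec["id"])
    return redundant, obsolete

records = [
    {"id": 1, "content": "Q1 sales report", "last_used": date(2023, 1, 15)},
    {"id": 2, "content": "Q1 sales report", "last_used": date(2023, 2, 1)},
    {"id": 3, "content": "2015 vendor list", "last_used": date(2016, 6, 30)},
]
redundant, obsolete = find_rot(records, today=date(2023, 3, 10))
print(redundant, obsolete)  # [2] [3]
```

Exact-match hashing only catches byte-for-byte duplicates, and "trivial" data usually needs a human or policy judgment, so treat this as a starting point for review rather than an automatic deletion rule.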

Shift left: A data classification strategy

Data discovery has as much to do with classifying where digital information lives and how important it is as it does with deciding what actions should ultimately be taken with that information. Forward-looking security teams should be employing the shift left strategy. But what exactly does that mean?

Shift left is a philosophy that looks at data ingestion at the left side of a horizontal funnel (see image). According to IAPP, that narrow end represents the point when data first enters the company’s tech ecosystem. As you move right in the funnel, the amount of data grows with copies, inferences, and data analysis. The point of collection is best suited to classify and inventory data, creating downstream efficiencies. Most companies classify and inventory data toward the right side of the funnel, which is a recipe for delays, inaccuracies, and potential security incidents.

Chart demonstrating the concept of employing a shift left policy for data privacy and security

For security teams to shift left in their data classification strategy, they’ll need a consumer-facing collection point that captures consent and purpose. Those signals then feed into the data map to inform the orchestration of data policies, including access and retention.
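To make the idea of classifying at the point of collection concrete, here is a minimal Python sketch. It tags each incoming field with simple pattern-based labels and stores the purpose and consent alongside it in an inventory. The `classify` and `ingest` functions, the two regular expressions, and the inventory structure are hypothetical and purely illustrative; real classifiers need far broader coverage.

```python
import re

# Illustrative patterns only; these are assumptions for the example.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify(value):
    """Return the sorted labels whose pattern appears in the value."""
    return sorted(label for label, pat in PATTERNS.items() if pat.search(value))

def ingest(record, purpose, consent, inventory):
    """Tag every field at collection time and append the entry to the inventory."""
    entry = {
        "purpose": purpose,   # why the data was collected
        "consent": consent,   # whether the individual agreed to that purpose
        "fields": {name: classify(str(value)) for name, value in record.items()},
    }
    inventory.append(entry)
    return entry

inventory = []
entry = ingest(
    {"contact": "pat@example.com", "note": "SSN 123-45-6789 on file", "plan": "basic"},
    purpose="account_setup",
    consent=True,
    inventory=inventory,
)
print(entry["fields"])
```

Because the labels, purpose, and consent are recorded before any copies or derived datasets exist, downstream systems can inherit them instead of rediscovering them later.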

What is the responsible use of data?

Now that problems have been acknowledged and definitions for data discovery explained, how does a company responsibly use the data it captures and creates? What exactly is responsible data use?

Much like your business considers and applies guidelines and frameworks around its people, products, and processes, so should it for its data. Organizations need to think of the data they have as part of the people it is tied to. The data must be treated ethically and fairly, just the way people are.

Consider a three-step approach to the data management lifecycle your business employs:

Discover: Uncover hidden data including good data in bad places, sensitive data with inappropriate access, and hoarded data

Control: Trigger internal workflows to remove sensitive information, restrict access, or apply privacy-enhancing technology such as encryption or masking

Activate: Promote responsible data usage by automating core privacy workflows, and capturing and governing throughout the data lifecycle 
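As one small example of the Control step, the Python sketch below masks email addresses in free text, keeping only the first character and the domain. The `mask_emails` helper and its regular expression are assumptions for illustration; masking is just one of the privacy-enhancing techniques named above, and production masking would need to handle many more data types.

```python
import re

# Group 1: first character of the local part; group 2: "@" plus the domain.
EMAIL = re.compile(
    r"([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*(@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
)

def mask_emails(text):
    """Keep the first character and the domain of each email; mask the rest."""
    return EMAIL.sub(r"\1***\2", text)

masked = mask_emails("Contact pat@example.com or lee.k@example.org")
print(masked)  # Contact p***@example.com or l***@example.org
```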

Businesses must consider their needs and goals when using data, no matter which department is processing or controlling that information and regardless of structure. There are six guiding principles to responsible data use that can help organizations.

What’s the purpose?

Data collection should be tied to a purpose, the data’s use limited to that purpose, and the data disposed of when it is no longer needed to fulfill that purpose. For personal data, specifically, the purpose should be clearly communicated to the individual at the point of collection.

Be transparent

Organizations should clearly communicate how and why data is collected, used, and shared.

Offer the choice

Individuals should be given the ability to granularly choose or consent to how their data is being used, creating a mutual value exchange that builds trust.

Implement governance

Organizations must have the proper technical controls in place to ensure that data is only used as defined by their policies and the informed consent of the individual.

Protection through security

Organizations must have the proper security controls in place to ensure that data is protected from unauthorized use or disclosure.

Ethical evaluations

Organizations should evaluate the ethical implications of data use as well as the legal implications, especially with emerging technologies such as artificial intelligence.
