What is data discovery?

From data access to data sprawl, businesses are dealing with unprecedented amounts of digital information that needs to be monitored, managed, and secured. Here's your guide.

Jason Koestenblatt, Team Lead, Content Marketing, OneTrust
March 10, 2023


It’s everywhere. It’s everything.

And your business is creating, capturing, processing, and controlling tons of it.

Some 2.5 quintillion bytes of data are being created each day, according to a report by G2 in early 2023. Think of a number with 18 zeroes following it! That report also showed that an internet user – that’s you – generates around 1.7 megabytes of data per second.

Chart of data points outlining key information on data bytes, breaches, and data privacy laws

The data point that may be most telling, however, is the estimate that humanity will generate three times more data in 2023 than it did in 2019. Data creation is growing faster than you can say zettabyte (as in 97 zettabytes, the estimated volume of data created in 2022).


That’s exciting, as the digital world now sees data as currency. But with great power comes great responsibility, and every data point in your company’s ecosystem can become a security and legal liability.




Enter data discovery: the act of de-risking that data, along with the governance your organization needs to properly harness (and safeguard) it. Each company and, at a deeper level, each security team will have its own pain points when it comes to data classification. However, knowing what data you have, where it is, and what it is (and isn't) used for in the business gives you a head start when defining next steps in discovering, controlling, and activating that data.




More data, more problems

Acknowledging the problems that massive amounts of data pose to your organization is step one in proper classification.


Problem 1: Lack of visibility into a growing dataset

Your organization collects and generates a massive amount of data across different systems in a variety of forms. Before you can establish and enforce policies to promote usability, secure data, and maintain compliance, you must understand what data you have and WHY you have it.


Problem 2: Need to reconcile data risk and reward

Data carries both risk and reward. Because of this relationship, you’re always on the hunt for technology that helps your business understand the data it has, the risks that data poses, the external (compliance) requirements that apply to it, and the internal initiatives and expectations related to it.


Problem 3: Time to market

Your business needs to find sensitive data, pinpoint where it lives, and remediate quickly in the event of a security incident.


The average volume of data held by an enterprise grew by 42% last year. One of the biggest challenges stemming from this explosion of data is insider access.


What is data access governance?

The key objective of data access governance is to gain visibility into risk and enforce data access policies. Data access management has evolved into an independent initiative that requires an autonomous strategy, budget, and implementation schedule. Data access governance covers many crucial areas, including data security; protecting PII; providing access to critical data assets; and managing permissions.


What is dark data?

Dark data is the information assets an organization collects, processes, and stores during regular business activities, but generally fails to use for other purposes such as analytics, business relationships, and direct monetization.


Who is a data citizen?

A data citizen is an employee who is given access to an organization’s proprietary information. Use of the word “citizen” is meant to emphasize the idea that an employee’s right to access corporate data also comes with responsibilities.


What is a data estate?

A data estate is simply the infrastructure that helps companies systematically manage all the corporate data they own.


What is data minimization?

Data minimization is the principle that, to support data privacy, data should be collected and processed only for reasons that were clearly stated in advance, and should not be held or further used unless that is essential for those reasons.


What is Data Security Posture Management?

Data Security Posture Management (DSPM) is an emerging market focused on reducing risk and improving the security around an organization’s most valuable asset – its data.


What is Data Sprawl?

Data sprawl is the proliferation in the number and variety of digital information (data) created, collected, stored, shared, and analyzed by businesses, primarily at the enterprise level. On average, organizations have four to six platforms to manage data.


What is ROT Data?

Redundant, obsolete, or trivial (ROT) data is the digital information a business keeps even though it has no business or legal value, for example a duplicated piece of information or a data point that doesn’t help the company in any positive way.


To cull and manage ROT data, your business needs a data retention and deletion strategy.
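As a rough illustration of what a retention and deletion check might look for, the sketch below flags redundant items (exact duplicates, found by hashing content) and obsolete items (not accessed within a retention window). The inventory, the file paths, and the three-year window are all assumptions made up for this example, not a recommended policy or any product's behavior.

```python
import hashlib
from datetime import date, timedelta

# Hypothetical inventory: (path, content, last_accessed). Not from any real system.
RECORDS = [
    ("crm/export_a.csv", b"id,email\n1,a@example.com\n", date(2023, 1, 5)),
    ("shared/export_copy.csv", b"id,email\n1,a@example.com\n", date(2023, 2, 1)),
    ("archive/old_report.txt", b"Q3 2015 summary", date(2016, 3, 1)),
    ("finance/ledger.csv", b"id,amount\n1,100\n", date(2023, 2, 20)),
]

RETENTION = timedelta(days=3 * 365)  # assumed three-year retention window


def find_rot(records, today, retention=RETENTION):
    """Return (redundant, obsolete) lists of paths."""
    seen = {}       # content hash -> first path seen with that content
    redundant = []  # later copies of content already seen
    obsolete = []   # items untouched for longer than the retention window
    for path, content, last_accessed in records:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            redundant.append(path)
        else:
            seen[digest] = path
        if today - last_accessed > retention:
            obsolete.append(path)
    return redundant, obsolete


redundant, obsolete = find_rot(RECORDS, today=date(2023, 3, 10))
print("Redundant:", redundant)
print("Obsolete:", obsolete)
```

Judging whether data is trivial usually needs business context, so this sketch leaves that part out.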


Shift left: A data classification strategy

Data discovery has as much to do with classifying where data lives and how important it is as it does with deciding what actions should ultimately be taken with that digital information. Forward-looking security teams should be employing the shift left strategy. But what exactly does that mean?


Shift left is a philosophy that looks at data ingestion at the left side of a horizontal funnel (see image). According to IAPP, that narrow end represents the point when data first enters the company’s tech ecosystem. As you move right in the funnel, the amount of data grows with copies, inferences, and data analysis. The point of collection is best suited to classify and inventory data, creating downstream efficiencies. Most companies classify and inventory data toward the right side of the funnel, which is a recipe for delays, inaccuracies, and potential security incidents.

Chart demonstrating the concept of employing a shift left policy for data privacy and security

For security teams to be able to shift left in their data classification strategy, they’ll need a consumer-facing collection point for capturing consent and purpose that integrates these signals into the data map to inform the orchestration of data policies that include access and retention.
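One way to picture shift left is a collection point that classifies each record, and records its purpose and consent, before the data goes anywhere else. The sketch below is a minimal example under stated assumptions: the two regular expressions, the label names, and the `ingest` function are hypothetical, and real classification needs far more than pattern matching.

```python
import re

# Assumed, deliberately simple patterns; real classifiers need far more care.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify_value(value):
    """Return the sorted labels whose pattern matches the value."""
    text = str(value)
    return sorted(label for label, pattern in PATTERNS.items() if pattern.search(text))


def ingest(record, purpose, consent):
    """Tag a record at the point of collection with labels, purpose, and consent."""
    labels = {name: classify_value(value) for name, value in record.items()}
    return {
        "data": record,
        # Keep only the fields where something sensitive was found.
        "labels": {name: found for name, found in labels.items() if found},
        "purpose": purpose,
        "consent": consent,
    }


entry = ingest(
    {"name": "Ada", "contact": "ada@example.com", "note": "SSN 123-45-6789 on file"},
    purpose="account_creation",
    consent=True,
)
print(entry["labels"])
```

Because the labels, purpose, and consent travel with the record from the start, downstream access and retention policies can read them instead of re-scanning every copy.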


What is the responsible use of data?

Now that problems have been acknowledged and definitions for data discovery explained, how does a company responsibly use the data it captures and creates? What exactly is responsible data use?


Much like your business considers and applies guidelines and frameworks around its people, products, and processes, so should it for its data. Organizations need to think of the data they hold as part of the people it is tied to. The data must be treated ethically and fairly, just the way people are.




Consider a three-step approach to the data management lifecycle your business employs:


Discover: Uncover hidden data including good data in bad places, sensitive data with inappropriate access, and hoarded data


Control: Trigger internal workflows to remove sensitive information, restrict access, or apply privacy-enhancing technology such as encryption or masking


Activate: Promote responsible data usage by automating core privacy workflows, and capturing and governing throughout the data lifecycle
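The first two steps above can be sketched in a few lines. In this hypothetical example, Discover scans documents for email addresses and Control masks them where they were found. The document names, the single email pattern, and the masking rule are assumptions chosen for illustration; Activate depends on an organization's own workflows and is not shown.

```python
import re

# Assumed, simplified email pattern for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def discover(documents):
    """Discover: return names of documents that contain an email address."""
    return [name for name, text in documents.items() if EMAIL.search(text)]


def mask_email(match):
    """Keep the first character and the domain, mask the rest of the local part."""
    local, domain = match.group(0).split("@", 1)
    return local[0] + "*" * (len(local) - 1) + "@" + domain


def control(documents, flagged):
    """Control: return a copy with emails masked in the flagged documents."""
    return {
        name: EMAIL.sub(mask_email, text) if name in flagged else text
        for name, text in documents.items()
    }


docs = {
    "wiki/onboarding.txt": "Ask jane.doe@example.com for access.",
    "wiki/lunch.txt": "Pizza on Friday.",
}
flagged = discover(docs)
masked = control(docs, flagged)
print(flagged)
print(masked["wiki/onboarding.txt"])
```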


Businesses must consider their needs and goals when using data, no matter which department is processing or controlling that information and regardless of structure. There are six guiding principles to responsible data use that can help organizations.


What’s the purpose?

Data should be collected for a defined purpose, used only for that purpose, and disposed of when it is no longer needed to fulfill that purpose. For personal data, specifically, the purpose should be clearly communicated to the individual at the point of collection.


Be transparent

Organizations should clearly communicate how and why data is collected, used, and shared.


Offer the choice

Individuals should be given the ability to granularly choose or consent to how their data is being used, creating a mutual value exchange that builds trust.


Implement governance

Organizations must have the proper technical controls in place to ensure that data is only used as defined by their policies and the informed consent of the individual.


Protection through security

Organizations must have the proper security controls in place to ensure that data is protected from unauthorized use or disclosure.


Ethical evaluations

Organizations should evaluate the ethical implications of data use as well as the legal implications, especially with emerging technologies such as artificial intelligence.
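The purpose, choice, and governance principles lend themselves to a simple technical control: before data is used, check the stated purpose against what the individual consented to and release only the fields that purpose needs. The sketch below shows one possible shape for such a check. The `DataSubject` class, the `POLICY` table, and the purpose names are hypothetical and not drawn from any regulation or product.

```python
from dataclasses import dataclass, field


@dataclass
class DataSubject:
    """Hypothetical record of what an individual agreed to at collection."""
    subject_id: str
    consented_purposes: set = field(default_factory=set)


# Assumed policy: which fields each purpose is allowed to read.
POLICY = {
    "order_fulfillment": {"name", "address"},
    "marketing_email": {"name", "email"},
}


def allowed_fields(subject, purpose, policy=POLICY):
    """Return the fields usable for this purpose, or an empty set without consent."""
    if purpose not in subject.consented_purposes:
        return set()
    return set(policy.get(purpose, set()))


def minimize(record, subject, purpose):
    """Return only the fields the policy and the subject's consent permit."""
    permitted = allowed_fields(subject, purpose)
    return {key: value for key, value in record.items() if key in permitted}


subject = DataSubject("u-1", {"order_fulfillment"})
record = {"name": "Ada", "address": "1 Main St", "email": "ada@example.com"}

print(minimize(record, subject, "order_fulfillment"))
print(minimize(record, subject, "marketing_email"))
```

Here the marketing request returns nothing because the individual never consented to that purpose, even though the policy defines fields for it.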


