How automation helps reduce your sensitive data footprint

Establish data retention and minimization policies to reduce your organization’s attack surface

Sam Curcuruto
Sr. Product Marketing Manager, Data Discovery, OneTrust
May 5, 2023


The value of data today is greater than ever before, with companies looking for ways to optimize its collection and use to provide customers with timely, personalized experiences. As data’s value increases, so do the associated risks and costs. Cloud storage alone accounts for 30% of a company’s overall IT budget, with one terabyte (TB) of data costing $3,351 per year on average. That’s a cool $1M in storage costs alone for 300 TB of data. Beyond rising storage costs, data breaches are also becoming more prevalent as organizations collect a growing volume and variety of data. The average cost of a data breach in 2022 was $4.35M.
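
To see how quickly those figures add up, here is a minimal Python sketch of the arithmetic. It assumes a simple linear model in which every terabyte costs the same $3,351 per year. Real cloud pricing varies by tier, region, and provider, and `annual_storage_cost` is a hypothetical helper written only for illustration.

```python
# Assumption: a flat, linear cost model using the average figure quoted above.
COST_PER_TB_PER_YEAR = 3_351  # USD per terabyte per year


def annual_storage_cost(terabytes: float, cost_per_tb: float = COST_PER_TB_PER_YEAR) -> float:
    """Estimate yearly storage spend (hypothetical helper, linear model)."""
    return terabytes * cost_per_tb


if __name__ == "__main__":
    for tb in (1, 100, 300):
        print(f"{tb:>4} TB -> ${annual_storage_cost(tb):,.0f} per year")
```

At 300 TB this model gives $1,005,300 per year, which is where the roughly $1M figure comes from.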

The problem is clear: More data, more costs, more risk

But is there more value? That depends on how your organization uses it. Hoarding data or collecting it without a clear purpose not only increases the storage costs and breach risk mentioned above, but can also violate myriad regulations and the principles behind data minimization and data retention policies.


Unstructured data and its challenges

Well, if data minimization and data retention are so clearly the answer to high storage costs, data breach risks, and non-compliance issues, why isn’t everyone doing it? A major reason is that more than 80% of the data stored by organizations is unstructured, which makes it much harder to find, classify, and manage.

This means it’s in the form of:

  • Emails
  • File attachments
  • Images
  • PDFs
  • Other data that lacks the predefined fields of a structured database

Much of this data also loses its usefulness within 90 days, and nearly a third of it is considered redundant, obsolete, or trivial (ROT). ROT data not only adds storage costs without adding value; it’s also prime fodder for data breaches, as it typically sits outside secure systems. It expands your company’s attack surface: the full set of points through which an unauthorized user or attacker could breach your systems.
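
Two of the three ROT categories lend themselves to simple automated checks. The minimal Python sketch below flags a file as redundant when its content is byte-for-byte identical to another file (compared by SHA-256 hash) and as obsolete when it has not been accessed in more than 90 days. The `FileRecord` type and `find_rot_candidates` function are hypothetical, and "trivial" data is left out because it usually requires human or model judgment.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical minimal view of a stored file."""
    path: str
    content: bytes
    last_accessed: datetime


def find_rot_candidates(files, now, stale_after=timedelta(days=90)):
    """Map each flagged path to the reasons it looks redundant or obsolete."""
    first_seen = {}  # content hash -> first path (in sorted order) with that content
    flagged = {}
    for f in sorted(files, key=lambda r: r.path):
        reasons = []
        digest = hashlib.sha256(f.content).hexdigest()
        if digest in first_seen:
            reasons.append(f"redundant: duplicate of {first_seen[digest]}")
        else:
            first_seen[digest] = f.path
        if now - f.last_accessed > stale_after:
            reasons.append(f"obsolete: not accessed in over {stale_after.days} days")
        if reasons:
            flagged[f.path] = reasons
    return flagged
```

In practice you would hash files in chunks rather than hold whole contents in memory, but the classification logic stays the same.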

Keeping these concerns about unstructured data and a growing attack surface in mind, most privacy regulations today call for data minimization practices as part of standard operating procedures; the EU’s General Data Protection Regulation (GDPR), for example, lists data minimization among its core principles. Recent enforcement actions from the Federal Trade Commission (FTC) also show that data minimization is a key tenet of privacy and data security best practices. Companies can build this into their data workflows, applying privacy by design principles to their products and services so that data is minimized from the outset and its collection and use are clearly communicated to customers.
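
One way to apply privacy by design at the point of collection is to declare, for each processing purpose, the fields that purpose actually needs and drop everything else before the data is stored. In this minimal Python sketch, the purpose names and field lists are hypothetical placeholders:

```python
# Hypothetical purposes and the only fields each one is allowed to keep.
ALLOWED_FIELDS = {
    "order_fulfillment": {"name", "email", "shipping_address"},
    "newsletter": {"email"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Keep only the fields declared for this purpose; refuse undeclared purposes."""
    allowed = ALLOWED_FIELDS.get(purpose)
    if allowed is None:
        raise ValueError(f"no declared purpose: {purpose!r}")
    return {key: value for key, value in record.items() if key in allowed}


if __name__ == "__main__":
    signup = {"email": "jane@example.com", "name": "Jane", "birth_date": "1990-01-01"}
    print(minimize(signup, "newsletter"))
```

Rejecting undeclared purposes, rather than passing data through, keeps collection tied to a stated reason that can be communicated to customers.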


How can companies operationalize data retention and minimization?

With privacy by design established as the goal from a product or service’s inception, the next step is figuring out how to build data retention and minimization into your existing processes seamlessly.

1. Observe your current data lifecycle

To kick things off, look at your most common data workflows and scenarios. Analyze your metadata for relevant fields such as date created, last accessed, and last modified. Identify when data stops being necessary and where data is commonly deleted in these scenarios, and see how these patterns could map to a data retention schedule.
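
For file shares, a first pass at this analysis can be as simple as reading file-system timestamps. The Python sketch below uses only the standard library (`pathlib.Path.stat`) to collect modification and access times and list files untouched for more than a chosen number of days. Treat it as a starting point under stated assumptions: access times are often unreliable because many systems mount disks with `noatime` or `relatime`, and `st_ctime` means creation time on Windows but metadata-change time on Unix. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def file_metadata(root):
    """Yield basic timestamp metadata for every regular file under root."""
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        yield {
            "path": str(path),
            # Creation time on Windows, last metadata change on Unix.
            "ctime": datetime.fromtimestamp(st.st_ctime, tz=timezone.utc),
            "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
            "accessed": datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
        }


def stale_files(records, now, threshold_days=90):
    """Return paths whose latest modification or access is older than threshold_days."""
    stale = []
    for r in records:
        last_touched = max(r["modified"], r["accessed"])
        if (now - last_touched).days > threshold_days:
            stale.append(r["path"])
    return stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for path in stale_files(file_metadata("."), now):
        print(path)
```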

2. Establish a deletion method

After identifying where data is deleted and building a retention schedule around these scenarios, decide how those retention periods will be enforced on your data, e.g., archiving or deleting SharePoint files once they pass a set age threshold.
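
A retention schedule can be expressed as data: each category maps to how long records stay live before archiving and how long before deletion. In this minimal Python sketch, the categories and periods are hypothetical placeholders rather than recommendations; real periods should come from the regulations and business needs identified in step 1. The function only returns an action and touches no files, so it could sit in front of whatever archiving or deletion mechanism you already use.

```python
from datetime import date, timedelta

# Hypothetical schedule: category -> (archive after, delete after).
# Placeholder periods only; real ones depend on applicable regulations.
RETENTION_SCHEDULE = {
    "marketing_email": (timedelta(days=90), timedelta(days=365)),
    "invoice": (timedelta(days=365), timedelta(days=7 * 365)),
}


def retention_action(category: str, clock_start: date, today: date) -> str:
    """Return 'keep', 'archive', 'delete', or 'review' for one item.

    clock_start is whichever date starts the retention clock (e.g. creation).
    """
    periods = RETENTION_SCHEDULE.get(category)
    if periods is None:
        return "review"  # no rule defined for this category yet
    archive_after, delete_after = periods
    age = today - clock_start
    if age >= delete_after:
        return "delete"
    if age >= archive_after:
        return "archive"
    return "keep"
```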

3. Use a centralized data governance tool 

Once your retention periods are defined and deletion methods are established, a tool that enforces them is the most efficient way to run this process. A centralized data governance tool can help you:

  • Determine the most accurate set of retention policies for your organization based on your relevant regulations
  • Automate the retention and deletion process by setting business rules and applying them to your files
  • Flag and identify any violations of retention rules in the system
  • Decide whether data needs to be deleted, anonymized, or de-identified and carry out that action accordingly
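
The rule-application and remediation steps in the list above might look like the following minimal Python sketch. It assumes each record carries the date its retention period ends, and it uses a hypothetical per-category rule table to choose between deleting an expired record, pseudonymizing it, or holding it for manual review. Pseudonymization here means replacing the email address with a keyed HMAC-SHA256 hash; that is a form of de-identification, not true anonymization, because anyone holding the key can re-link values they already know.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Record:
    """Hypothetical record carrying the date its retention period ends."""
    record_id: str
    email: str
    category: str
    retain_until: date


# Hypothetical business rules: remediation per category once retention ends.
REMEDIATION = {"support_ticket": "pseudonymize", "marketing_lead": "delete"}


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed HMAC-SHA256 digest (de-identification)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def enforce(records, today: date, key: bytes):
    """Apply rules; return (records to keep, list of (record_id, action) violations)."""
    kept, violations = [], []
    for r in records:
        if r.retain_until > today:
            kept.append(r)  # still within its retention period
            continue
        action = REMEDIATION.get(r.category, "review")
        violations.append((r.record_id, action))
        if action == "pseudonymize":
            kept.append(replace(r, email=pseudonymize(r.email, key)))
        elif action == "review":
            kept.append(r)  # no rule: hold unchanged for a human decision
        # "delete": the record is simply not carried forward
    return kept, violations
```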


How can automation help?

OneTrust Data Discovery can help your organization operationalize data retention policies, starting by identifying unstructured data across your entire IT infrastructure. With full visibility across your structured, semi-structured, and unstructured environments, you can then:

  • Capture business and technical metadata to enable data retention
  • Leverage machine learning to automate policy rules and label data accurately
  • Monitor data over time against the defined policies and ensure controls are followed
  • Track performance with advanced analytics across your data ecosystem, identifying trends and at-risk data
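
The bullets above describe machine-learning-based labeling. As a much simpler stand-in, the sketch below uses regular expressions to label unstructured text that contains email addresses or U.S. Social Security number patterns. It is a minimal Python illustration, not OneTrust's method or API: the patterns are deliberately basic and will produce both false positives and false negatives, and `label_document` and `label_corpus` are hypothetical names.

```python
import re

# Deliberately simple patterns; a stand-in for ML-based classification.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def label_document(text: str) -> set:
    """Return the set of sensitive-data labels whose pattern appears in text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}


def label_corpus(docs: dict) -> dict:
    """Label every document, keeping only those with at least one match."""
    labeled = {}
    for name, text in docs.items():
        labels = label_document(text)
        if labels:
            labeled[name] = labels
    return labeled
```

Running a labeler like this on a schedule and comparing results over time is one way to approximate the monitoring and trend tracking described above.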


To learn more about how OneTrust Data Discovery can take your organization’s data retention and minimization policies to the next level, request a demo today. 
