We recently welcomed Redacted.ai into the OneTrust family to expand our data redaction capabilities. This move brings powerful data redaction into the OneTrust platform to solve a broad range of privacy, information security, and legal use cases. We recently hosted a webinar about the acquisition and received over 100 questions about the product, the future of OneTrust data redaction, and what the new capabilities can do for your privacy, security, and data governance programs.
We took your top questions and created an FAQ series to dive deep into our data redaction capabilities and what they mean for you.
Let’s start with some context.
What is data redaction?
Data redaction means removing sensitive information from files or databases. There are two types of data redaction:
Note: this type of data redaction is sometimes used to mean data masking, a technique that hides or obfuscates data in the results of a database query (e.g., a credit card number replaced by xxxx in the results file).
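To make the data masking idea concrete, here is a minimal Python sketch. It is an illustration only, not how OneTrust Data Redaction works internally: the function name `mask_card_number`, the sample row, and the choice to keep the last four digits visible are all assumptions made for this example.

```python
def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with 'x', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    to_mask = max(total_digits - visible, 0)
    masked = []
    for ch in card_number:
        if ch.isdigit() and to_mask > 0:
            masked.append("x")  # hide this digit
            to_mask -= 1
        else:
            masked.append(ch)  # keep separators and the trailing digits
    return "".join(masked)


# Hypothetical query result: the stored value is unchanged, only the
# copy returned to the reader is masked.
rows = [{"name": "A. Customer", "card": "4111-1111-1111-1111"}]
masked_rows = [{**row, "card": mask_card_number(row["card"])} for row in rows]
print(masked_rows[0]["card"])  # xxxx-xxxx-xxxx-1111
```

Note that masking of this kind leaves the underlying record intact, which is what distinguishes it from removing the information from a file.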
Why is data redaction needed?
There are multiple reasons to redact sensitive and personal information from unstructured files such as emails, PDFs, and docs.
Let’s look at a few key use cases.
As part of eDiscovery processes, M&A, investigations, and regulatory information sharing, there is a need to redact sensitive information before disclosing it.
Government / FOIA redaction
Governments in several countries are subject to Freedom of Information type requests and privacy-related data requests. Before disclosing information to the person who made the request, sensitive information and/or personal information needs to be redacted from files.
In clinical trials, there are documents relating to patients that need to be shared with various other parties to enable research to be conducted. There is a need to redact the patient’s personal information before sharing those files.
Privacy-related redaction in the context of DSARs
When organizations receive data subject access requests or consumer rights requests under privacy laws like the GDPR and the CCPA, the person’s information is sometimes commingled with other people’s information and other sensitive information in the relevant files, such as emails, PDFs, and docs. Other people’s information and sensitive information need to be redacted before the files are disclosed to the requester.
Now let’s jump into your top 10 questions:
1. In what languages do you automatically detect sensitive information?
OneTrust Data Redaction currently handles the following languages, and more are added continually. If one you need is not covered here, please contact us:
2. Can you specify the type of redaction on each redacted item e.g., “Third party name” or “Business Sensitive” or “Third party opinion”?
The default redaction is black and covers the applicable area in the file. You can also annotate the redaction with text that gives the reason for redaction and/or states the rule under which the information is not being shown. You can also choose different redaction colors on the page, including for different types of entities. For example, you could choose to redact one name in red and another name in green.
3. How does it handle attachments on emails?
OneTrust Data Redaction shows the email(s) and displays the attachment(s) inline, so everything can be reviewed in one go.
4. How do we redact information relating to one person, but not information relating to other people?
OneTrust Data Redaction has the concept of a Do Not Redact list and an Always Redact list. If you know whose information should not be redacted, you can add that person’s information to the Do Not Redact list. The product will automatically detect other people’s personal information and redact it.
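The interaction between the two lists can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the product's implementation: the `redact` function, its parameters, the `[REDACTED]` placeholder, and the rule that the Always Redact list wins over the Do Not Redact list are all hypothetical, and the `detected` argument stands in for whatever the automatic detection step returns.

```python
import re


def redact(text, detected, do_not_redact=(), always_redact=()):
    """Redact detected items unless allow-listed, plus anything always-listed."""
    keep = {item.lower() for item in do_not_redact}
    # Start from the automatically detected items, minus the allow-list.
    targets = {item for item in detected if item.lower() not in keep}
    # Assumption for this sketch: Always Redact entries are redacted regardless.
    targets.update(always_redact)
    # Replace longer strings first so a short target cannot split a longer one.
    for target in sorted(targets, key=len, reverse=True):
        text = re.sub(re.escape(target), "[REDACTED]", text, flags=re.IGNORECASE)
    return text


message = "Jane Doe emailed John Smith about Project Falcon."
result = redact(
    message,
    detected=["Jane Doe", "John Smith"],  # stand-in for automatic detection
    do_not_redact=["Jane Doe"],           # the requester's own name stays visible
    always_redact=["Project Falcon"],     # business-sensitive term
)
print(result)  # Jane Doe emailed [REDACTED] about [REDACTED].
```

In a DSAR scenario, the requester's own details would go on the Do Not Redact list so that only third-party information is removed.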
5. Are the PDFs that are produced safe from software that can remove redactions?
OneTrust Data Redaction creates a new file that contains the redactions. In this new file, the redacted information is no longer there. In other words, that information is in the original file, but not in the new redacted file. In that sense, the redacted area covers a blank space. Even if there were software capable of removing the redaction, it would only uncover a blank space behind that area.
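The difference between covering text and removing it can be shown with a small Python sketch that works on plain strings. It is an analogy rather than the product's PDF handling: the function `redact_spans`, the character-offset spans, and the block character used as filler are assumptions for illustration.

```python
def redact_spans(text, spans, fill="\u2588"):
    """Return a new string in which each (start, end) span is overwritten."""
    chars = list(text)
    for start, end in spans:
        # Clamp to the text so an out-of-range span cannot raise an error.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = fill  # the original character is discarded, not hidden
    return "".join(chars)


original = "Call 555-0100 now"
redacted = redact_spans(original, [(5, 13)])
print(redacted)                # Call ████████ now
print("555-0100" in redacted)  # False
```

Because the output is a new string built without the sensitive characters, nothing in it can be "uncovered"; the original remains available only to whoever still holds the original file.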
6. Are you able to redact multiple pages at once?
OneTrust Data Redaction has a few features that allow this to happen:
7. Is there a dictionary for the texts used for redaction? Do we need to maintain this dictionary? Do you have to enter the actual data item to be redacted (e.g., SSN: 123-45-6789) or do you indicate redact anything set as xxx-xx-xxxx in the file?
We use a combination of approaches to detect sensitive information, including Natural Language Processing (NLP). This means that even if the machine has never seen a particular name or address before, it will still predict that this set of characters is likely to be a name or address (rather than searching a database of exact matches, as a dictionary-only approach would). Similarly, with a social security number (SSN), the tool will leverage machine learning and NLP to predict that a given combination of numbers is an SSN without having seen that particular number before.
8. Does it work on PNG files? Which file formats are supported?
Yes, we do support PNG files as well as a wide variety of input files:
9. I’d like to understand if it’s possible to write new classification rules or customize the ones the solution provides. And how to do it?
You can write your own rules (using regular expressions) and save them for use across different contexts.
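As a rough idea of what a saved, reusable regex rule might look like, here is a minimal Python sketch. The `RedactionRule` class, the rule names, the `EMP-` employee ID format, and the bracketed labels are hypothetical and chosen only for this example; the rule editor in the product will look different.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RedactionRule:
    name: str     # identifier used to save and reuse the rule
    pattern: str  # regular expression describing what to redact
    label: str    # annotation shown in place of the match

    def apply(self, text: str) -> str:
        return re.sub(self.pattern, f"[{self.label}]", text)


# Hypothetical saved rules that could be reused across different contexts.
RULES = [
    RedactionRule("us_ssn", r"\b\d{3}-\d{2}-\d{4}\b", "SSN"),
    RedactionRule("employee_id", r"\bEMP-\d{6}\b", "Employee ID"),
]


def apply_rules(text: str, rules=RULES) -> str:
    """Apply each rule in turn and return the redacted text."""
    for rule in rules:
        text = rule.apply(text)
    return text


print(apply_rules("SSN 123-45-6789, badge EMP-004211."))
# SSN [SSN], badge [Employee ID].
```

A pattern-based rule like this matches any value with the right shape, so there is no need to enter each actual data item in advance.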
10. Will it also look at metadata and redact this?
OneTrust Data Redaction looks for metadata in the original files and removes it before generating the new redacted files. For example, metadata containing sensitive information is filtered out before the new redacted file is generated.