Pseudonymization vs Anonymization: Key Differences in Data Protection Regulations

Understanding the differences between pseudonymization and anonymization can be tricky. There are both legal and practical distinctions between the two, and each is suited to different types of data processing use cases.

But fear not. After reading this article, you’ll clearly distinguish these terms and know how to use each method practically in your business cases.

You’ll learn how to get the most out of each method, lawfully share data, conduct analytics, reduce data privacy risks, and ensure compliance with regulations.

And if you’re busy and just want to get a grasp of the difference between pseudonymization and anonymization, we share it right below.

Let’s first take a quick look at what data is being protected with these techniques.

What are PII, PID, and Personal Data?

Understanding the different types of personal information is important for considering what you are protecting and why. Below, you’ll see what type of information falls into buckets of Personally Identifiable Information (PII), Personal Identifiers (PID), and Personal Data.

According to the National Institute of Standards and Technology (NIST):

  • Personal Data is any information relating to an individual that could identify them, directly or indirectly. It includes direct identifiers, indirect identifiers, attributes, and other characteristics that could be used to re-link to identity, such as information related to a person’s physical, physiological, mental, economic, cultural or social identity. It is a much broader category than PII or PID.

  • Personally Identifiable Information (PII): Any representation of information that permits the identity of an individual to whom the information applies to be reasonably inferred by either direct or indirect means. All PII is Personal Data, but not all Personal Data is PII.

    PII can be any information that lets you trace and identify an individual. This can be full name, address, passport number, email, credit card numbers, date of birth, telephone number, login details, and many more.

  • Personal Identifiers (PID) are a subset of PII data elements that identify a unique individual and can permit another person to “assume” an individual's identity without their knowledge or consent. Examples include driver's license numbers, passport numbers, and biometric records.
Now that you understand what PII, PID, and Personal Data are, let’s jump to the topic of anonymization and pseudonymization.

Knowing whether data is pseudonymized or anonymized is important. This is because truly anonymized data falls outside the scope of the GDPR, while pseudonymized data does not. The two are also suited to different use cases.

Let’s first take a look at pseudonymization.

What is Pseudonymization and How Does Statutory Pseudonymization Differ?

Before the GDPR, pseudonymization was considered to be a technique. It was often thought of as interchangeable with masking or tokenization.

Pseudonymization was considered to be one method of de-identifying data, where the de-identification process could be reversed. It was viewed as a data protection technique primarily applied to the protection of direct identifiers.

When the GDPR came into force, pseudonymization took on a new meaning (one defined by law).

What is Different About Statutory Pseudonymization?


Pseudonymization under the GDPR has a legal definition that sets a much stricter standard than other data protection methods like masking and tokenization. The GDPR definition of pseudonymization is what we refer to when we use the term “Statutory Pseudonymization.”

The GDPR defines pseudonymization as “the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”

Statutory Pseudonymization establishes a legal standard or an “outcome” to reach. You must adhere to this standard if you want to be compliant with the GDPR when using pseudonymization.

This legal standard goes beyond the use of one data protection technique or the other, and you may need to use multiple techniques to achieve it. Such a standard means that businesses must process personal data in such a way that it can’t be re-linked back to specific individuals without the use of additional, separately stored information.
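
To make the outcome concrete, here is a minimal sketch in Python (the record fields and function names are invented for illustration): direct identifiers are replaced with random tokens, and the token-to-identity mapping, the “additional information,” is returned separately so that it can be stored and protected apart from the working data.

```python
import secrets

# Hypothetical example records; field names are illustrative only.
records = [
    {"name": "Alice Meyer", "email": "alice@example.com", "diagnosis": "asthma"},
    {"name": "Bob Kaya", "email": "bob@example.com", "diagnosis": "diabetes"},
]

def pseudonymize(records, identifier_fields):
    """Replace direct identifiers with random tokens.

    Returns the pseudonymized records plus the 'additional information'
    (token -> identifiers) that must be stored separately and protected.
    """
    pseudonymized, lookup = [], {}
    for record in records:
        token = secrets.token_hex(8)                      # random, non-derivable token
        lookup[token] = {f: record[f] for f in identifier_fields}
        safe = {f: v for f, v in record.items() if f not in identifier_fields}
        safe["token"] = token
        pseudonymized.append(safe)
    return pseudonymized, lookup

safe_records, relink_table = pseudonymize(records, identifier_fields=("name", "email"))
# `safe_records` can be used for analytics; `relink_table` is the separately
# held additional information needed for any controlled re-linking.
```

In practice, the lookup table would live in a separate system with its own access controls, and indirect identifiers would need protection as well, as described next.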

Let’s take a look at the 5 specific elements that are necessary for this legal standard to be met.

5 Elements of Statutory Pseudonymization You Should Know About


Statutory Pseudonymization requires five elements, an approach that goes beyond the pre-GDPR understanding of pseudonymization. It requires:

  • Protection of all data elements: This includes both direct and indirect identifiers.

  • Protection against singling-out attacks: Using K-anonymity and/or aggregation.

  • Use of dynamism: Using different tokens at different times for different purposes, and at different locations. This makes re-linking to identity much more difficult from a technological perspective.

  • Inclusion of non-algorithmic lookup tables: This alleviates some of the vulnerability of cryptographic techniques.

  • Controlled re-linkability: The data controller must hold source data separately. They must also protect it with technological and organizational measures.

You can see two of the benefits of Statutory Pseudonymization in the image below. With static tokenization, the same token is associated with the same person over time, whereas dynamic tokenization provides an ever-changing token, making the likelihood of re-identifying someone much smaller.
Static tokenization (left) vs dynamic tokenization (right)
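
As a rough illustration of the dynamism element (one possible construction, not a complete Statutory Pseudonymization implementation), the sketch below derives a different token for the same value depending on the purpose and processing run, while a static token stays constant. The key material stands in for separately stored additional information.

```python
import hmac, hashlib, secrets

def static_token(value: str, key: bytes) -> str:
    # Static tokenization: the same value always yields the same token,
    # so the token itself becomes a persistent identifier.
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]

def dynamic_token(value: str, key: bytes, purpose: str, run_id: str) -> str:
    # Dynamic tokenization: the purpose and processing run are mixed into
    # the derivation, so the same value gets different tokens in different
    # contexts. Only a holder of the separately kept key can re-derive them.
    msg = f"{purpose}:{run_id}:{value}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()[:16]

key = secrets.token_bytes(32)  # kept apart from the protected data
print(static_token("customer-42", key), static_token("customer-42", key))   # identical tokens
print(dynamic_token("customer-42", key, "fraud", "2024-05"),
      dynamic_token("customer-42", key, "marketing", "2024-05"))            # different tokens
```
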
In addition, Statutory Pseudonymization protects not only direct identifiers (i.e. “obvious” personal information) but also indirect identifiers.

Indirect identifiers are pieces of information that do not relate to an individual explicitly but can be connected to them in some way. When different data sets containing such tangentially related information are combined, that combination can in some cases be enough to re-identify the individual.

Because Statutory Pseudonymization provides a much higher standard of protection than basic tokenization and masking, your organization can reap a number of business and legal benefits by using it.

Business Benefits of Statutory Pseudonymization


Statutory Pseudonymization offers further benefits for your business. These benefits include:

  • Economies of scale: When data is properly protected, your business can use cloud-based infrastructure (IaaS and PaaS). You can also upload data to the cloud without fear of exposing sensitive data.

  • Faster project approvals: With scalable and predictable data protection controls, projects can be approved faster. This is because your legal and privacy teams can have more certainty about whether your projects are compliant. Pre-configured pseudonymization controls can also streamline similar data use cases, as many of the protection approaches remain the same.

  • Expanded projects and supply chains: With greater levels of predictable data protection, you can engage in more data sharing. This means your business can obtain more data from a wider variety of sources, enriching your data sets. This can be useful for expanding and improving use cases such as analytics, ML, AI, and monetization.

  • Speed to insight: Statutory Pseudonymization does not reduce utility in the way that some masking techniques do. This allows you to get relevant information and results quickly so that you can focus on business outcomes.

In contrast to anonymization, pseudonymization enables a wider range of use cases. This is because anonymization makes re-linking impossible, meaning it cannot be used for personalized use cases. This is especially the case when the results of an analysis may need to be re-linked to identity in the future, such as with medical data.

Statutory Pseudonymization can also be used in other industries such as insurance and banking. The most typical use cases include fraud detection, analytics, and risk assessment. For these use cases, pseudonymization protects sensitive insurance customer information, while retaining data utility.

Despite the many positive aspects of pseudonymization, there are some elements to bear in mind. For example:

  • Remaining re-identification risk: There is no risk-free technique. With every approach, some risk of re-identification always remains, and it is important to assess and reduce these risks wherever possible. There is always room for human error when implementing a technique, and attacks against privacy continue to grow more sophisticated.

  • Potential reduced utility compared to cleartext: Depending on which combination of specific techniques you use to achieve Statutory Pseudonymization, there may be some loss of utility. However, with Anonos’ approach to Statutory Pseudonymization, there is no loss of utility compared to cleartext.

  • Cost of implementation: All data protection techniques have set-up costs, and may need experts to help you implement them. Statutory Pseudonymization techniques are no exception. Like other data protection methods, it requires experts to ensure that it has been properly achieved.

  • Remains personal data: Data that has been Statutorily Pseudonymized is still personal data under the GDPR. This means the law still sets out certain requirements for it. However, the GDPR also provides statutorily recognized expanded use rights for data that has been pseudonymized. The requirements and benefits that apply will depend on individual situations.

Despite these caveats, meeting the legal standard of Statutory Pseudonymization can still provide numerous benefits.

Legal Benefits of Statutory Pseudonymization


There are also numerous legal benefits that come with Statutory Pseudonymization.

  • Surveillance-proof processing: Pseudonymization is one of the approaches noted by the European Data Protection Board as enabling international data transfers in compliance with the Schrems II ruling, even in the face of surveillance laws such as the U.S. CLOUD Act.

  • Lawful processing: By meeting the standard of Statutory Pseudonymization, you can overcome the shortcomings of processing data based on consent and contract. This means you can carry out data processing under legitimate interest grounds. This is particularly useful for advanced analytics, artificial intelligence (AI), and machine learning (ML).

  • Breach-, ransomware- and quantum-resistant data processing: Statutory Pseudonymization de-risks data at rest, in transit, and in use. Sensitive details are obscured with Statutory Pseudonymization, which reduces attack surface area.

  • Data supply chain defensibility: When data is protected with Statutory Pseudonymization, everyone in the data supply chain is protected against joint and several liability for data sharing, combining, and processing.

You can see some more of the legal benefits of Statutory Pseudonymization in the graphic below.
All of these benefits make Statutory Pseudonymization an attractive choice for enterprises. If you’re still wondering how you might use it in practice, let’s take a quick look at one example of what you might want re-linking for.

What is Statutory Pseudonymization Useful For?

Statutory Pseudonymization is useful when the identity of the data subject is still relevant for the use case. This is often the case with medical and health records, because the identity data of patients needs to remain connected with their records.

In addition, health professionals often share data between themselves, in the process of providing coordinated treatment. Anonymizing or aggregating medical data could strip away vital information necessary for patient care.

Thus, Statutory Pseudonymization is a valuable standard to meet, enabling secure data sharing while protecting patient identities. Importantly, the de-identified information can be re-associated with actual patient identities only by authorized individuals who have access to the separately stored additional information.
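
Conceptually, that controlled re-linking can be pictured as an access check in front of the separately stored lookup table. A minimal, hypothetical sketch (roles, tokens, and table contents are invented):

```python
# Hypothetical sketch: the relink table lives in a separate, access-controlled
# store; only callers with an authorized role can resolve a token to a patient.
RELINK_TABLE = {"a1b2c3d4": {"name": "Alice Meyer", "mrn": "MRN-001"}}
AUTHORIZED_ROLES = {"treating_physician", "care_coordinator"}

def relink(token: str, role: str) -> dict:
    if role not in AUTHORIZED_ROLES:
        raise PermissionError("re-linking not permitted for this role")
    return RELINK_TABLE[token]

print(relink("a1b2c3d4", role="treating_physician"))  # allowed
# relink("a1b2c3d4", role="data_analyst")             # would raise PermissionError
```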

This approach is often used in health contexts globally. For example, Statutory Pseudonymization is similar to the most effective means of “de-identification” under the Health Insurance Portability and Accountability Act (HIPAA) in the US. One important note is that Statutory Pseudonymization actually allows greater utility and a higher level of privacy than what is mandated under HIPAA.

This is because HIPAA requires that certain identifiers be removed from records (called the “Safe Harbor” method), leading to loss of utility in some contexts and reduced privacy in others (when identifiers are left unprotected). HIPAA does not use the same terms of “pseudonymization” or “anonymization” as the GDPR, but it’s important to understand how these different terms are related to each other, especially if you are operating internationally.

We’ll get into the details of HIPAA and data protection laws from other countries later on.

It’s important to remember: While Statutory Pseudonymization has a number of benefits, it may not be the right choice for all use cases.

In certain situations, such as statistics and general analytics, re-linking to identity is not necessary or desirable. In these cases, anonymization, including the production of synthetic data, may be a better choice. While both methods intend to protect personal data, they do it differently.

Let’s take a look at anonymization in more detail.

What is Data Anonymization?

Anonymization irreversibly de-identifies personal data. This means that no additional information can re-link the data to the individual from whom it was derived, and anonymized data therefore falls outside the scope of the GDPR.

This is in contrast to pseudonymization, which is a reversible process: it disconnects personal data from the identity of the person it relates to (which reduces the risk of privacy breaches), but the data can be re-linked in specific circumstances.

Data is considered anonymous under the GDPR if there is no reasonable likelihood that it can be attributed to a natural person. The question of whether data is anonymous or not is best viewed as a spectrum.

Anonymization can be achieved with many different data protection methods, often in combination.

Some of these techniques include generalization, k-anonymity, differential privacy, data perturbation, swapping, and the creation of synthetic data.

Some of these approaches, such as differential privacy and generalization, reduce data utility. Others, like the creation of synthetic data, can provide a high level of utility for many use cases.
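
To illustrate the generalization and k-anonymity ideas mentioned above, here is a minimal sketch (Python with pandas; the columns and values are invented): quasi-identifiers are coarsened until every combination appears at least k times.

```python
import pandas as pd

# Invented example data: age and ZIP code are quasi-identifiers.
df = pd.DataFrame({
    "age": [34, 36, 35, 52, 54, 53],
    "zip": ["10115", "10117", "10119", "20095", "20097", "20099"],
    "diagnosis": ["A", "B", "A", "C", "B", "C"],
})

# Generalize: bucket ages into 10-year bands, truncate ZIP codes to 3 digits.
generalized = df.assign(
    age=(df["age"] // 10 * 10).astype(str) + "s",
    zip=df["zip"].str[:3] + "**",
)

# k-anonymity check: every quasi-identifier combination must occur >= k times.
k = 3
group_sizes = generalized.groupby(["age", "zip"]).size()
print(generalized)
print("k-anonymous for k =", k, ":", bool((group_sizes >= k).all()))
```

Coarser buckets improve privacy but reduce utility, which is the trade-off described above.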

One of the main benefits of synthetic data is that the synthetic data sets look the same as the real data without containing personal information.

In the graphic below, you can see an example table that contains information about employees’ commuting methods and the department they are in. Synthetic data is produced by looking at the relationship between the original data values, and making up new (fake) data values that fit the same statistical distribution as the original data.

Synthetic data breaks the relationship between the original data subject and the analytics information you want to obtain. This increases the level of data protection.
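
A minimal sketch of that idea, with invented column names: estimate the joint distribution of the original columns, then sample entirely new rows from it. Production synthetic-data tools model relationships far more carefully, but the principle is the same.

```python
import pandas as pd

original = pd.DataFrame({
    "department": ["Sales", "Sales", "IT", "IT", "IT", "HR"],
    "commute":    ["car",   "bike",  "bike", "train", "bike", "car"],
})

# Estimate the joint distribution of the two columns...
joint = original.value_counts(normalize=True).reset_index(name="p")

# ...then sample entirely new (fake) rows that follow the same distribution.
synthetic = joint.sample(n=len(original), replace=True, weights="p", random_state=0)
synthetic = synthetic.drop(columns="p").reset_index(drop=True)
print(synthetic)
```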

Techniques such as synthetic data and differential privacy are often combined to increase the likelihood that the resulting data is truly anonymous.

For context, differential privacy (DP) is a mathematically rigorous definition of privacy for statistical and machine learning purposes. By looking at the output of a differentially private algorithm, one cannot reliably determine whether any given individual's data was included in the original dataset.
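
For intuition, here is the textbook Laplace mechanism applied to a counting query (a generic illustration, not any particular product's implementation): noise scaled to the query's sensitivity divided by epsilon hides any single individual's contribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_count(values, predicate, epsilon: float) -> float:
    """Differentially private count: add Laplace noise with scale
    sensitivity / epsilon. A counting query has sensitivity 1, because
    adding or removing one person changes the count by at most 1."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

ages = [34, 36, 35, 52, 54, 53]
print(dp_count(ages, lambda a: a >= 50, epsilon=1.0))  # noisy count near 3
```

Smaller epsilon means more noise and stronger privacy, at the cost of accuracy.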

These layers of protection significantly enhance the privacy of the synthetic data. However, it’s important to remember that no method can ensure perfect privacy while maintaining utility.

Using anonymization makes a lot of sense for several use cases, particularly in analytics and training machine learning algorithms.

Advantages of Anonymization


One of the main advantages of anonymizing data is that, if done properly, the data falls outside the scope of the GDPR and other data protection regulations.

In addition to this, advantages of anonymization include:

  • Protection against breach: When done properly, anonymization gives individuals a very high level of privacy. This protects against misuse of personal information, unauthorized access, breaches, and any follow-on effects such as damage to business reputation.

  • Aggregate analysis: Anonymization, particularly synthetic data, is especially useful for several use cases. These include analyses of large volumes of data, analytics, statistics, sales reports, machine learning, AI training and development, and testing AI models.

  • Data sharing: Anonymized data is no longer considered to be personal data under the GDPR, so you can share it more freely.

Disadvantages of Anonymization


While anonymized data is no longer subject to the GDPR, there are still some disadvantages that you should be aware of.

  • Doesn’t allow re-linking: When all identifiers are deleted from data, you can no longer use it for a variety of use-cases that require re-linking. This includes any personalized services such as marketing, targeted advertising, medical or health services, and some financial services.

  • Privacy risks remain: Anonymization can seem like a silver bullet, but truly anonymizing data can be difficult. It can also be difficult to ensure that anonymization has been effective. One risk is that multiple data sources can be combined to re-identify data subjects from “anonymous” data. This is especially a risk in collaborative settings where more and more data is combined.

Regardless of which approach you choose, it is important to remember that no data protection approach is foolproof. Every method is susceptible to attacks and should ideally be backed by layers of both technical and organizational protection. In addition, implementing data protection methods requires expertise to ensure they have been applied correctly.

As noted, even data thought to be anonymous is vulnerable to certain types of attacks and risks. No matter which method you choose, be aware of the main methods of attack that malicious actors can use against you.

Three Attacks Against Privacy


There are three main re-identification risks to consider when it comes to data protection. You can improve the privacy of your data sets by considering the potential “attacks” below and protecting against them.

According to the Article 29 Working Party, an anonymization approach is robust only if the data is protected against the following attacks:

  • Singling out: the ability to isolate some or all of the records that identify an individual within the data set.

  • Linkability: the ability to link at least two records concerning the same individual or group of individuals, either within one data set or across different data sets. Note that a wrong attribution can expose a data subject to a significant, sometimes even higher, level of risk than a correct one.

  • Inference: the ability to deduce, with significant probability, the value of an attribute for an individual from the values of other attributes.

There are additional ways to ensure that your data sets are well protected, including using external technical tools to analyze them. In many cases, these tools perform mock attacks against the data set, trying to determine whether records can be singled out or re-linked.
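
As a very rough stand-in for what such tools do (this is not the Anonymeter's actual API, just an illustrative check with invented data), you can measure how many records are unique on a set of quasi-identifiers and could therefore be singled out:

```python
import pandas as pd

# Invented example: a "protected" data set with two quasi-identifier columns.
df = pd.DataFrame({
    "age_band": ["30s", "30s", "30s", "50s", "50s", "50s"],
    "zip3":     ["101", "101", "102", "200", "200", "200"],
})

def singling_out_rate(df: pd.DataFrame, quasi_identifiers: list[str]) -> float:
    """Share of records whose quasi-identifier combination is unique in the
    data set, i.e. records an attacker could single out."""
    is_unique = ~df.duplicated(subset=quasi_identifiers, keep=False)
    return float(is_unique.mean())

print(singling_out_rate(df, ["age_band", "zip3"]))  # 1/6 of records can be singled out
```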

One example of these kinds of tools is the Anonos Anonymeter, which checks how robust your anonymization process is and how well your data is protected against these kinds of attacks.

You can see in the image below how the Anonymeter has provided a privacy score. This is the measure of how well-protected the data set is. Below the overall privacy score, it scores how well the data set has held up to inference, linking, and singling-out attacks.

Using tools like this can add another layer of certainty to your data protection approach.

Nonetheless, you still have to decide which data protection methods to apply in each use case. We’ll go into the details below of the key distinguishing points between pseudonymization and anonymization, so that you can make an informed decision.

Statutory Pseudonymization vs Anonymization: Key Differences

As you can see, there are a number of key differences between Statutory Pseudonymization and anonymization, and these differences are reflected in the GDPR. As you now know, Statutorily Pseudonymized data is still considered personal data under the GDPR, while anonymized data is not.

However, Statutorily Pseudonymized data also offers benefits under the GDPR. These include reduced disclosure obligations in the event of a breach, the ability to conduct cross-border data transfers (such as EU-US data processing within one company), and lawful processing on legitimate interest grounds as well as secondary processing.

This is a summary of the differences between anonymization and pseudonymization under the GDPR:

  • Definition: Anonymization is the process of ensuring that there is no reasonable likelihood that the data can be attributed to a natural person, typically by removing personally identifiable data. Pseudonymization is the processing of personal data in such a manner that it can no longer be attributed to a specific data subject without the use of additional information.

  • Utility: Anonymization offers medium to high utility; pseudonymization offers high utility.

  • Reversibility / re-linking: Anonymized data cannot be re-linked; pseudonymized data can be re-linked under controlled conditions.

  • Status under the GDPR: Anonymized data is not personal data; pseudonymized data is personal data.

  • Risk of data re-identification: Low for anonymization; low for pseudonymization (when compliant with the standard of Statutory Pseudonymization).

  • Cross-border data transfers: Yes for both.

  • Legitimate interest processing: Only in certain circumstances for anonymization; yes for pseudonymization.

  • Protects data in use: Yes for both.

  • Data sharing for monetization: Yes for both.

  • Sharing for data enrichment: No for anonymization; yes for pseudonymization.

  • Test, development, and demo data: Yes for both.

  • Sharing with service providers: Yes for both.

Comparing the aspects of each approach side by side can help you to determine the strengths and limitations of each.

One of the best ways to examine both Statutory Pseudonymization and anonymization in a real-world context is to examine your specific use case, and initiate a pilot with a provider of these techniques.

Testing out Statutory Pseudonymization, or anonymization approaches such as synthetic data, can help you determine which is more suitable for your business case. It may also help you expand your use cases and increase the utility of your data. We’ll go through a few examples of potential use cases shortly.

First, it’s important to note: whichever approach you choose, remember that different jurisdictions define these terms differently. Some jurisdictions don’t use these terms at all, but their legislation nonetheless implies similar outcomes.

Let’s quickly go through an overview of these differences.

Compliance Requirements for Statutory Pseudonymous vs Anonymous Data Across Jurisdictions

Different data protection laws provide different definitions for both pseudonymization and anonymization. Here are some examples of different data protection laws that you might come across when doing cross-border transactions or analytics.

United States


In the United States, laws like HIPAA and CCPA imply the concepts of anonymization and pseudonymization within the requirements for de-identification of personal information and risk reduction. HIPAA does not use the terms “anonymization” or “pseudonymization” at all in the legislation. Instead, HIPAA sets out the “Privacy Rule”, which allows two ways of achieving de-identification.

The first method, the “Expert Determination” method, allows an entity to have an expert apply methods to ensure that the “risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information”. This is analogous to the idea of anonymization under the GDPR.

The “Safe Harbor” method allows de-identification to be achieved by removing a defined list of identifiers. The identifiers can either be deleted permanently or replaced with a code that can be re-linked to identity, so that the health information remains useful. This approach is similar to pseudonymization under the GDPR, although the GDPR requires a higher standard of de-identification than HIPAA, protecting both direct and indirect identifiers.

There have been increasing concerns about whether the Safe Harbor method is actually sufficient for data protection, given that it only specifies 18 identifiers (leaving others vulnerable). Note: the “Safe Harbor” method under HIPAA is not the same as the Safe Harbor agreement for EU-US data transfers.

The CCPA also establishes the idea of de-identification as akin to GDPR anonymization. The CCPA states that de-identified data is “data, which cannot reasonably identify, relate to, describe, be capable of being associated with, or be linked, directly or indirectly, to a particular consumer”.

Canada


In Canada, PIPEDA does not define de-identification, anonymization, or pseudonymization as a means to protect personal information. Anonymous data would not be considered to be “personal information” for the purposes of PIPEDA, while pseudonymized data would still be covered.

In this respect, you can treat PIPEDA similarly to the GDPR, except that PIPEDA does not grant the “benefits” for using pseudonymization that the GDPR does.

Singapore


The law in Singapore for protecting personal data is called the Personal Data Protection Act (PDPA). The PDPA does not define anonymization or pseudonymization. However, as under the GDPR, anonymized data is excluded from the scope of the PDPA.

The differences between these jurisdictions can make cross-border data transfers and global business transactions difficult. However, the GDPR is widely considered to be the strictest data protection law globally, so compliance with its provisions can stand your business in good stead for compliance in other jurisdictions.

Using technical and organizational data protection measures that are granular, flexible, and adjustable to the context can also help your business comply with a variety of different privacy laws.

In addition, the choice of anonymization or pseudonymization depends significantly upon the use case that you want to use data for.

Below we cover a number of use cases that highlight how to use these different approaches to get the most business benefits. We also cover how Anonos enables these use cases for you.

Statutory Pseudonymization vs Anonymization Use Cases: How to Stay Compliant while Keeping High Data Utility

You now know that Statutory Pseudonymization and anonymization are appropriate for different use cases, depending on the level of data utility needed and the type of analysis that needs to be performed. The following use cases provide practical examples of how you can apply these approaches.

Use case: Insurance


In the insurance industry, the needs of customers are extremely important. Provinzial, the second largest public insurance group in Germany, wanted to train a model for “next best offer” predictive analytics, to identify the needs of over a million customers. This required the use of customer data, which Provinzial also wanted to protect.

The insurance customer information was highly detailed and very sensitive. Provinzial was searching for an anonymization solution that would not compromise the utility of the data.

Anonos Data Embassy helped Provinzial by producing synthetic data from its records. In doing so, Provinzial was able to achieve 80% usability of the synthetic data while maintaining data anonymity, and a model trained on this synthetic data achieved 97% performance effectiveness.

Synthetic data in this case was particularly useful. This was because it maintained the statistical relationships in the data and ensured a high level of privacy. The Anonymeter, Anonos’ privacy-assessment tool, was also applied to check the levels of protection throughout the process.

Use case: Healthcare


For medical and healthcare data, in particular, Statutory Pseudonymized data can be extremely useful.

In this use case, a global medical research institution wanted to undertake a project involving cross-departmental data analysis while remaining compliant with HIPAA and the GDPR.

It was necessary to re-link the data back to the original patient records after the analysis had been performed. The sensitivity of the data required a high level of privacy, but also analytical precision.

This medical research institution was able to use a number of Privacy Enhancing Technologies (PETs) to alleviate this problem. They used Data Embassy Variant Twins to combine Statutory Pseudonymization and synthetic data.

Statutory Pseudonymization enabled detailed analysis of patient data to improve treatment while keeping the data safe: direct and indirect identifiers were protected, yet re-linking to patient identity remained possible under controlled conditions.

Synthetic data was used to expand the dataset. Especially with rare medical conditions or health information limited to small patient groups, artificial data records can complement and balance original pseudonymized datasets. This provides higher-quality analyses. It also expands the research while minimizing data collection from individual data subjects.

Use case: Credit Scoring


CRIF, a global company specializing in credit and business information systems, wanted to develop accurate credit scoring models that retained privacy. If the data used for credit scoring models is not accurate enough, the risk-prediction aspects of the model do not function properly.

CRIF used synthetic data provided by Anonos Data Embassy to develop a credit scoring model. The synthetic data preserved the same statistical relationships as the original data, ensuring both accuracy and privacy. Because it contained no real-world data, it did not qualify as containing Personally Identifiable Information (PII), allowing CRIF to share data internally without issues.

Additionally, this approach enabled CRIF to process data more quickly and operate across multiple jurisdictions.

Getting the Most Out of Statutory Pseudonymization and Anonymization

The differences between anonymization and Statutory Pseudonymization significantly impact business operations.

  • Anonymization removes data from GDPR scope and is particularly useful for AI, analytics, model training, test data, and machine learning processes.

  • In contrast, Statutory Pseudonymization is still covered by the GDPR and allows identities to be reconnected under controlled conditions, making it ideal for scenarios like medical data management and patient treatment, personalized offerings to customers, data sharing, and working with third-party service providers.

Both methods enhance privacy and protect against data misuse, breaches, and unauthorized access.

Sign up for our newsletter to discover more detailed use cases related to these techniques.