Data anonymization and masking

Definition of anonymization

What is anonymization? It is a set of techniques (masking, encrypting, and hashing) applied to data so that they become indecipherable to third parties. However, anonymization does not mean encrypting or masking all data, only the data considered valuable. Which data those are depends on the company. Properly defined and implemented anonymization of sensitive data protects us, to some extent, against unauthorized use of the data by third parties.

Three concepts are worth distinguishing:

Encryption – a form of cryptography where data are encoded in such a way that an unauthorized person cannot read them. You can read more about encryption HERE.

Data masking – also a method of securing data so that the original values cannot be read afterwards. Unlike encryption, however, the process is irreversible, yet repeatable: the same input is always masked in the same way.

Hashing – a process of generating a fixed-length string of characters (a so-called hash) from input data of any length. A hash looks like a random string of characters, but as long as the input data have not been modified, we always get the same result.
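The difference between hashing and masking described above can be sketched in a few lines of Python. This is only an illustration, assuming SHA-256 as the hash function and a simple "keep the last four characters" masking rule. The function names hash_value and mask_value are hypothetical and do not come from any particular tool.

```python
import hashlib


def hash_value(value: str) -> str:
    """Return the SHA-256 digest of `value` as a hexadecimal string.

    The result always has 64 characters, regardless of the input length.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_value(value: str, visible: int = 4, fill: str = "*") -> str:
    """Replace everything except the last `visible` characters with `fill`.

    The hidden characters are discarded, so the original value cannot be
    recovered from the result; the same input always gives the same output.
    """
    hidden = max(len(value) - max(visible, 0), 0)
    return fill * hidden + value[hidden:]


if __name__ == "__main__":
    card = "4111111111111111"  # made-up sample value
    print(len(hash_value(card)))                  # 64
    print(hash_value(card) == hash_value(card))   # True
    print(mask_value(card))                       # ************1111
```

Keep in mind that a plain hash of a short or predictable value (a phone number, for instance) can be guessed by hashing candidate values, so a keyed hash is often the safer choice in practice.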

Why do we need data anonymization?

Data anonymization concerns everyone who handles data consciously. It does not matter whether it is an institution, a small, medium, or large enterprise, or a sole proprietorship. It can even be an individual who uses sensitive data, that is, data that are important to them. Below are a few examples (from Poland and abroad) illustrating how damaging a lack of attention to data anonymization can be:

In Poland, sensitive data leaked from several bailiff’s offices in 2016.
In the United States, Edward Snowden disclosed classified documents of the National Security Agency (NSA) in 2013, in one of the most notorious data leaks to date.

What are the benefits of anonymizing sensitive data?

Before I answer this question, it is worth clarifying what sensitive (in other words, confidential) data are. They are data whose public disclosure can expose the company to financial and legal losses, as well as damage to its reputation.
When anonymizing sensitive data, we may use all or only part of the standards that have already been defined. Examples of such standards are PII (Personally Identifiable Information) and PHI (Protected Health Information). However, it is important to keep in mind that every company can classify its sensitive data differently.
The PII and PHI standards are specific to the USA. This means that the assumptions contained in them do not necessarily comply with the law or guidelines in other countries or regions (for example, the European Union). Still, they can be a useful source of knowledge on the subject.

It is important to answer the question of why sensitive data should be anonymized, especially given the threats that data leaks pose to a company. The benefits of anonymization are the following:

Solving the problem of production data being visible to people who do not work in the production environment, for example, staff outsourced from external companies.
Solving the problem of data theft and subsequent use of the stolen data. Anonymized data are not suitable for later analysis, so in the event of theft they have no value, which also helps with legal issues.
Demonstrating the maturity of the organization.
Protecting against unusual data usage scenarios.

Components of full data anonymization

The process of anonymizing sensitive data consists of the techniques I have already mentioned: masking, encrypting, and hashing. Before data masking tools became widespread, only encryption and hashing were used, which is why they came to be considered the main elements of anonymization. Over time, masking techniques also came into use, often with different variants mixed together. Using encryption, masking, and hashing together is strongly recommended, because data that can be accessed from outside the internal network are potentially vulnerable. Applied poorly, however, these techniques can impair not only the user interface itself but also the speed of the entire infrastructure.
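To illustrate how several techniques can be combined, and how only the fields considered sensitive are touched, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the field names, the SECRET_KEY value, the POLICY mapping, and the helper functions are hypothetical. Encryption is left out, because the Python standard library does not provide a symmetric cipher; a keyed hash (HMAC) and an e-mail mask stand in for the hashing and masking steps.

```python
import hashlib
import hmac

# Hypothetical demo key; a real key would come from a secrets store.
SECRET_KEY = b"demo-key-change-me"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Keyed hash (HMAC-SHA-256): the same input gives the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(value: str) -> str:
    """Keep the first character of the local part and the domain."""
    local, sep, domain = value.partition("@")
    if not sep:
        # Not an e-mail address: hide the whole value.
        return "*" * len(value)
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


# Which technique applies to which field; fields not listed stay as they are.
POLICY = {
    "customer_id": pseudonymize,
    "email": mask_email,
}


def anonymize_record(record: dict, policy: dict = POLICY) -> dict:
    """Return a copy of `record` with the policy applied field by field."""
    return {
        field: policy[field](value) if field in policy else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    row = {"customer_id": "12345", "email": "jan.kowalski@example.com", "country": "PL"}
    safe = anonymize_record(row)
    print(safe["email"])    # j***********@example.com
    print(safe["country"])  # PL
```

Because the customer identifier is replaced consistently, records belonging to the same customer can still be joined in a test environment, while the original identifier does not appear in the copy.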

I hope that the subject of encryption is becoming clearer with each article. At the same time, I believe that awareness of how important this process is keeps growing in every company. You can read about the implementation of data anonymization in the next article, which will be published soon.

Author

Krzysztof Nancka
Senior software tester
Tester associated with the industry for almost 5 years. During that time, he has delivered projects in the e-commerce sector. Always eager for new projects, as he combines work with passion. Security enthusiast who, in his private life, takes part in Viking historical reenactment and practices traditional archery.

Joanna Gawrońska-Krzyszczak

Text translation

Kamil Falarowski

Text revision
