Anonymizing personal data and generating synthetic data are now essential levers for exploiting data ethically and securely. For the organizations deploying these technologies, however, a major challenge arises: how can they reconcile the operational reality of their use cases with the level of rigor demanded by academic research, while remaining confident about regulatory compliance?
Academic research, an indispensable compass
Making a clear and complete diagnosis of the residual re-identification risk after data processing requires, as a mandatory step, a systematic and exhaustive evaluation of the various attack scenarios.
This is where academic work is essential (e.g., work by Khaled El Emam's team, the first paper on membership inference attacks (MIA), and work by Tristan Allard's team). Researchers are constantly pushing technological boundaries: refining attack scenarios, uncovering new angles of vulnerability, and developing cutting-edge risk-measurement methodologies. It is thanks to this scientific rigor that the industry now has robust metrics to quantitatively assess the level of protection of a data set. Without this constant drive to test the resistance of algorithms against advanced attacks, it would be impossible to guarantee a reliable state of the art.
In this dynamic, a solution vendor must maintain a rigorous technology watch, or better yet, collaborate closely with the academic community. Its mission is to translate this research into concrete tools by continuously implementing new attack scenarios and their associated metrics. This is a matter of transparency, and also a vector of trust with users. It is this synergy that ensures end users always have an exhaustive, up-to-date, and interpretable assessment of the risk associated with their processing operations.
The regulatory framework, from theory to the context of use
What do the regulations say? The European Data Protection Board (EDPB) has identified three fundamental criteria for evaluating anonymization: singling out, linkability, and inference.
- Singling out: it must not be possible to isolate an individual in the data set.
- Linkability: it must not be possible to link two records or data sets concerning the same person.
- Inference: it must not be possible to deduce new information about an individual.
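The first criterion, singling out, can be made concrete with a minimal sketch: count how many records carry a unique combination of quasi-identifiers, since a unique combination lets an attacker isolate one person. The function name and the toy columns (`age`, `zip`) are illustrative assumptions, not a standard API.

```python
from collections import Counter

def singling_out_rate(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination is unique.

    A unique combination singles the person out; a rate of 0 means every
    record hides in a group of at least two (a k-anonymity-style check).
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)

# Toy data: two people share (age, zip); the third is unique.
data = [
    {"age": 34, "zip": "75011", "diagnosis": "A"},
    {"age": 34, "zip": "75011", "diagnosis": "B"},
    {"age": 57, "zip": "31000", "diagnosis": "C"},
]
print(singling_out_rate(data, ["age", "zip"]))  # → 0.3333333333333333
```

Real evaluations use far richer attack models than this uniqueness count, but it captures the intuition behind the criterion.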
However, the regulations set neither precise mathematical metrics nor absolute thresholds to be reached. They rest on a notion of pragmatism: the GDPR considers data anonymous if re-identification is made impossible in practice, taking into account the means an attacker could reasonably use (in terms of time, cost, and available technology).
It is therefore not a question of an absolute, theoretical "zero" risk, but of a risk that is controlled and neutralized in practice. A recent ruling by the Court of Justice of the European Union (September 4, 2025, Case C-413/23 P) has reinforced this contextual approach: pseudonymized data can be considered anonymous if the recipient of the data has no means of re-identifying the individuals. The context of sharing and use is therefore of paramount importance.
The Data Protection Impact Assessment (DPIA) as the keystone of arbitration
It is precisely through the impact assessment that we bridge the gap between academic exhaustiveness and realities on the ground. This tool is vital for arbitrating and contextualizing the risk.
The methodology is divided into two main phases:
1. Quantitative risk assessment (academic rigor). We begin by theoretically measuring the data set's exposure to all the risks documented in the state of the art. With supporting metrics, we evaluate the robustness of the generated synthetic data against singling-out, linkability, and inference attacks.
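One metric commonly used in this quantitative phase for synthetic data is Distance to Closest Record (DCR): for each synthetic row, the distance to its nearest real row. Very small distances suggest the generator memorized, and may leak, real individuals. A minimal sketch, assuming numeric records already encoded as tuples (the function and the toy values are illustrative):

```python
import math

def dcr(synthetic, real):
    """Distance to Closest Record for each synthetic row.

    Brute-force nearest-neighbor search; production tools would use a
    spatial index, but the metric itself is just this minimum distance.
    """
    return [min(math.dist(s, r) for r in real) for s in synthetic]

real = [(1.0, 2.0), (5.0, 5.0)]
synth = [(1.1, 2.0), (9.0, 9.0)]

distances = dcr(synth, real)
# First synthetic row sits almost on top of a real one (possible
# memorization); the second is far from every real record.
print(distances)
```

A very low DCR is a red flag for the quantitative assessment, which the second phase must then put into context.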
Taken in isolation, these exhaustive, worst-case evaluations can sometimes yield protection scores that look mixed against certain specific attack scenarios. It is precisely in order to interpret these raw results that the second phase is essential.
2. Contextualizing the attack (realities on the ground). Once the theoretical risk has been quantified, we evaluate the plausibility of the attack in the real world. For example, a sophisticated model inversion attack might require simultaneous access to both the complete pseudonymized source data set and the synthetic data set, along with detailed knowledge of the algorithm's settings, advanced data-science expertise, and significant computing resources.
However, how easily an attacker can obtain this data depends massively on the use case: the risk is not at all the same for a strictly internal analysis as for an open-data publication.
If a theoretical re-identification risk exists, but the probability of a successful attack is extremely low under the real conditions of use, the data can reasonably be considered anonymous in that context.
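The arbitration described above can be caricatured in a few lines: the same theoretical score leads to very different contextual risks depending on the release scenario. This is purely illustrative; the function, its inputs, and the numbers are assumptions, not a standardized scoring model.

```python
def contextual_risk(theoretical_risk, attacker_capability, data_exposure):
    """Scale a theoretical re-identification risk by attack plausibility.

    All inputs are in [0, 1]: `attacker_capability` models required
    expertise/resources, `data_exposure` models how reachable the data
    is (open data ≈ 1.0, locked-down internal environment ≈ near 0).
    """
    plausibility = attacker_capability * data_exposure
    return theoretical_risk * plausibility

# Same theoretical score of 0.6, two contexts:
print(contextual_risk(0.6, 0.9, 1.0))   # open-data release: ≈ 0.54
print(contextual_risk(0.6, 0.9, 0.05))  # internal, access-controlled: ≈ 0.027
```

A real DPIA weighs these factors qualitatively rather than multiplying three numbers, but the structure of the reasoning is the same: theoretical risk moderated by real-world plausibility.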
This assurance is all the stronger because anonymization rarely operates alone. It is complemented by organizational and technical security measures (access controls, secure environments, contractual safeguards, etc.) that drastically reduce residual risk.
In conclusion, academic research provides the barometer and the diagnostics we need to never move forward blindly. The impact assessment, for its part, offers the operational framework to turn those diagnostics into viable, secure decisions that comply with the spirit of the GDPR.
- What is the difference between theoretical and practical re-identification risks?
- How can a DPIA be used to validate anonymization?
- What are the EDPB's three criteria for anonymization?
- Why is academic research vital for data security?