Our scientific publications

Core publications

March 10, 2023

Patient-centric synthetic data generation, no reason to risk re-identification in biomedical data analysis

This paper presents Octopize's Avatar method, which generates synthetic data that preserves the full statistical richness of the real data while guaranteeing the confidentiality of individuals. Designed according to the criteria of the European Data Protection Board, it offers a patient-centric approach and robust anonymization, with no compromise between utility and protection.

Core publications

August 25, 2021

Application of a novel Anonymization Method for Electrocardiogram data (Chronos)

This article presents Chronos, an anonymization method dedicated to time-series data and tested on ECG signals. It generates synthetic data that is faithful to the original, with only a 3% difference in accuracy on the trained models. The approach protects individuals while preserving the scientific quality of the data.

Publications with avatars

March 4, 2025

Leveraging patients’ longitudinal data to improve the Hospital One-year Mortality Risk

This study introduces ELSTM, a neural network that analyzes patients' complete histories to predict their one-year mortality risk after admission more accurately. To promote research while maintaining confidentiality, the authors have also published a synthetic dataset that reproduces the characteristics of the real patients.

Publications with avatars

Genetic architecture of Multiple Sclerosis patients in the French national OFSEP-HD cohort

This article presents the genetic analysis of the OFSEP-HD cohort, comprising more than 2,600 patients with multiple sclerosis followed over five years. The study highlights the genetic diversity of patients and the limits of self-reported origins, and proposes the creation of a synthetic, anonymous genetic dataset to promote sharing and open research.

Avatars vs Originals

June 24, 2025

Privacy-by-design generation of two virtual clinical trials in multiple sclerosis and their release as open datasets

This article discusses the use of the Avatar method to generate synthetic randomized clinical trials from real data on patients with multiple sclerosis. The results show that this approach accurately reproduces clinical analyses while guaranteeing robust confidentiality, even against the most complex inference attacks. Two anonymous, open placebo datasets have been published as a result to demonstrate the potential of secure health data sharing.

Avatars vs Originals

July 31, 2024

Application of blinded synthetic data: the case of data from the CONSTANCES cohort

This article presents a large-scale scientific validation of the Octopize Avatar method, applied to the CONSTANCES cohort (more than 160,000 individuals). The results show that the synthetic data faithfully reproduces the statistical associations found in the real data, with minimal differences and a high degree of agreement between the results. This study confirms that avatar data offers a reliable, GDPR-compliant alternative for sharing and using large-scale data safely.

Avatars vs Originals

October 23, 2020

Location of intracranial aneurysms is the main factor associated with rupture in the ICAN population

This article presents the French ICAN project, a national program aimed at better understanding the mechanisms of formation and rupture of intracranial aneurysms. By combining high-throughput genetic analyses, imaging and clinical data, the project seeks to identify new genes and biomarkers that are predictive of aneurysm risk. This work paves the way for the development of new diagnostic and therapeutic tools for better prevention of aneurysms.

Avatar vs Other methods

April 5, 2023

Pedagogical notebook on the uses of synthetic data

This article presents the collaboration between Octopize and the Health Data Hub on an educational notebook dedicated to the generation and evaluation of synthetic health data. This work compares several approaches, measuring both the quality and the level of anonymity of the data produced. The Octopize method stands out for its ability to guarantee confidentiality while maintaining scientific reproducibility, regardless of the use case.

Avatar vs Other methods

October 16, 2024

To be or not to be, when synthetic data meets clinical pharmacology: a study focused on pharmacogenetics

This article compares three synthetic data generation methods (CT-GAN, TVAE, and Avatar) applied to a pharmacogenetic dataset. The results show that Avatar (k = 10) and CT-GAN offer the best balance between data utility and privacy, with estimates close to those obtained on the real data. This study highlights the potential of synthetic data for pharmacological research, while identifying ways to optimize small datasets.
