Khaled E Emam: Towards standards for anonymizing clinical trials data

Although we are still at the early stages, manufacturers are starting to make individual participant data (IPD) from their clinical trials available. One of the key issues that has to be addressed is how to deal with privacy. If clinical trial data are anonymized, they can be shared without going back to obtain participant consent, although it is important to be transparent and inform the participants that this is happening.

The anonymization practices used by different manufacturers need to be consistent. This ensures that best practices are used broadly, and also facilitates the pooling of data and the comparison of results from the re-analysis of different trials. Consistency also means that a community of anonymization experts will start to develop around these practices, which will be important if manufacturers are to scale up data sharing.

This then suggests a need for operational standards in this area. These standards need to meet at least three obvious requirements:

1. Because trial participants are recruited globally, the anonymization standards must meet the regulatory requirements of multiple jurisdictions and the expectations of multiple regulators.

2. Anonymization methods need to maintain data quality. A basic test is that the analyses reported for a published trial should give the same results when run on the anonymized data as on the original data. Meeting this requirement while also satisfying regulators' expectations requires sophisticated approaches to anonymization.

3. Anonymization must be efficient if data sharing is to scale up. Early efforts by manufacturers have consumed considerable time and effort, and replicating them for hundreds of trial datasets will be a challenge.
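To make the data quality test in the second requirement concrete, here is a minimal sketch, assuming a toy dataset and a single stand-in "published analysis". The record layout, the choice of 10-year age bands, and the names `anonymize`, `treatment_effect`, `mean_age` and `reproduces` are all hypothetical and for illustration only; this is not the method used by any manufacturer.

```python
from statistics import mean

# Hypothetical record layout: (age, sex, arm, outcome).
# Only age is transformed here; arm and outcome are left untouched.
original = [
    (34, "F", "drug", 5.1),
    (47, "M", "drug", 4.8),
    (52, "F", "placebo", 3.9),
    (38, "M", "placebo", 4.2),
    (61, "F", "drug", 5.5),
    (45, "M", "placebo", 3.7),
]

def generalize_age(age, width=10):
    """Replace an exact age with the lower bound of its width-year band."""
    return (age // width) * width

def anonymize(records):
    """Illustrative anonymization: coarsen age into 10-year bands."""
    return [(generalize_age(age), sex, arm, y) for age, sex, arm, y in records]

def treatment_effect(records):
    """Stand-in for a published analysis: difference in mean outcome by arm."""
    drug = [y for _, _, arm, y in records if arm == "drug"]
    placebo = [y for _, _, arm, y in records if arm == "placebo"]
    return mean(drug) - mean(placebo)

def mean_age(records):
    """An analysis that depends on the exact value of a transformed field."""
    return mean(age for age, _, _, _ in records)

def reproduces(original, anonymized, analysis, tol=1e-9):
    """True if the analysis gives the same result on both datasets."""
    return abs(analysis(original) - analysis(anonymized)) <= tol

anonymized = anonymize(original)
print(reproduces(original, anonymized, treatment_effect))  # True
print(reproduces(original, anonymized, mean_age))          # False
```

The contrast between the two checks is the point: a transformation that leaves one published analysis intact can still change another, which is why the test has to be run against the analyses actually reported for each trial.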

Because there is no widely recognized standards body in this area, multiple efforts are underway. To ensure consistency across these efforts, a basic set of principles is required to operationalize the requirements above. The privacy rule of the US Health Insurance Portability and Accountability Act (HIPAA) of 1996 provides such principles.

While not all health data in the US are covered by HIPAA, recent court cases suggest that it will still be referred to as the basis for duties and standards of care in litigation. In addition, the privacy rule is more prescriptive than any other regulation worldwide about how data should be anonymized, and there is almost 10 years of practical experience applying its principles to health data, so there is considerable community knowledge about what works and what does not. This makes it a good candidate from which to derive some basic principles.

These anonymization principles are as follows (paraphrased for clinical trials):

    • The anonymization of clinical trials data should use generally accepted statistical and scientific methods. There is a large body of work on disclosure control, which can serve as the basis for what is considered generally acceptable.
    • Anonymization should be performed by experts with appropriate knowledge of, and experience with, disclosure control.
    • Clinical trial datasets that are considered anonymized should have a very small risk of re-identification. This means that the concept of risk has to be operationalized for clinical trials and the “very small” threshold needs to be defined.
    • The methods and results of the anonymization must be documented and the documentation retained. The retention period is not specified, but it would be reasonable to keep the documentation at least as long as the data are available for secondary analysis.
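One common way the third principle is operationalized in the disclosure control literature is to group records by their quasi-identifiers (fields such as age band, sex and region that an adversary might know) and measure risk from the size of each group. The sketch below illustrates that idea under stated assumptions: the choice of quasi-identifiers, the sample records, the helper names and the threshold of 0.1 are all hypothetical. The post itself does not define the risk measure or the "very small" threshold.

```python
from collections import Counter

# Hypothetical anonymized records; every field is treated as a quasi-identifier.
records = [
    ("30-39", "F", "North"), ("30-39", "F", "North"), ("30-39", "F", "North"),
    ("40-49", "M", "South"), ("40-49", "M", "South"),
    ("50-59", "F", "North"),
]

def equivalence_class_sizes(rows):
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(rows)

def max_risk(rows):
    """Worst-case risk: 1 / size of the smallest equivalence class."""
    return 1 / min(equivalence_class_sizes(rows).values())

def average_risk(rows):
    """Mean of 1/class size over all records = number of classes / number of records."""
    return len(equivalence_class_sizes(rows)) / len(rows)

def is_very_small(rows, threshold):
    """Hypothetical acceptance rule: worst-case risk must not exceed the threshold."""
    return max_risk(rows) <= threshold

print(max_risk(records))            # 1.0
print(average_risk(records))        # 0.5
print(is_very_small(records, 0.1))  # False
```

Here the single record in the ("50-59", "F", "North") group is unique, so the worst-case risk is 1.0. With a threshold of 0.1, every group would need at least 10 records, which would require further generalization or suppression. Whatever measure and threshold are chosen, recording them is part of the documentation that the fourth principle asks to be retained.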

The principles above are consistent with guidance in the UK and Canada.

To create appropriate incentives for manufacturers to invest in using standards, scale up anonymization, and make more datasets available, there needs to be a demonstrated demand for the data. Now that some datasets have been made available, everyone will be watching the number of requests coming in from researchers and others.

Read Khaled E Emam’s other blogs in this series:

Pseudonymous data is not anonymous data

What are the privacy concerns when sharing clinical trials data?

Khaled E Emam is the Canada Research Chair in electronic health information at the University of Ottawa, an associate professor in the department of pediatrics, and is cross-appointed to the school of electrical engineering and computer science.

I have read and understood BMJ policy on declaration of interests and declare the following interests: I have financial interests in Privacy Analytics Inc., a University of Ottawa and Children’s Hospital of Eastern Ontario spin-off company, which develops anonymization software for the health sector.