
Summary of talk from OpenMined PriCon 2020

This blog post is inspired by Dr. Georgios Kaissis’s talk, titled ‘End-to-end privacy preserving deep learning on multi-institutional medical imaging data’, at the OpenMined Privacy Conference 2020.
OUTLINE
- Motivation
-- Clinical Applicability of AI in Medical Imaging
-- Patient Rights, Legal and Ethical requirements
- Privacy-Preserving Machine Learning (PPML)
- PriMIA
-- Features of PriMIA
-- Computation Node Setup
-- Federated Training
-- Architecture
- Encrypted Inference
-- Case Study: Paediatric Pneumonia
-- Experimental Setup
-- Result
Clinical Applicability of AI in Medical Imaging
With recent advances in deep learning and computer vision, and easier access to computing resources, AI in medical imaging has reached clinical applicability across countries, thanks to promising results in improving diagnosis and in the early detection of diseases. This can be of great help in countering the shortage of radiologists in disadvantaged areas. However, the current methodology of central data collection and model training poses key problems around ‘data’ and ‘data rights’, which is why deploying Diagnosis-as-a-Service remains a challenge. We shall enumerate these problems in the subsequent section.
Patient Rights, Legal and Ethical requirements
Even with the proper consent of patients, the data collection process has several inherent challenges in how data is accumulated, stored, and transmitted. These challenges primarily impinge on the patients’ right to be informed about the storage and usage of their personal data and medical records. Worse, in disadvantaged communities, patients may never be informed of these rights at all, which further widens existing inequalities in society. Clearly, this is a difficult problem to address. In the next section, let’s see the role of Privacy-Preserving Machine Learning (PPML).
Privacy-Preserving Machine Learning (PPML)
Here’s where Privacy-Preserving Machine Learning (PPML) becomes important. We shall now enumerate some of the key concepts and advantages that PPML offers:
- It essentially bridges the gap between deriving insights from data (data utilization) and protecting the data of the individuals.
- Federated learning allows the data to remain on the premises of the hospital or medical institute, which lets the institution retain control over it and enforce better data governance (see the sketch after this list). Put in simple terms, Federated Learning says,
“Do not take the data to where the training algorithm is; Rather, bring the algorithm to where the data is”
- Encrypted computation services allow us to protect both the data and algorithm and provide end-to-end services.
- It also allows for single-use accountability: the notion that the data collected from patients is used only for a singular purpose, medical diagnosis, and not for research or marketing.
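To make the federated idea concrete, here is a minimal sketch of federated averaging (FedAvg) in PyTorch. It is an illustration of the general technique, not PriMIA’s actual API; the helper names and the assumption of float-only model parameters are mine.

```python
import copy
import torch

def local_update(model, loader, epochs=1, lr=1e-3):
    """Train a copy of the global model on one hospital's local data."""
    local = copy.deepcopy(model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(local(x), y).backward()
            opt.step()
    return local.state_dict()

def federated_average(states):
    """Average the weight dictionaries returned by the nodes."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = sum(s[key] for s in states) / len(states)
    return avg

def training_round(global_model, hospital_loaders):
    """One round: the algorithm travels to the data; only weights travel back."""
    states = [local_update(global_model, dl) for dl in hospital_loaders]
    global_model.load_state_dict(federated_average(states))
```

Note that the raw images never leave the hospitals; only model weights are exchanged, and PriMIA additionally protects those weights with secure aggregation (discussed under Architecture below).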
Privacy-Preserving Machine Learning has remained in the proof-of-concept stage for some time now. A new tool called PriMIA has been introduced as part of the OpenMined libraries for federated learning and encrypted inference on medical imaging data.
The goal of the library is to provide securely aggregated federated learning in a multi-institutional setting and to protect the trained algorithm in an encrypted inference scenario.
Features of PriMIA
- Framework for end-to-end privacy-preserving deep learning for medical imaging.
- A simple and extensible command-line interface (CLI) for secure federated learning and encrypted inference.
- Cloud-ready design.
- Designed to include current state-of-the-art (SOTA) algorithms.
- Flexible to include medical imaging-specific innovations.
The library’s main design goal is to let a moderately tech-savvy user, whether data owner or data scientist, accomplish each task in no more than 3 steps. The steps are outlined below.
Computation Node Setup
If you’re a data owner who would like to provide data for federated training, these are the steps:
✅ git clone
✅ Put data in folders
✅ Single CLI call 😊
✅That’s it! ✨🎉🎊
Federated Training
The following are the steps to initiate federated training. Pre-existing scripts are available for configuring hyperparameters over the entire federation (a hypothetical example follows this checklist).
✅ Single configuration file
✅ Run train.py
✅ Done!🥳
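To give a flavour of what federation-wide hyperparameter configuration covers, here is a hypothetical example written as a Python dictionary. The keys and node names are illustrative assumptions, not PriMIA’s actual configuration schema.

```python
# Hypothetical federation-wide training configuration: every node
# trains with the same settings so the aggregated model is consistent.
config = {
    "rounds": 50,            # number of federated communication rounds
    "local_epochs": 1,       # local epochs per node per round
    "batch_size": 32,
    "learning_rate": 1e-4,
    "nodes": [               # data owners participating in training
        "hospital-a.example.org",
        "hospital-b.example.org",
        "hospital-c.example.org",
    ],
}
```

The point of a single shared configuration file is that the data scientist sets it once and every node in the federation trains consistently from it.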
Architecture
A hub-and-spoke configuration is used; based on previous work in the medical imaging community, it is preferred over serial processing topologies.
- This architecture supports synchronous training.
- Secure aggregation is done by secure multi-party computation based on secret sharing (sketched after this list). Attempts to break the secure aggregation process generally failed, indicating a robust secure aggregation scheme.
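The core trick behind SMPC-based secure aggregation is additive secret sharing. The following is a minimal textbook sketch, assuming model updates have already been encoded as integers; PriMIA’s actual protocol involves more machinery.

```python
import random

Q = 2**61 - 1  # a large prime; all arithmetic is modulo Q

def share(secret, n_parties):
    """Split an integer into n additive shares that sum to it mod Q."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    return sum(shares) % Q

# Secure aggregation of three hospitals' integer-encoded updates:
# each update is split into shares, each party sums the shares it
# holds, and only the *sum* of the updates is ever reconstructed.
updates = [42, 17, 99]                                     # toy values
all_shares = [share(u, 3) for u in updates]                # one row per hospital
partial_sums = [sum(col) % Q for col in zip(*all_shares)]  # one per party
assert reconstruct(partial_sums) == sum(updates) % Q
```

No single party ever sees an individual hospital’s update in the clear; each share on its own is indistinguishable from random noise.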
Encrypted Inference
To perform encrypted inference, computation nodes such as a crypto provider and the model server have to be set up; the data owner then initiates a request and receives the result locally. Most of the work is driven from the client side, including the output of encrypted JSON files. This encryption happens under a zero-knowledge, end-to-end premise: neither the data nor the model is revealed to the other side. The crypto provider’s role is sketched below.
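The crypto provider’s contribution can be illustrated with a Beaver multiplication triple, the standard SMPC trick that lets two parties multiply secret-shared values (as needed inside an encrypted network layer) without learning them. This is a simplified two-party sketch with toy values, not PriMIA’s exact protocol.

```python
import random

Q = 2**61 - 1  # prime field; all arithmetic is modulo Q

def share2(x):
    """Two-party additive sharing of x mod Q."""
    a = random.randrange(Q)
    return a, (x - a) % Q

# --- Crypto provider: generates a correlated triple c = a*b offline ---
a, b = random.randrange(Q), random.randrange(Q)
c = (a * b) % Q
a0, a1 = share2(a); b0, b1 = share2(b); c0, c1 = share2(c)

# --- The parties hold shares of the real inputs x (data) and y (weight) ---
x, y = 12345, 678  # toy values standing in for encoded tensors
x0, x1 = share2(x); y0, y1 = share2(y)

# The parties open the masked differences e and f; these reveal
# nothing about x and y because a and b are uniformly random.
e = (x0 - a0 + x1 - a1) % Q
f = (y0 - b0 + y1 - b1) % Q

# Each party now computes its share of the product locally.
z0 = (c0 + e * b0 + f * a0 + e * f) % Q
z1 = (c1 + e * b1 + f * a1) % Q
assert (z0 + z1) % Q == (x * y) % Q  # shares of x*y, never x or y
```

Because the triple is independent of the inputs, the crypto provider can prepare it ahead of time and never needs to see the data or the model.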
To demonstrate the application and effectiveness of the library, a case study was carried out on chest X-ray data for detecting paediatric pneumonia. This is an example of a remote diagnosis-as-a-service model.
Experimental Setup
- There are 5163 training images
- There are 2 external validation datasets.
- Goal: To train a federated remote diagnostic model.
- Recently introduced confidential computing nodes were used for training; a total of 3 such nodes were used in the case study.
Result
To assess the performance of the trained model, the results were benchmarked against a locally trained model and two human experts. The results of the study show that the federated model:
- Performed better than the human experts on both validation datasets
- Was competitive with the locally trained model
Some of the key challenges for the next 5 years are:
- Federated learning on heterogeneous data
- Integer quantization for secure aggregation (see the sketch after this list)
- Reducing computational overhead
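The quantization challenge arises because secret sharing works over integers, while model weights and gradients are floats. A common workaround, sketched below, is fixed-point encoding; the precision value here is an arbitrary illustrative choice.

```python
Q = 2**61 - 1    # the prime field used for secret sharing
PRECISION = 16   # fractional bits; an illustrative choice

def encode(x: float) -> int:
    """Map a float to a field element as a fixed-point integer."""
    return round(x * 2**PRECISION) % Q

def decode(n: int) -> float:
    """Map back, reading large field elements as negative numbers."""
    if n > Q // 2:
        n -= Q
    return n / 2**PRECISION

w = -0.03172  # a toy model weight
assert abs(decode(encode(w)) - w) < 2**-PRECISION
```

Making this encoding both accurate and cheap inside secure aggregation is one of the open problems the talk points to.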
Here’s the link to the GitHub repo of PriMIA and here’s the recording of the talk.