

The quality of a product rests on the reproducibility of its desired behavior over the course of its use. Such reliability is essential for medical devices to deliver safe and effective outcomes for patients, and hence they require appropriate regulatory approvals before use.
AI/ML based methods have shown incredible promise for use in or as medical devices. These methods start from a ‘basic’ solution architecture and improve it using learning examples; the parameters of the architecture are updated during the learning process. The figure below shows a simple example of creating a decision boundary from available data. The decision boundary can, for instance, be used to differentiate between a benign and a malignant tumor. When an unknown sample is presented, it is classified based on where it lies with respect to the learned decision boundary. The power of machine learning lies in making the decision boundary more precise as more learning data becomes available.
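To make this concrete, here is a minimal sketch of learning such a decision boundary with scikit-learn. It is illustrative only and not taken from the FDA papers; it uses scikit-learn's built-in breast cancer dataset and a simple linear classifier.

```python
# Illustrative sketch: learning a decision boundary that separates
# benign from malignant tumor samples (scikit-learn's built-in dataset).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)          # 0 = malignant, 1 = benign
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The "basic" architecture is a linear classifier; training adjusts its
# parameters (weights) to place the decision boundary between the two classes.
clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

# An unknown sample is classified by which side of the boundary it falls on.
print("held-out accuracy:", clf.score(X_test, y_test))
```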
This adaptability, however, poses a challenge for quality and regulatory systems. How does one maintain the predictability of a system that is constantly evolving? How can we be sure that a machine learning system is not learning from the “wrong” training examples?
These are tough regulatory questions. In April 2019, the US Food and Drug Administration (FDA) issued a discussion paper on the topic [1], followed by an action plan in January 2021 [2]. Below we discuss some key elements of the topic, taking a cue from these papers and the ensuing discussion around them in the AI and healthcare community.
It is critical for developers of artificial intelligence and machine learning based software as a medical device (SaMD) to understand the concept of risk categorization. The use of these devices lies on a risk spectrum defined by: (a) the significance of the information the device provides to the healthcare decision, and (b) the severity of the condition in which it is used.
The significance increases from simply informing clinical management, to driving clinical management, to diagnosing or treating the condition at hand, while the severity of the condition ranges from non-serious to critical. The matrix shown above is based on the International Medical Device Regulators Forum (IMDRF) risk categorization. Developers of AI/ML based devices should expect stronger constraints and more review of modifications and adaptability for the higher risk categories than for the lower risk ones.
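As a rough illustration of how the two axes combine (my own rendering of the IMDRF categorization, with category IV the highest risk), the matrix can be read as a simple lookup:

```python
# Sketch of the IMDRF risk categorization as a lookup table.
# First key: state of the healthcare situation; second key: significance of
# the information the SaMD provides. Category IV is the highest risk.
IMDRF_CATEGORY = {
    ("critical",    "treat_or_diagnose"):          "IV",
    ("critical",    "drive_clinical_management"):  "III",
    ("critical",    "inform_clinical_management"): "II",
    ("serious",     "treat_or_diagnose"):          "III",
    ("serious",     "drive_clinical_management"):  "II",
    ("serious",     "inform_clinical_management"): "I",
    ("non_serious", "treat_or_diagnose"):          "II",
    ("non_serious", "drive_clinical_management"):  "I",
    ("non_serious", "inform_clinical_management"): "I",
}

print(IMDRF_CATEGORY[("critical", "treat_or_diagnose")])  # -> "IV"
```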
There are several types of modifications that can potentially be made to an AI/ML based system after its initial regulatory approval and release; they broadly fall into three categories:
(a) Modifications related to performance: These enhance the performance of the device. This is usually achieved by retraining the machine learning system with new data (of the same type) or by improving so-called hyperparameters (such as the learning rate or the complexity of the base architecture). In such cases the performance improves without any changes to the claims related to the intended use of the product.
(b) Modifications related to input type: The next level of modifications involves changing the input types. This refers not only to using more input data but also to using categorically different types of data, such as taking images from a different type of scanner or adding parameters like medical history as inputs to the machine learning system. Such modifications still do not create new claims about the intended use of the product.
(c) Modifications related to intended use: Most significant are the cases where the intended use of the device changes. These include changes in the significance of the information provided by the AI/ML based system, for instance moving from assisted diagnosis that supports clinical management to an automated diagnosis made by the machine. A change in intended use can also change the severity of the condition in which the device is used. Such changes alter the risk category of the device and may require additional review and regulatory approval, depending on the SaMD pre-specifications (SPS) and algorithm change protocol (ACP) discussed below.
The FDA discussion paper proposes several concepts for putting a regulatory framework around AI/ML based software as a medical device. The core suggested principle is twofold: (a) defining a clear methodology for post-approval intended changes to the adaptive system, and (b) adherence to recommended practices throughout the product life cycle, such as good machine learning practices (GMLP) and real-world performance (RWP) monitoring.
Two key concepts related to defining intended changes after approval are SaMD Pre-Specifications (SPS) and Algorithm Change Protocols (ACP).
SaMD Pre-Specifications (SPS):
Pre-specifications indicate “what” in the AI/ML based medical device can change after it has been approved. The changes can fall into any of the three categories described above: performance, input types, and intended use. The SaMD pre-specifications describe the range of changes that are acceptable for the device without going through a full regulatory submission. A sketch of what such a pre-specification might look like follows.
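As an illustration only (the field names and values below are my own, not from the FDA paper), an SPS can be thought of as a declared envelope of acceptable post-approval change:

```python
# Hypothetical SPS: a declared envelope of acceptable post-approval changes.
# Field names and values are illustrative, not taken from the FDA paper.
SPS = {
    "allowed_change_types": ["performance", "input_type"],   # not "intended_use"
    "performance_envelope": {
        "metric": "sensitivity",
        "minimum": 0.92,          # an updated model must not fall below this
    },
    "allowed_inputs": ["CT_scanner_A", "CT_scanner_B"],       # approved data sources
    "intended_use": "aid in detection of lung nodules",       # must remain unchanged
}
```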
Algorithm Change Protocol (ACP):
Once the range of acceptable changes (“what”) has been defined by the SPS, the algorithm change protocol (ACP) describes “how” those changes can be incorporated while maintaining the safety and efficacy of the device. This includes quality assessment of the new data used, performance metrics for the updated algorithm, verification and validation of the software, a scheme for local and global updates, and transparent communication of the changes to users. A minimal sketch of such a gate is shown below.
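Continuing the hypothetical SPS above (again my own sketch, not an official protocol), an ACP can be pictured as a pre-committed gate that every retrained model must pass on a locked validation set before release:

```python
# Hypothetical ACP gate: an updated model is released only if it stays within
# the SPS performance envelope on a locked validation set.
from sklearn.metrics import recall_score

def acp_gate(model, X_locked, y_locked, sps=SPS):
    """Return True if the retrained model may be released under the SPS/ACP."""
    y_pred = model.predict(X_locked)
    sensitivity = recall_score(y_locked, y_pred)   # sensitivity = recall on positives
    envelope = sps["performance_envelope"]
    passed = sensitivity >= envelope["minimum"]
    # In a real protocol this record would feed documentation and user communication.
    print(f"{envelope['metric']}={sensitivity:.3f}, release approved: {passed}")
    return passed
```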
Pathways for post-market changes to AI/ML based devices:
This is probably the most important practical issue for device manufacturers. The current FDA discussion paper suggests that when a manufacturer decides on changes to an AI/ML based device, they should follow the current risk assessment guidance for software modifications. If that assessment indicates the change requires a new regulatory review, but the change is covered by the pre-specified SPS and ACP, the manufacturer does not need a full review and only has to document the change as per the approved processes. If the change falls outside the SPS and ACP but does not alter the intended use, the manufacturer can apply for a focused review. [Note: these are only proposed outlines for discussion from the FDA, not official policy.] The decision flow is sketched below.
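The proposed decision flow can be paraphrased as a small routine (my reading of the discussion paper, simplified; the outcome labels are descriptive, not official FDA terms):

```python
# Simplified paraphrase of the proposed pathway for post-market changes.
def change_pathway(requires_new_review: bool,
                   covered_by_sps_acp: bool,
                   changes_intended_use: bool) -> str:
    if not requires_new_review:
        return "proceed under existing software-modification guidance"
    if covered_by_sps_acp:
        return "no full review; document the change per approved SPS/ACP"
    if not changes_intended_use:
        return "apply for a focused review"
    return "new regulatory submission"

print(change_pathway(True, True, False))   # documented under SPS/ACP
print(change_pathway(True, False, False))  # focused review
```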
Good Machine Learning Practices (GMLP) is an evolving concept that is supposed to follow in the footsteps of Good Manufacturing Practices (GMP), Good Clinical Practices (GCP), Good Laboratory Practices (GLP), and others. Adherence by developers of AI/ML based systems to a set of good practices would give regulatory bodies confidence in the reliability of these devices.
In my view GMLP should focus on four critical components: (a) data, (b) labelling, (c) algorithms, and (d) deployment. The data chosen for both training and testing should be relevant to the clinical use cases where the device is applied. A mechanism to test whether the ML method is being applied to data outside the space covered by the training and testing sets should be built into the process (a simple version of such a check is sketched below). We can’t dig into all GMLP ideas here, but one key issue that came up in responses to the discussion paper is bias in AI/ML based systems, which can further exacerbate existing biases in health outcomes.
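One simple (and admittedly crude) way to flag inputs that fall outside the space covered by the training data is a per-feature z-score check. This is my own sketch of the idea, not a prescribed GMLP method; it reuses X_train from the first code sketch above.

```python
import numpy as np

# Crude out-of-distribution flag: mark a sample as "outside training space"
# if any feature lies more than `k` standard deviations from the training mean.
def fit_ood_check(X_train, k=4.0):
    mean, std = X_train.mean(axis=0), X_train.std(axis=0) + 1e-12
    def is_out_of_distribution(x):
        return bool(np.any(np.abs((x - mean) / std) > k))
    return is_out_of_distribution

ood = fit_ood_check(X_train)          # X_train from the earlier sketch
print(ood(X_train[0]))                # expected: False for a training sample
```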
The FDA action plan notes that, since AI/ML methods rely on historical data, groups that have been underrepresented in the healthcare system, particularly racial and ethnic minorities, can experience below-average outcomes if appropriate methodologies are not put in place to account for the built-in bias. Every GMLP plan should include methods to ensure that critical steps like data acquisition, the expertise of labelers, and the established gold standard do not carry inherent biases that the AI/ML system can further propagate. Such bias has already been seen in AI/ML techniques like facial recognition: despite claims of over 90% overall accuracy, facial recognition technologies were found to be 20–34% less accurate on dark-skinned females compared to light-skinned males [3]. A much broader study by the National Institute of Standards and Technology (NIST) found similar biases. Practitioners of AI/ML should ensure the methods they develop do not carry inherent bias if these methods are to gain wide adoption in the field.
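A basic GMLP-style check along these lines is to report the model’s performance separately for each relevant subgroup rather than only in aggregate. A minimal sketch follows; the per-sample `groups` label is hypothetical and would come from the study metadata.

```python
import numpy as np

# Report accuracy per subgroup to surface performance gaps hidden by the
# aggregate metric. `groups` is a hypothetical per-sample subgroup label.
def subgroup_accuracy(model, X, y, groups):
    y_pred = model.predict(X)
    for g in np.unique(groups):
        mask = groups == g
        acc = (y_pred[mask] == y[mask]).mean()
        print(f"group={g}: n={mask.sum()}, accuracy={acc:.3f}")
```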
Changes to the regulatory guidelines for adaptive AI/ML based methods are still a work in progress. After the 2019 discussion paper, the FDA issued the Action Plan in January 2021 and expects to provide more details later in the year. The FDA has invited practitioners in the field to provide their input. The recommendations coming out of this process will be critical not only for medical devices but for the field of artificial intelligence and machine learning in general.
- Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD), April 2019.
- Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan, Jan 2021.
- Racial Discrimination in Face Recognition Technology, Oct 2020.