How to ensure medical AI models remain safe after development ends

Medical AI is transforming the HealthTech landscape, with sophisticated algorithms becoming a core requirement in new medical device development. However, a model's lifecycle does not end once it has been developed and deployed. A model can degrade over time as the real world shifts in ways it wasn't trained for. This is why continuous monitoring, in compliance with ISO 13485 (the quality management standard for medical devices), is essential to ensuring a model remains safe and valid in a clinical environment. MedTech developers should also comply with ISO 42001, the framework for the responsible development of AI technologies.

In this blog, Camgenium's Head of Medical AI, Anna Vuolo, shares her expert approach to developing and monitoring medical AI in clinical settings. Anna explores what developers should do to ensure each model remains validated for relevance and effectiveness, and how planning regular performance reviews helps keep models safe once medical AI is deployed in a healthcare environment.

Releasing a health AI model into the clinical world

To deploy a medical AI model, good model performance must have been evidenced and documented. Performance is quantified by metrics such as accuracy, precision and recall, and the validation process provides assurance that the model will perform as intended in a real-world clinical environment. Yet this analysis is a snapshot in time, and it must be assumed that performance could worsen over time. Re-evaluating the model periodically against current data will show whether it still performs well or whether it has degraded. Two major concepts help explain why a model might no longer fit the world it has been released into: data drift and concept drift.
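For illustration, the sketch below shows how these headline metrics might be computed at validation time using scikit-learn; the labels are purely hypothetical placeholders standing in for documented validation data.

```python
# Minimal sketch: quantifying model performance with standard metrics.
# The labels below are hypothetical, not real clinical data.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # observed outcomes (ground truth)
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]  # model predictions

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")
print(f"Precision: {precision_score(y_true, y_pred):.2f}")
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")
```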

Data drift: Changing observations

Data drift is the term used to describe a change in the distribution of a model's input data, and it directly affects performance. If the data fed into a model differs from the data it was trained on, the model will be less able to make good inferences, because the patterns in the new data will be unfamiliar. Changes in data collection methods are a common cause of data drift.

For example, if a medical model is trained on comprehensive diagnostic data from GP records, its effectiveness will be significantly impacted if it receives only sparse data in practice. A hypothetical government policy change introducing GP appointment time constraints might result in less rigorous recording of patient diagnostic data, with GPs noting only primary conditions and no longer recording secondary diagnoses. The data the model subsequently sees would then be very different from the data it was optimised on, and due to this data drift the model's accuracy may be reduced.

Data drift: Diagnostic code count over time

[Figure: data drift graph]
A hypothetical policy change in early August (dashed line) results in far fewer diagnostic codes being recorded after this point in time. This data drift means a model trained on data from before this time will receive very different inputs from what it expects, and its performance may suffer as a result.
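A shift like this can be detected statistically. The sketch below, using purely synthetic Poisson-distributed counts, compares the live distribution of a feature (here, diagnostic codes per record) against the training-time distribution with scipy's two-sample Kolmogorov-Smirnov test; the rates and significance threshold are illustrative assumptions, not a prescribed setup.

```python
# Illustrative sketch: flagging data drift in one numeric feature with a
# two-sample Kolmogorov-Smirnov test. All data and thresholds are synthetic.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
training_counts = rng.poisson(lam=6.0, size=1000)  # codes per record at training time
live_counts = rng.poisson(lam=2.5, size=1000)      # sparser recording after the policy change

stat, p_value = ks_2samp(training_counts, live_counts)
if p_value < 0.01:  # hypothetical significance threshold
    print(f"Data drift detected (KS statistic={stat:.3f}, p={p_value:.2e})")
```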

Concept drift: The changing medical world

Concept drift refers to changes in the medical context behind the data. The model has been trained and shown to work well on data from a given point in time, but the medical context is constantly changing: features the model has learnt to treat as important may no longer be so, or the relationship between input features and target outcomes may have changed significantly. For example, a novel surgical technique could drastically improve patient outcomes for a certain procedure, resulting in fewer observed complications in the real world. If the model does not know about these advances, it will predict poorer outcomes for patients undergoing the procedure than are observed in practice. Regular testing and retraining are required to ensure that models learn the new patterns and overcome concept drift.

Concept drift: Complication rate over time

[Figure: concept drift graph]
The introduction of an improved surgical technique in late June sees a drastic reduction in observed complications (pink) over time. The complication rate as predicted by the model (blue) initially closely matches observations but diverges after this change, demonstrating model degradation due to concept drift.
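This divergence between predicted and observed rates can itself be monitored. Below is a minimal sketch, with hypothetical figures, that compares the two over monthly windows and flags any month where they diverge beyond a chosen tolerance.

```python
# Illustrative sketch: tracking observed vs model-predicted complication
# rates per month to surface concept drift. Figures and the alert
# tolerance are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "month": pd.period_range("2024-01", periods=6, freq="M"),
    "observed_rate": [0.12, 0.11, 0.12, 0.06, 0.05, 0.04],   # drops after the new technique
    "predicted_rate": [0.12, 0.12, 0.11, 0.12, 0.11, 0.12],  # model unaware of the change
})

df["divergence"] = (df["predicted_rate"] - df["observed_rate"]).abs()
alerts = df[df["divergence"] > 0.03]  # hypothetical tolerance
print(alerts[["month", "observed_rate", "predicted_rate", "divergence"]])
```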

Measuring drift in medical AI

There are various established methods for checking for drift and monitoring medical AI models. At Camgenium, our team of data science engineers has a deep understanding of the most effective and regulatory-compliant methods for monitoring the ongoing effectiveness of a model. Camgenium provides a broad range of metrics and techniques to help visualise, understand and quantify how a model might be changing. The most frequently applied approaches include calculating the Population Stability Index (PSI) and divergence metrics, plotting feature distributions, and evaluating model performance metrics over time.

Population stability index (PSI) over time

[Figure: PSI over time]
The Population Stability Index (blue/pink) is measured over time, quantifying the change in a feature distribution as compared with the original data. In late July, the PSI begins to increase above acceptable limits. Detection of significant drift indicates model retraining may be necessary.
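For illustration, here is a minimal sketch of how a PSI calculation might look, with bins derived from the baseline (training-time) distribution. The synthetic data, epsilon smoothing, and the commonly quoted 0.25 alert threshold are assumptions, not a Camgenium implementation.

```python
# Minimal sketch of the Population Stability Index (PSI) for one feature.
# Bin edges come from quantiles of the baseline ("expected") distribution;
# a small epsilon avoids log(0) for empty bins. Data are synthetic.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, n_bins: int = 10) -> float:
    # Interior bin edges taken from the baseline distribution's quantiles
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))[1:-1]
    expected_frac = np.bincount(np.searchsorted(edges, expected), minlength=n_bins) / len(expected)
    actual_frac = np.bincount(np.searchsorted(edges, actual), minlength=n_bins) / len(actual)
    eps = 1e-6  # smoothing so empty bins do not blow up the logarithm
    e, a = expected_frac + eps, actual_frac + eps
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5000)  # feature distribution at validation time
current = rng.normal(0.5, 1.2, 5000)   # shifted distribution seen in live data

print(f"PSI = {psi(baseline, current):.3f}")  # > 0.25 is a common rule-of-thumb alert level
```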

Ignorance is blissfully non-compliant for clinical AI

Monitoring is not simply best practice; it is essential for maintaining trust and ensuring patient safety. Medical AI models are transforming healthcare, but keeping them safe and valid for patients requires continuous monitoring. Camgenium works to comply with ISO 42001, which requires medical models to be monitored so that their real-world behaviour is tracked, they are checked against their intended performance, and any problems are caught early and fixed fast. Monitoring is an essential step in the development process when building medical AI models, and crucial for ensuring they remain safe for patients.