How to ensure medical AI models remain safe after development ends
Medical AI is transforming the HealthTech landscape, with sophisticated algorithms increasingly at the core of new medical device development. However, the lifecycle of a model does not end once it has been developed and deployed. A model can degrade over time as the real world shifts in ways it was not trained for. This is why continuous monitoring, in compliance with ISO 13485 (the quality management standard for medical devices), is essential to the ongoing success of the model in a clinical environment and to ensuring it remains safe and valid. MedTech developers should also comply with ISO 42001, the management system standard for the responsible development and use of AI technologies.
In this blog, Camgenium’s Head of Medical AI, Anna Vuolo, shares her expert approach to developing and monitoring medical AI in clinical settings. Anna explores what developers should do to ensure each model remains validated for relevance and effectiveness. Planning regular performance reviews helps ensure that models remain safe once medical AI is deployed in a healthcare environment.
Releasing a health AI model into the clinical world
To deploy a medical AI model, good model performance must have been evidenced and documented. Performance is quantified by metrics such as accuracy, precision and recall. The validation process provides assurance that the model will perform as intended in a real-world clinical environment. Yet this analysis is a snapshot in time, and it must be assumed that performance could worsen later on. Re-evaluating the model periodically against current data will show whether it still performs well or whether it has degraded over time. Two major concepts help explain why a model might no longer fit the world it has been released into: data drift and concept drift.
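As a minimal sketch of what such a periodic re-evaluation might look like (assuming a scikit-learn style binary classifier and a freshly labelled batch of recent cases; the baseline figures, tolerance and names are illustrative, not Camgenium's production pipeline), the snippet below recomputes accuracy, precision and recall and flags any metric that has dropped beyond tolerance.

```python
# Illustrative sketch: periodic re-evaluation of a deployed binary classifier
# against freshly labelled clinical data (names and thresholds hypothetical).
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Baselines recorded at initial validation (illustrative figures).
BASELINE = {"accuracy": 0.91, "precision": 0.88, "recall": 0.86}
TOLERANCE = 0.05  # maximum acceptable drop before a review is triggered

def reevaluate(model, X_recent, y_recent):
    """Score the model on recent labelled cases and compare to baseline."""
    y_pred = model.predict(X_recent)
    current = {
        "accuracy": accuracy_score(y_recent, y_pred),
        "precision": precision_score(y_recent, y_pred),
        "recall": recall_score(y_recent, y_pred),
    }
    degraded = {
        name: (BASELINE[name], value)
        for name, value in current.items()
        if BASELINE[name] - value > TOLERANCE
    }
    return current, degraded  # degraded is empty when performance holds up
```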
Data drift: Changing observations
Data drift is the term used to describe changing data distributions, and it directly affects a model’s performance. If the data being fed into a model differs from the data it was trained on, the model will be less able to make good inferences because the patterns in the new data will be unfamiliar. Changes in data collection methods are a common cause of data drift.
For example, if a medical model is trained on comprehensive diagnostic data from GP records, its effectiveness will be significantly impacted if it receives only sparse data in practice. A hypothetical government policy change introducing GP appointment time constraints might result in less rigorous recording of patient diagnostic data, as GPs stop collecting data about secondary diagnoses and just note primary conditions. As a result, the data the model subsequently sees will be very different from that for which it was optimised. Due to this data drift, the accuracy of the model may be reduced.
Figure: Data drift – diagnostic code count over time
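One simple way to surface this kind of shift is a two-sample statistical test on the affected feature. The sketch below is purely illustrative (it uses synthetic Poisson counts to mimic the GP example above, and the function name is hypothetical): it applies SciPy's Kolmogorov–Smirnov test to the number of diagnostic codes recorded per patient.

```python
# Illustrative data drift check: compare the number of diagnostic codes
# recorded per patient in recent data against the training-era reference.
import numpy as np
from scipy.stats import ks_2samp

def codes_per_patient_drift(reference_counts, recent_counts, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test on diagnostic-code counts.

    A small p-value suggests the recent distribution differs from the one
    the model was trained on and the feature should be reviewed.
    """
    statistic, p_value = ks_2samp(reference_counts, recent_counts)
    return {"statistic": statistic, "p_value": p_value, "drift": p_value < alpha}

# Hypothetical scenario: GPs start recording fewer secondary diagnoses.
reference = np.random.poisson(lam=4.0, size=5000)   # training-era counts
recent = np.random.poisson(lam=1.5, size=1000)      # sparser recent counts
print(codes_per_patient_drift(reference, recent))
```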
Concept drift: The changing medical world
Concept drift refers to changes in the medical context behind the data. A model is trained and shown to work well on data from a given point in time, but the medical context is constantly changing: factors the model has learnt are important may no longer be so, or the relationship between input features and target outcomes may have shifted significantly. For example, a novel surgical technique could drastically improve patient outcomes for a certain procedure, resulting in fewer complications being observed in the real world. If the model does not know about these advances, it will predict poorer outcomes for patients undergoing the procedure than are seen in practice. Regular testing and retraining are required to ensure that models learn the new patterns and overcome concept drift.
Figure: Concept drift – complication rate over time
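As a rough illustration of monitoring for concept drift (column names and the threshold below are hypothetical, and this is a sketch rather than a prescribed method), the function compares the complication rate the model predicts with the rate actually observed in each month, flagging windows where the gap suggests the relationship between features and outcomes has shifted.

```python
# Illustrative concept drift check: compare the complication rate the model
# predicts with the rate actually observed in each monthly monitoring window.
import pandas as pd

def outcome_gap_by_month(df, threshold=0.05):
    """df needs columns: 'date', 'predicted_risk' (0-1), 'complication' (0/1).

    Flags months where observed outcomes diverge from model expectations,
    e.g. after a new surgical technique lowers real-world complication rates.
    """
    monthly = (
        df.assign(month=pd.to_datetime(df["date"]).dt.to_period("M"))
          .groupby("month")
          .agg(predicted_rate=("predicted_risk", "mean"),
               observed_rate=("complication", "mean"))
    )
    monthly["gap"] = (monthly["predicted_rate"] - monthly["observed_rate"]).abs()
    monthly["review"] = monthly["gap"] > threshold
    return monthly
```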
Measuring drift in medical AI
There are various established methods for checking for drift and monitoring medical AI models. At Camgenium, our team of data science engineers have a deep understanding of the most effective and regulatory-compliant methods for monitoring the ongoing effectiveness of a model. Camgenium provide a broad range of metrics and techniques to help visualise, understand and quantify how a model might be changing. The most frequently applied approaches include calculating the population stability index (PSI) and divergence metrics, plotting feature distributions, and evaluating model performance metrics over time.
Figure: Population stability index (PSI) over time
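As a rough sketch of one of these checks, the function below computes the PSI for a single numeric feature, with bin edges fixed from the training-era reference data. It is an illustrative implementation rather than Camgenium's production code; the bin count, smoothing constant and the rule-of-thumb thresholds in the docstring are common conventions, not requirements.

```python
# Illustrative population stability index (PSI) calculation for one feature,
# using quantile bins derived from the training (reference) distribution.
import numpy as np

def population_stability_index(reference, recent, n_bins=10, eps=1e-6):
    """PSI between a reference and a recent sample of one numeric feature.

    Commonly quoted rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant shift warranting investigation.
    """
    # Bin edges fixed from the reference data so comparisons stay consistent.
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    # Clip recent values into the reference range so outliers land in end bins.
    recent_clipped = np.clip(recent, edges[0], edges[-1])
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    rec_frac = np.histogram(recent_clipped, bins=edges)[0] / len(recent)
    ref_frac = np.clip(ref_frac, eps, None)   # avoid log(0) on empty bins
    rec_frac = np.clip(rec_frac, eps, None)
    return float(np.sum((rec_frac - ref_frac) * np.log(rec_frac / ref_frac)))
```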
Ignorance is blissfully non-compliant for clinical AI
Monitoring is not simply best practice; it is essential for maintaining trust and ensuring patient safety. Medical AI models are transforming healthcare, but keeping them safe and valid for patients requires continuous monitoring. Camgenium works to comply with ISO 42001, which requires monitoring of medical models to track their behaviour in the real world, check they work as intended, and catch problems early so they can be fixed fast. Monitoring is an essential step in the development process when building medical AI models and crucial for ensuring they remain safe for patients.