Challenges in AI Implementation in Healthcare

From Clinfowiki
Revision as of 17:25, 3 May 2024

Introduction

Currently, there are several challenges to artificial intelligence (AI) implementation in the healthcare setting, ranging from the ethical to the practical. Despite the wealth of promising new AI technologies, few have been successfully implemented into clinical workflows. The disparity between promising performance statistics and the lack of ultimate clinical efficacy has been deemed the “AI chasm.” Several healthcare organizations have developed AI governance committees to help evaluate potential AI tools and ease difficulties in implementation [1]. This wiki entry is meant as a broad, high-level overview of the various challenges facing the translation of AI into clinical care.

Technical Challenges

Data Sharing

AI and machine learning algorithms require large datasets for training and testing. One institution or source will not normally have enough data to power AI development or testing, so data sharing is needed. While several national biobanks have been developed [2], they may not include all of the data needed for training, and not all institutions may have access. Differences in the quality of data and in how it is stored and retrieved make data sharing even more challenging. [3] Interoperability initiatives such as the Fast Healthcare Interoperability Resources (FHIR) framework from Health Level Seven International (HL7) are expected to be essential to overcoming this challenge. Data-sharing efforts also raise cybersecurity concerns, especially if a third-party developer is involved.
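As a concrete illustration of what standardized exchange looks like, the sketch below assembles a minimal FHIR R4 Observation resource in Python. The patient reference and lab values are hypothetical, and a real exchange would POST this JSON to a FHIR server's REST endpoint rather than build it by hand.

```python
import json

# Minimal FHIR R4 Observation for a hemoglobin lab result.
# The patient reference and values are hypothetical illustrations.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "718-7",  # LOINC: Hemoglobin [Mass/volume] in Blood
            "display": "Hemoglobin [Mass/volume] in Blood",
        }]
    },
    "subject": {"reference": "Patient/example"},
    "valueQuantity": {
        "value": 13.2,
        "unit": "g/dL",
        "system": "http://unitsofmeasure.org",
        "code": "g/dL",
    },
}

# Serialized body that would be sent to [base]/Observation
payload = json.dumps(observation)
```

Because every conformant system expects the same resource shape and shared code systems (LOINC, UCUM), two institutions can pool such records without per-site mapping work.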

Susceptibility to Confounding Variables

Machine learning algorithms, especially those using neural networks, can extract confounding variables from their inputs when building their prediction models. For example, one machine learning model incorporated the CT scanner model and the urgency of the imaging order when predicting the likelihood of hip fracture on CT images; the developers did not necessarily intend for these variables to be included. [4] Depending on the clinical situation, this phenomenon can reduce a model's clinical effectiveness, especially if users are unaware of it.
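The failure mode can be mimicked with a toy simulation (hypothetical data, not the cited study): if portable scanners were used disproportionately for cases that turned out to have fractures, a model can "learn" the scanner rather than the anatomy, and its accuracy collapses at a site where that correlation does not hold.

```python
# Toy illustration of confounding (hypothetical data): each record is
# (scanner_type, has_fracture). At the training site, portable scanners
# correlate strongly with fractures; at the new site they do not.
train_site = (
    [("portable", True)] * 90 + [("portable", False)] * 10
    + [("fixed", True)] * 10 + [("fixed", False)] * 90
)
new_site = (
    [("portable", True)] * 50 + [("portable", False)] * 50
    + [("fixed", True)] * 50 + [("fixed", False)] * 50
)

def confounded_model(scanner):
    """A 'model' that latched onto the scanner, not the fracture."""
    return scanner == "portable"

def accuracy(data):
    return sum(confounded_model(s) == y for s, y in data) / len(data)

print(accuracy(train_site))  # 0.9 where the confounder holds
print(accuracy(new_site))    # 0.5 (chance) where it does not
```

The model looks strong on its development data yet is no better than chance once the spurious correlation disappears, which is why such shortcuts can go unnoticed until deployment.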

Generalizability

Generalizability is a significant challenge in the clinical application of AI models, owing to the inevitable difference between the training and testing data used to develop a model and the live patient data on which it is ultimately deployed. These differences can involve clinical practice, patient demographics, laboratory and imaging equipment, EHR systems, and more. Even validated, FDA-cleared models may not perform adequately on local patient data: Bizzo et al. found during a retrospective deployment that a candidate commercial chest CT tool systematically underestimated the area of radiological findings, and the implementation was stopped.[5] Ensuring generalizability may require additional model training with local patient data, which further complicates the implementation of a new feature.
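A retrospective local validation like the one described can be scripted: run the candidate model's outputs against locally adjudicated cases and check performance against an acceptance floor before go-live. The sketch below is a generic illustration with hypothetical numbers and thresholds, not the cited tool or its actual criteria.

```python
def local_validation(model_scores, labels, threshold=0.5, min_sensitivity=0.85):
    """Retrospectively validate a candidate model on local labeled data.

    model_scores: predicted probabilities from the candidate tool
    labels: locally adjudicated ground truth (True = finding present)
    Returns (sensitivity, go_live); deployment is blocked when local
    sensitivity falls below the acceptance floor.
    """
    preds = [s >= threshold for s in model_scores]
    true_pos = sum(p and y for p, y in zip(preds, labels))
    positives = sum(labels)
    sensitivity = true_pos / positives
    return sensitivity, sensitivity >= min_sensitivity

# Hypothetical local cohort: the tool under-calls local findings.
scores = [0.9, 0.4, 0.3, 0.8, 0.2, 0.1, 0.95, 0.35]
labels = [True, True, True, True, True, False, True, False]
sens, go_live = local_validation(scores, labels)
```

Here the tool detects only half of the locally confirmed findings (sensitivity 0.5), so the go/no-go check fails, mirroring the halted implementation in the cited study.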

Dataset shift

Once an algorithm is deployed to a live population, it needs to be monitored for dataset shift (data drift). A global pandemic or a major change in diagnostic or treatment guidelines may significantly alter the patient and treatment data the algorithm takes in, and thereby degrade the model's performance. Institutions therefore need to monitor performance, identify drift, and consider retraining as needed.[6]
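Drift monitoring is often automated by comparing the live input distribution against the training-era distribution. One common heuristic is the Population Stability Index (PSI) over binned feature counts; a minimal sketch follows, using an alert threshold of 0.2, which is a conventional rule of thumb rather than a universal standard.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned distributions.

    expected_counts: bin counts from the training-era data
    actual_counts:   bin counts from the live data, same bins
    """
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_frac = max(e / e_total, eps)  # guard against empty bins
        a_frac = max(a / a_total, eps)
        value += (a_frac - e_frac) * math.log(a_frac / e_frac)
    return value

# Hypothetical age-bin counts before and after a shift in case mix.
training_bins = [100, 300, 400, 200]
stable_bins   = [105, 295, 398, 202]   # live data resembles training
shifted_bins  = [300, 400, 200, 100]   # case mix changed markedly

ALERT_THRESHOLD = 0.2  # assumed rule of thumb; tune per institution
```

A near-zero PSI for the stable cohort and a value well above the threshold for the shifted cohort would trigger the review-and-retrain workflow described above.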

Algorithmic Bias

If biases and disparities in care are entrenched in the EHR data used to develop a model, the model will only amplify those disparities.[7] One group analyzed a typical model for care management program referral and found racial bias in its referral suggestions. Because the model used healthcare costs as a proxy for health, it would categorize Black patients and White patients within the same risk level even when the Black patients had, on average, several more active chronic conditions. As a result, fewer of the Black patients who would have benefited from additional care coordination were referred to the program.[8] Other sources of bias originating in EHR data include missing data and small sample sizes due to limited access or fragmented care, and misclassification or measurement error due to the poorer quality of care received by certain patient groups.[9]
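The cost-as-proxy failure can be made concrete with a toy sketch (hypothetical patients and numbers, not the cited study's data): when one group historically accrues lower costs at the same level of illness, ranking referrals by cost systematically under-serves that group.

```python
# Toy illustration (hypothetical data): patients are tuples of
# (name, group, chronic_conditions, annual_cost). Unequal access means
# group_2 accrues lower costs at the same or greater level of illness.
patients = [
    ("A", "group_1", 3, 12000),
    ("B", "group_2", 5, 7000),   # sickest patient, lowest cost
    ("C", "group_1", 2, 9000),
    ("D", "group_2", 4, 8000),
]

def refer_top_k(patients, key, k=2):
    """Refer the k highest-'risk' patients under a given proxy."""
    ranked = sorted(patients, key=key, reverse=True)
    return {name for name, *_ in ranked[:k]}

by_cost   = refer_top_k(patients, key=lambda p: p[3])  # cost as proxy
by_burden = refer_top_k(patients, key=lambda p: p[2])  # actual illness
```

Ranking by cost refers only the group_1 patients, while ranking by chronic-condition burden refers the sicker group_2 patients the proxy overlooked, which is the mechanism the cited analysis identified.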

Research Challenges

Many are calling for more rigorous standards in AI research. The majority of algorithm research is retrospective, using historically labeled data to train and test algorithms [8]. More prospective studies are needed to understand the performance of AI as a clinical intervention or as clinical decision support (CDS) with real-time patient data. More peer-reviewed randomized controlled trials are also needed to assess the clinical effectiveness of predictive or diagnostic AI algorithms and whether they make any difference in patient outcomes. Controlled clinical trials would likewise be needed to compare the performance of different algorithms developed for the same purpose. [2,9]

Organizational Challenges

End User Challenges

References

Submitted by Isabella Slaby, DO