There is a growing trend in medicine toward using artificial intelligence to assess the risk of a disease, like diabetes or cancer, and researchers are now training algorithms to help predict the severity of COVID-19 by looking for patterns in lung X-rays or CT scans.
But is AI ready for COVID-19? A global team of 40 (and growing) researchers thinks not.
In a living review published in the BMJ, an international team of statisticians is evaluating COVID-19 prediction models using a set of standardized criteria — essentially a litmus test for how good the machine learning algorithms are. As new models appear, the researchers add their assessments to the review.
The effort is co-led by Maarten van Smeden, a statistician at the University Medical Center Utrecht. On June 25, he tweeted, “If you are building your first clinical prediction model for COVID, Stop it. Seriously, stop it. /end rant.”
Clearly, van Smeden has a few things to say about AI for COVID-19 prediction. And he should, having been involved in reviewing 4,909 publications, including 51 studies describing 66 prediction models. He can tell if the various AIs are smart enough to predict the severity of the disease reliably.
It turns out, they’re not.
An AI is only as smart as the data it is built on, and the team says that data is seriously lacking for COVID-19. For an algorithm to be effective, it needs a ton of data. The review reports that many of the models were built on small samples, too small to support reliable conclusions.
This problem is likely exacerbated because research groups aren’t sharing their underlying data. Instead, many of the COVID-19 diagnostic models evaluated in the review rely on whatever limited data is available locally, such as from a single area, school, or hospital. Some of the studies also had unclear reporting, meaning they weren’t up-front about the data or methods they used to calibrate their algorithms.
“They’re so poorly reported that I do not fully understand what these models have as input, let alone what they give as an output,” van Smeden told Discover Magazine. “It’s horrible.”
In a 2019 pre-pandemic opinion piece, van Smeden and his team refer to poor calibration as the Achilles’ heel of risk prediction models, like those built on machine learning. “Poorly calibrated predictive algorithms can be misleading, which may result in incorrect and potentially harmful clinical decisions.”
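To make the idea of calibration concrete, here is a minimal Python sketch of how one might check it. It is not taken from the review: the function name `calibration_table` and the eight patients are invented purely for illustration. The idea is to group patients by their predicted risk and compare that prediction with how often the outcome actually occurred.

```python
def calibration_table(predicted, observed, n_bins=5):
    """Group predictions into equal-width risk bins and compare the mean
    predicted risk with the observed event rate in each bin.

    Hypothetical helper written for this illustration only.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        idx = min(int(p * n_bins), n_bins - 1)  # a risk of 1.0 goes in the last bin
        bins[idx].append((p, y))

    table = []
    for rows in bins:
        if not rows:
            continue  # skip risk ranges with no patients
        mean_predicted = sum(p for p, _ in rows) / len(rows)
        event_rate = sum(y for _, y in rows) / len(rows)
        table.append((len(rows), mean_predicted, event_rate))
    return table


# Made-up example: eight patients, predicted risk of severe disease,
# and whether severe disease actually occurred (1) or not (0).
predicted = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
observed = [1, 1, 0, 0, 0, 0, 0, 0]

table = calibration_table(predicted, observed)
for n, mean_predicted, event_rate in table:
    print(f"{n} patients: predicted {mean_predicted:.2f}, observed {event_rate:.2f}")
```

In this toy data the model tells its high-risk group to expect a 90 percent chance of severe disease, yet only half of them develop it. The model still ranks patients sensibly, but the numbers it reports are too pessimistic, which is the kind of gap a clinician acting on those numbers would never see.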
The team warns that even AI models for predicting COVID-19 that claim “good to excellent discriminative performance” are likely biased. The bias can come from study patients who may not adequately represent the general population or from model “overfitting,” meaning the algorithm exaggerates the importance of every random idiosyncrasy in the data. Using AI to make medical decisions before it is vetted is dangerous, and it could lead to poor medical decisions, overspending, and, even worse, more patient suffering.
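Overfitting is easy to demonstrate with a toy example. The Python sketch below is an illustration only, not a model from the review, and every name in it is hypothetical. It builds a "model" that simply memorizes 30 simulated patients whose measurements and outcomes are pure random noise, so there is nothing real to learn, and then scores it on the patients it was built on and on new ones.

```python
import random


def nearest_neighbor_predict(train_x, train_y, x):
    """Predict the outcome of the most similar training patient (1-nearest neighbor)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]


def accuracy(train_x, train_y, xs, ys):
    """Fraction of patients in (xs, ys) whose outcome the model gets right."""
    hits = sum(
        nearest_neighbor_predict(train_x, train_y, x) == y for x, y in zip(xs, ys)
    )
    return hits / len(xs)


def make_noise_patients(rng, n, n_features=10):
    """Simulated patients: random measurements, coin-flip outcomes (no real signal)."""
    xs = [[rng.random() for _ in range(n_features)] for _ in range(n)]
    ys = [rng.randint(0, 1) for _ in range(n)]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_noise_patients(rng, 30)   # a small, single-site sample
test_x, test_y = make_noise_patients(rng, 1000)   # patients the model never saw

train_acc = accuracy(train_x, train_y, train_x, train_y)
test_acc = accuracy(train_x, train_y, test_x, test_y)
print(f"accuracy on the patients the model was built on: {train_acc:.2f}")
print(f"accuracy on new patients: {test_acc:.2f}")
```

Because the model has memorized its own small sample, it looks flawless on those patients, while on new patients it does about as well as a coin flip. A study that reports only the first number would describe excellent performance for a model that has learned nothing.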
An AI’s promise to offer a little assistance is likely appealing to the overtaxed hospitals around the world. But the team cautions, “We do not recommend any of the current prediction models to be used in practice,” and stresses the urgency for more data sharing.
Efforts to amass better COVID-19 data from across the country and around the world are underway. They include observational data listed by the global Cochrane COVID-19 Study Register and the AWS Data Lake for COVID-19, while the NIH is compiling a register of COVID-19 open-access data and computational resources.
But public health data sharing is often challenging. Common barriers include inconsistent data collection methods, problems with consent agreements (where patients agree to participate in one study, but not necessarily more), or lack of “metadata” (the context the data was collected under). To create reliable AI models of the disease, not only will researchers need to share their data, but they will also need to resolve these other issues regarding data collection and sharing techniques.