Artificial intelligence produces misinformation when asked to answer medical questions, but there is scope for it to be fine-tuned to assist doctors, a new study has found.
Researchers at Google tested the performance of a large language model, similar to the one that powers ChatGPT, on its responses to multiple-choice exams and commonly asked medical questions.
They found the model incorporated biases about patients that could exacerbate health disparities and produce inaccurate answers to medical questions.
However, a version of the model developed by Google to specialise in medicine stripped out some of these negative effects, recording levels of accuracy and bias closer to those of a group of clinicians monitored in the study.
The researchers believe artificial intelligence could expand capacity within medicine by helping clinicians make decisions and access information more quickly, but say more development is needed before such models can be used effectively.
A panel of clinicians judged that just 61.9% of the answers provided by the unspecialised model were in line with the scientific consensus, compared with 92.6% of answers produced by the medicine-focused model.
The latter result is close to the 92.9% recorded for answers given by clinicians.
The unspecialised model was also far more likely to produce answers rated as potentially leading to harmful outcomes, at 29.7%, compared with 5.8% for the specialised model and 6.5% for answers generated by clinicians.
Large language models are typically trained on internet text, books, articles, websites and other sources to develop a broad understanding of human language.
James Davenport, a professor of information technology at the University of Bath, said the "elephant in the room" was the difference between answering medical questions and practising medicine.
“Practising medicine does not consist of answering medical questions – if it were purely about medical questions, we wouldn’t need teaching hospitals and doctors wouldn’t need years of training after their academic courses,” he said.
Anthony Cohn, a professor of automated reasoning at the University of Leeds, said there will always be a risk that the models will produce false information because of their statistical nature.
“Thus [large language models] should always be regarded as assistants rather than the final decision makers, especially in critical fields such as medicine; indeed ethical considerations make this especially true in medicine where also the question of legal liability is ever present,” he said.
Professor Cohn added: “A further issue is that best medical practice is constantly changing and the question of how [large language models] can be adapted to take such new knowledge into account remains a challenging problem, especially when they require such huge amounts of time and money to train.”