A few months ago, my doctor showed off an AI transcription tool he used to record and summarize patient meetings. In my case, the summary was fine, but researchers cited in a report by The Associated Press have found that’s not always the case for transcriptions created by OpenAI’s Whisper, which powers a tool many hospitals use. Sometimes Whisper just makes things up entirely.
Whisper is used by a company called Nabla for a tool that it estimates has transcribed 7 million medical conversations, according to AP. More than 30,000 clinicians and 40 health systems use it, the outlet writes. The report says that Nabla officials “are aware that Whisper can hallucinate and are addressing the problem.” In a blog post published Monday, execs wrote that their model includes improvements to account for the “well-documented limitations of Whisper.”
A group of researchers from Cornell University, the University of Washington, and other institutions described their findings in a peer-reviewed study presented in June at the Association for Computing Machinery FAccT conference.
According to the researchers, “While many of Whisper’s transcriptions were highly accurate, we find that roughly one percent of audio transcriptions contained entire hallucinated phrases or sentences which did not exist in any form in the underlying audio... 38 percent of hallucinations include explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority.”
The researchers noted that “hallucinations disproportionately occur for individuals who speak with longer shares of non-vocal durations,” which they said is more common for those with a language disorder called aphasia.