A simple error classification system for understanding sources of error in automatic speech recognition and human transcription

From Clinfowiki

Error Classification for automatic speech recognition in a clinical setting

Accurate, legible, and detailed documentation of clinical encounters is a primary workflow goal in most clinical settings. This study at Regenstrief compared automatic speech recognition (ASR) using Dragon 4.0 with clinician dictation transcribed by a human. The authors' objective was to understand the sources of error in each approach when used in a clinical setting.

Continuous speech recognition still produces a substantial number of errors. Knowing the types of errors can help direct efforts towards removing their sources. Despite a higher error rate with ASR than with human-transcribed notes, the authors concluded that many of the errors in ASR notes could be corrected by examining the context alone, assuming domain knowledge of medicine. However, they found that a “disturbing” percentage of errors were left uncorrected and could not be understood from the context alone, with a small percentage having the potential to change care.

Types of errors

  • Enunciation errors: “Four egos (before he goes)”
  • Stop words: “If he goes through if (with) it.”
  • Suffix errors: “He has eat (eaten) his meal”
  • Homonym errors: “She has a bone spur at the see 3 (C3) vertebral level”
  • Words added: “The she stopped Zoloft on her own”
  • Words deleted: “3 weeks ago she (had a) sore throat”
  • Dictionary errors: “Tom all coding (Tylenol with codeine)”
  • Spelling errors: Dietitian (dietician)
  • Critical errors: Blood sugars are in the 8290 (80 to 90) range
  • Nonsense errors: The last ventral what is to worry 18
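
The categories above can be treated as a small labelling scheme for annotated notes. The following is a minimal sketch, not the authors' tooling: the names `ErrorType`, `PATIENT_CARE_RISK`, `tally`, and `share` are hypothetical, and the sample annotations are invented for illustration rather than taken from the study. It assumes a reviewer has already assigned one category to each error by hand.

```python
from collections import Counter
from enum import Enum


class ErrorType(Enum):
    """The ten error categories from the list above (names are illustrative)."""
    ENUNCIATION = "enunciation"
    STOP_WORD = "stop word"
    SUFFIX = "suffix"
    HOMONYM = "homonym"
    WORD_ADDED = "word added"
    WORD_DELETED = "word deleted"
    DICTIONARY = "dictionary"
    SPELLING = "spelling"
    CRITICAL = "critical"
    NONSENSE = "nonsense"


# Categories this summary singles out as a potential risk to patient care.
PATIENT_CARE_RISK = {ErrorType.CRITICAL, ErrorType.NONSENSE}


def tally(annotations):
    """Count errors per category from (note_id, ErrorType) pairs."""
    return Counter(error_type for _, error_type in annotations)


def share(counts, error_type):
    """Fraction of all annotated errors that fall in one category."""
    total = sum(counts.values())
    return counts[error_type] / total if total else 0.0


# Hypothetical hand-labelled annotations, NOT data from the study.
annotations = [
    ("note-1", ErrorType.HOMONYM),
    ("note-1", ErrorType.WORD_DELETED),
    ("note-2", ErrorType.CRITICAL),
    ("note-2", ErrorType.SUFFIX),
]

counts = tally(annotations)
risky = sum(counts[t] for t in PATIENT_CARE_RISK)
print(f"{risky} of {sum(counts.values())} errors may affect patient care")
```

Keeping the categories in an enumeration means a misspelled label fails immediately instead of silently creating an eleventh category in the tally.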

Any of these errors can cause problems for downstream systems that require coded data, such as billing or reporting. Critical and nonsense errors in notes can also affect patient care. Of the ASR errors in the study, 9.4% (111) were nonsense errors and 1.6% (20) were critical errors, compared with only one critical error for human transcription.

Comment: First, the use of ASR is idiosyncratic. Because the authors focused on a single clinician, their findings will need to be validated with a wider group, especially clinicians with varying speech patterns or those less comfortable with technology. Nonetheless, the classification system provides a useful tool for follow-up work. Second, the technology is changing. This study was performed using Dragon 4.0 (8.0 is the current release) on now-obsolete hardware. Speech recognition requires adequate hardware to keep up with the flow of speech, so newer hardware may reduce errors in several of the categories.