Speech recognition

From Clinfowiki
Revision as of 05:52, 27 February 2011 by Bgsinner (Talk | contribs)


Speech recognition refers to software that takes human speech and translates it to text for data processing.

This is achieved by capturing the speaker’s voice with a computer-compatible microphone, which converts it to an analog signal. The analog signal is sent to the computer’s sound card, where an analog-to-digital converter turns it into a stream of digital data. The speech recognition software then translates that digital data into text. [1]
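The analog-to-digital step can be illustrated with a minimal Python sketch. It is not how any particular sound card or product works: the function name `sample_and_quantize`, the 16 kHz sample rate, the 16-bit depth, and the 440 Hz test tone are all assumptions chosen for illustration. The sketch reads a continuous signal at regular intervals (sampling) and rounds each reading to an integer level (quantization), producing the kind of digital stream that recognition software would then process.

```python
import math


def sample_and_quantize(signal, duration_s, sample_rate_hz=16000, bit_depth=16):
    """Turn a continuous signal into a list of signed integer samples.

    `signal` is a function of time (seconds) returning an amplitude
    in the range [-1.0, 1.0]. All parameter values are illustrative.
    """
    max_level = 2 ** (bit_depth - 1) - 1          # 32767 for 16-bit audio
    n_samples = int(round(duration_s * sample_rate_hz))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                    # time of the n-th sample
        amplitude = max(-1.0, min(1.0, signal(t)))  # clip out-of-range input
        samples.append(round(amplitude * max_level))  # quantize to an integer
    return samples


def tone(t):
    """Hypothetical stand-in for a microphone signal: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440.0 * t)


# Ten milliseconds of "audio" become a stream of integers.
pcm = sample_and_quantize(tone, duration_s=0.01)
print(len(pcm), pcm[:5])
```

A higher sample rate or bit depth yields a more faithful digital copy of the voice at the cost of more data for the software to process.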

In the context of Health Information Technology (HIT), speech recognition is used as an alternative to traditional transcription services. In the traditional transcription workflow, a Provider uses the telephone or another dictation device to dictate a report, which is typed by a transcriptionist and then routed back to the Provider in a computerized system for additional editing and electronic signature.

The traditional workflow carries an inherent delay in a clinical setting, and a delay in critical information can have an adverse impact on patient care. In addition, there can be cost savings when Providers edit their reports directly.

There are two possible speech recognition solutions, both of which have improved turnaround time and produced cost savings.

1) Front-end speech recognition, where a computer program converts the Provider's speech to text immediately and presents the output to the Provider for immediate correction and signing. Front-end recognition has the advantage of minimal turnaround time; however, it also requires the Provider to make significant changes to their clinical workflow.

Support is one area where front-end recognition is problematic. The software needs to be installed on every local computer where it may be used, and cannot be emulated or run as a thin application. This makes it a highly distributed model that is resource intensive for HIT analysts.

2) Back-end speech recognition, where the voice data is converted into text by a server-based solution. The resulting draft is still sent to a transcriptionist for editing, and then routed back to the Provider for final signature.

Although a transcriptionist is still involved, back-end speech recognition captures savings and efficiencies, because the document routed to the transcriptionist is already a largely complete draft that needs editing rather than typing from scratch. In addition, because it is server-based, it is considered a centralized solution, with significantly less HIT support overhead.
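The difference between the workflows described above can be summarized in a small Python sketch. The step wording, the `WORKFLOWS` dictionary, and the helper `involves_transcriptionist` are illustrative assumptions rather than part of any vendor's product; the point is only to show who handles the document at each stage.

```python
# Each workflow is modeled as an ordered list of (actor, action) steps.
# Names and step wording are hypothetical, taken from the descriptions above.
WORKFLOWS = {
    "traditional": [
        ("provider", "dictates report"),
        ("transcriptionist", "types report"),
        ("provider", "edits and signs"),
    ],
    "front_end": [
        ("provider", "dictates report"),
        ("local software", "converts speech to text"),
        ("provider", "corrects and signs"),
    ],
    "back_end": [
        ("provider", "dictates report"),
        ("server", "converts speech to text"),
        ("transcriptionist", "edits draft"),
        ("provider", "signs"),
    ],
}


def involves_transcriptionist(workflow_name):
    """Return True if a transcriptionist handles the document in this workflow."""
    return any(actor == "transcriptionist" for actor, _ in WORKFLOWS[workflow_name])


for name, steps in WORKFLOWS.items():
    summary = " -> ".join(f"{actor} {action}" for actor, action in steps)
    print(f"{name}: {summary}")
```

In this model only the front-end workflow removes the transcriptionist entirely, which matches its minimal turnaround time; the back-end workflow keeps that step but changes it from typing to editing.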

There are a variety of speech recognition vendors on the market. The intent of this article is to give an overview of the technology, not to analyze, rate, or recommend specific vendors. An excellent report that does so was published by KLAS, titled Speech Recognition 2010: Vocalizing Benefits. It is available for purchase on their website: http://www.klasresearch.com


[1] Maistkowski, S. (2000). How it Works: Speech Recognition. PCWorld.

[2] Guerra, A. (2011). Speech Recognition Aids Physicians, Cuts Costs. Information Week.

Submitted by Brad Sinner