Is there a difference between speech recognition and voice recognition? During a recent meeting, the topic of speech (or voice) recognition came up and it became instantly clear that four different transcription professionals had five different definitions to distinguish the two.
One school of thought is that they are completely interchangeable.
Another is that speech rec signifies front-end while voice rec signifies back-end.
Someone else disagreed and said it’s voice rec that is front-end and speech rec is back-end.
I had done my own deep research on this (that is, I Googled it) so I had two conflicting answers of my own.
One is that “voice recognition” means the software recognizes my voice. It doesn’t necessarily know or care what I’m saying, but it knows it’s me. This is used, for instance, in law enforcement phone taps to verify or prove the identity of a person on the phone.
By that theory, “speech recognition” is the proper term for dictation, because the software is recognizing the words I speak regardless of whether it knows who I am. “Speech recognition” would then cover any software that accepts commands by spoken word.
And still another answer online was similar but opposite: that “voice recognition” programs must be trained to a particular speaker’s voice. So all dictation systems are voice recognition programs, because they learn to recognize your inflection and accent and can distinguish complicated words and phrases.
That definition holds that “speech recognition” involves simpler commands that can be used by anonymous speakers, such as the automated phone systems that instruct you to speak the words “yes” or “no.” These systems, and also the “call Mary” command you speak to your cell phone, aren’t trained to your voice, but they can’t handle anything complex or unclear.
The only thing everyone agrees on is the difference between front-end and back-end. In front-end speech (or voice) recognition, a doctor dictates and the computer program transcribes it. The doctor may correct mistakes afterward, but a separate human editor never touches the report. In back-end, a doctor dictates and the computer program transcribes it, but a separate human editor then reviews the report to correct mistakes.
Back to our meeting: we eventually agreed that the only reasonable solution was to always use the terms “front-end” or “back-end” in every conversation about this technology. But it left me wondering: what does everyone else use? How do you distinguish the two? Or do you bother trying to distinguish them?