Speech rec or voice rec?

Is there a difference between speech recognition and voice recognition? During a recent meeting, the topic of speech (or voice) recognition came up and it became instantly clear that four different transcription professionals had five different definitions to distinguish the two.

NEMT Communications Director Tara Courtland

One school of thought is that they are completely interchangeable.

Another is that speech rec signifies front-end while voice rec signifies back-end.

Someone else disagreed and said it’s voice rec that is front-end and speech rec is back-end.

I had done my own deep research on this (that is, I Googled it) so I had two conflicting answers of my own.

One is that “voice recognition” means the software recognizes my voice. It doesn’t necessarily know or care what I’m saying, but it knows it’s me. This is used, for instance, in law enforcement phone taps to verify or prove the identity of a person on the phone.

By that theory, "speech recognition" is the proper term for dictation, because the software is recognizing the words I speak, regardless of whether it knows who I am. "Speech recognition" covers any software that accepts spoken commands.

Still another answer online was similar but opposite: "voice recognition" programs must be trained to a particular speaker's voice. By that reasoning, all dictation systems are voice recognition programs, because they learn to recognize your inflection and accent and can then distinguish complicated words and phrases.

That definition holds that “speech recognition” involves simpler commands that can be used by anonymous speakers, such as the automated phone systems that instruct you to speak the words “yes” or “no.” These systems, and also the “call Mary” command you speak to your cell phone, aren’t trained to your voice, but they can’t handle anything complex or unclear.

The only thing everyone agrees on is the difference between front-end and back-end. In front-end speech (or voice) recognition, a doctor dictates and the computer program transcribes it. The doctor may correct mistakes afterward, but a separate human editor never touches the report. In back-end, a doctor dictates and the computer program transcribes it, but a separate human editor then reviews the report to correct mistakes.

Back to our meeting: we eventually agreed that the only reasonable solution was to always use the terms "front-end" or "back-end" in every conversation about this technology. But it left me wondering: what does everyone else use? How do you distinguish the two? Or do you bother trying to distinguish them at all?


About Tara Courtland

Tara Courtland is the communications director at NEMT.
This entry was posted in Business, IT.

4 Responses to Speech rec or voice rec?

  1. Khawar Khalid says:


    That was a great analysis of both terms. In my view they are the same thing; I would call them "two names for one thing." For example, Windows 7 has a speech recognition option, and at the start you have to train the software with your voice so that it can recognize you and better understand your commands.

    That is what I had in mind. I would appreciate it if others commented as well, so that we can reach a better conclusion.


    Khawar Khalid

    • I am pretty sure I've been using them interchangeably all along. I've been trying to think back on which types of people use which term in each context, but I can't come up with any clear delineation. I suspect most people use them interchangeably.

  2. Khawar Khalid says:

    I am pasting here exactly what was written at http://www.thespeechgurus.com/speechorvoice.html. I hope this helps clear up your confusion.

    These two terms are often used interchangeably, but they really should not be. They have distinct meanings.

    Imagine you answer the telephone, listen for a few seconds and then say “Caroline, can you call me back? We have a bad connection. I can barely hear you.” You recognized your friend Caroline’s voice. That is Voice Recognition. You couldn’t hear her well enough to understand her Speech. Speech Recognition is trying to understand the words being spoken.

    Voice Recognition can be used like a fingerprint to identify a person. What matters is WHO said it.

    Speech Recognition can be used to control a computer, navigate telephone menus, etc. What matters is WHAT was spoken.

    So, if you want to be correct, use the term Speech Recognition anytime you are talking about controlling something by speaking — even though you use your voice to do it!

    You can use either term when you call us, we’ll know what you mean.

  3. That was my favorite definition as well, because it made the most sense to me. That's the page I used to come up with my paraphrase: "'Voice recognition' means the software recognizes my voice. It doesn't necessarily know or care what I'm saying, but it knows it's me. This is used, for instance, in law enforcement phone taps to verify or prove the identity of a person on the phone."

    But I felt it contrasted directly with the other opinion: that voice recognition is trained to take commands from one particular voice, while speech recognition identifies commands given by anyone.

    Under both definitions, voice rec knows I am the one speaking, but in the first case, a doctor dictating reports would be speech rec and in the second case, that would be voice rec.

    That’s when my head starts spinning. But the one you just listed is the best definition I’ve seen so far.
