
Speech recognition is a technology that recognizes spoken words and converts them to text. Voice recognition is a related technology that identifies who is speaking rather than what is being said.
History of Speech Recognition: From Audrey to Alexa
The first speech recognition systems were focused on numbers, not words.
In 1952, Bell Laboratories designed the “Audrey” system which could recognize a single voice speaking digits aloud.
In 1962, IBM demonstrated its Shoebox machine, which could understand 16 words: zero, one, two, three, four, five, six, seven, eight, nine, minus, plus, subtotal, total, false, and off.
In the 1970s, the Speech Understanding Research (SUR) program, funded by DARPA within the US Department of Defense, supported research in this field. The Harpy speech recognition system, designed in the Computer Science department at Carnegie Mellon, could understand about 1,000 words.
Bell Laboratories also introduced a system that could understand multiple voices.
In 1980, IBM developed a talking typewriter for sight-impaired individuals, and the next year introduced a talking display terminal.
As graphic user interfaces grew in popularity during the 1980s, IBM developed one of the first screen readers to work with the new technology.
Also in the 1980s, speech recognition adopted a statistical method called the hidden Markov model (HMM), which estimates the probability that a sequence of unknown sounds corresponds to particular words, rather than relying on fixed vocabularies and sound-pattern matching.
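To make the HMM idea concrete, here is a minimal sketch of Viterbi decoding, the algorithm used to find the most likely hidden state sequence behind a series of observations. The two states, the observation labels, and all probabilities are invented illustration values, not real acoustic-model parameters:

```python
# Toy Viterbi decoding for a two-state HMM.
# All states and probabilities below are made-up illustration values.

states = ["silence", "speech"]
start_p = {"silence": 0.6, "speech": 0.4}
# Probability of moving from one hidden state to the next
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
# Probability of each state emitting a coarse acoustic observation
emit_p = {
    "silence": {"quiet": 0.9, "loud": 0.1},
    "speech": {"quiet": 0.2, "loud": 0.8},
}

def viterbi(observations):
    """Return the most likely hidden state sequence for the observations."""
    # Probability of the best path ending in each state after the first observation
    best = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        new_best, new_path = {}, {}
        for s in states:
            # Pick the predecessor state that maximizes the path probability
            prev = max(states, key=lambda p: best[p] * trans_p[p][s])
            new_best[s] = best[prev] * trans_p[prev][s] * emit_p[s][obs]
            new_path[s] = path[prev] + [s]
        best, path = new_best, new_path
    final = max(states, key=lambda s: best[s])
    return path[final]

print(viterbi(["quiet", "loud", "loud", "quiet"]))
# → ['silence', 'speech', 'speech', 'silence']
```

Real recognizers apply the same principle at a much larger scale, with states for phonemes and observations derived from audio features.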
In the 1990s, the personal computer made big strides in speech recognition possible.
In 1999, IBM introduced the IBM Home Page Reader, a talking web browser that helped users who were sight-impaired hear the full range of web-page content in a logical, understandable manner.
Dragon Dictate software and a dial-in voice recognition system called VAL (a voice portal) from BellSouth continued the advancement in this field.
In the 2000s, Google introduced its voice search app, which drew on 230 billion words from user searches. Not only did this app make speech recognition available to millions of people, but Google also used the collected search data to predict what users were saying and further improve the app's accuracy.
In the 2010s, Apple launched Siri, and Amazon's Alexa and Google Home brought more voice recognition apps to consumers. With all these advancements, speech recognition accuracy has also been improving rapidly, with tech companies working to reduce their word error rate.
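Word error rate (WER) is the standard accuracy metric mentioned above: the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the reference length. A small sketch using word-level edit distance (the example sentences are invented):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with a word-level Levenshtein edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)

wer = word_error_rate("turn on the living room lights",
                      "turn off the living room light")
print(f"{wer:.3f}")  # → 0.333 (two substitutions out of six words)
```

A WER of 0 means a perfect transcript; modern systems on clean speech report rates of only a few percent.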
What are some applications of voice/speech technology?
- Automated customer service: efficient call handling through automated routing and answers to routine questions
- Driver safety: hands-free dialing, voice-activated navigation, and voice control and search for in-car radios
- Accessible computing: support for users with vision, mobility, or other impairments
- Virtual assistants: assistants on our phones and smart speakers at home
- Speech-to-text software: transcribing interviews, podcasts, and dictation; translating and subtitling content
The Future
When the Shoebox was introduced in 1962, one could hardly have imagined the many applications speech recognition technology has today.
With advancements in artificial intelligence and the growing amounts of speech data that can be easily mined, voice is on its way to becoming one of the dominant user interfaces in the world of technology.
Today, this technology is ingrained in our day-to-day lives through a multitude of voice-driven applications like
- Microsoft’s Cortana
- Apple’s Siri
- Amazon’s Alexa
- Google's voice-responsive features
Our day-to-day gadgets like phones, watches, computers, and even refrigerators are becoming increasingly integrated with voice interactivity, enabled by AI and machine learning.