
OK Google! How Does Voice Recognition Software Work?

Today, if you have a modern smartphone, you have the power of voice recognition in your pocket. Voice recognition is used for a lot of different reasons. Sometimes it's for productivity and efficiency, and other times it's specifically because of a particular disability. The basic premise is the hands-free operation of a computing device. So in this thread, we will discuss how voice recognition software works and take a look at all its necessary components.


Speech recognition was first introduced to personal computing in the late 1990s, around the time Windows 98 was released. However, you may be surprised to know that research on this technology started way back in 1936. The first speech recognition systems could understand only digits. (Given the complexity of human language, it makes sense that inventors and engineers first focused on numbers.) In 1952, Bell Laboratories designed the "Audrey" system, which recognized digits spoken by a single voice. Ten years later, at the 1962 World's Fair, IBM demonstrated its "Shoebox" machine, which could understand 16 words spoken in English.


In everyday use, voice recognition means the ability of a machine or program to receive and interpret dictation or to understand and carry out spoken commands. Strictly speaking, however, voice recognition and speech recognition are two different terms. Voice recognition relates to identifying an individual voice, along the same lines as a biometric scanner. Speech recognition, on the other hand, relates to identifying spoken words in the correct sense and then translating them into a form the computer can act on.


To convert speech to on-screen text or a computer command, a computer has to go through several complex steps. When you speak, you create vibrations in the air. The analog-to-digital converter (ADC) translates this analog wave into digital data that the computer can understand. To do this, it samples, or digitizes, the sound by taking precise measurements of the wave at frequent intervals. The system filters the digitized sound to remove unwanted noise, and sometimes to separate it into different bands of frequency (frequency is the number of vibrations per second in the sound wave, heard by humans as differences in pitch). It also normalizes the sound, or adjusts it to a constant volume level. The sound may also have to be temporally aligned: people don't always speak at the same speed, so it must be adjusted to match the speed of the template sound samples already stored in the system's memory.
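To make the sampling and normalizing steps a little more concrete, here is a minimal Python sketch. It is only an illustration: the function names, the 16,000 samples-per-second rate, and the pure test tone standing in for a voice are all assumptions made for this example, and a real system works on microphone input with far more careful filtering.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s, amplitude=0.3):
    """Measure a pure tone at evenly spaced instants, the way an ADC samples a wave."""
    n = round(sample_rate_hz * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n)]

def quantize(samples, bits=16):
    """Round each measurement to a signed integer of the given bit depth."""
    max_level = 2 ** (bits - 1) - 1
    # Clip to the range -1.0..1.0 first so loud input cannot overflow.
    return [round(max(-1.0, min(1.0, s)) * max_level) for s in samples]

def normalize_peak(samples, target_peak=1.0):
    """Scale the signal so that its loudest sample reaches target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence stays silence
    scale = target_peak / peak
    return [s * scale for s in samples]

# Hypothetical input: a quiet 440 Hz tone, 10 ms long, sampled 16,000 times a second.
quiet = sample_sine(440, 16000, 0.01)
loud = normalize_peak(quiet)
digital = quantize(loud)
```

Here `quiet` holds 160 measurements, `loud` is the same wave scaled up to a constant peak level, and `digital` is the list of whole numbers the computer would actually store.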

Next, the signal is divided into small segments as short as a few hundredths of a second, or even thousandths in the case of plosive consonant sounds (consonant stops produced by obstructing airflow in the vocal tract) like "p" or "t." The program then matches these segments to known phonemes in the appropriate language. A phoneme is the smallest unit of sound in a language: a representation of the sounds we make and put together to form meaningful expressions. There are roughly 40 phonemes in the English language (different linguists have different opinions on the exact number), while other languages have more or fewer phonemes.
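The segmenting and matching idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular product does it: the 20 ms frame length, the two features (energy and zero crossings) and the template values are all made up for illustration, and real recognizers use richer spectral features and statistical or neural models.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=20):
    """Cut the signal into back-to-back segments of frame_ms milliseconds each."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    # Any leftover samples at the end that do not fill a frame are dropped.
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def frame_energy(frame):
    """Average squared amplitude: a rough measure of loudness."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossings(frame):
    """Count sign changes: hissy sounds like "s" cross zero far more often than vowels."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))

def nearest_label(features, templates):
    """Return the label whose stored template is closest (squared distance)."""
    return min(
        templates,
        key=lambda label: sum((f - t) ** 2
                              for f, t in zip(features, templates[label])),
    )

# Hypothetical templates: (energy, crossing rate) for a vowel-like and a hiss-like sound.
templates = {"a": (1.0, 0.0), "s": (0.0, 1.0)}
guess = nearest_label((0.9, 0.1), templates)
```

In this example `guess` comes out as "a", because the measured features sit closer to the vowel-like template than to the hiss-like one.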


Speech-to-text and controlling a machine with your voice are the obvious uses. But the technology also holds promise for those with disabilities. Applications like Drive Safely for your phone can read out text messages and emails for you, which is helpful for the visually impaired. Various apps also allow you to search the web or type out messages by speaking, which is helpful for those with limited motor control. This technology is widely used in the field of Artificial Intelligence, and applications like Siri and Google Assistant are the best-known examples of it.
-> Google Assistant

Google Assistant is a virtual personal assistant developed by Google and announced at its developer conference in May 2016. Unlike Google Now, the Google Assistant can engage in two-way conversations. Assistant initially debuted as part of Google's messaging app Allo and its voice-activated speaker Google Home. After a period of exclusivity on the Pixel and Pixel XL smartphones, it began to be deployed on other Android devices in February 2017, including third-party smartphones and Android Wear, and was released as a standalone app on the iOS operating system in May 2017. Alongside the announcement of a software development kit in April 2017, the Assistant has been, and is being, further extended to support a large variety of devices, including cars and smart home appliances. The functionality of the Assistant can also be enhanced by third-party developers.


-> Siri

Siri is the personal assistant that Apple introduced with the iPhone 4S in 2011. The app is deeply integrated with the operating system and responds to your natural speaking voice. It can be used to make calls, write SMS, set reminders or answer questions with real-time results from the internet. The app adapts to your preferences and style of speaking, and takes interactivity to an all-new level. At launch, Siri supported only English, German and French, and it was available only to iPhone 4S users.


Access – for writers with physical disabilities that prevent them from using a keyboard and mouse, being able to issue voice commands and dictate words into a text document is a significant advantage.

Spelling – you will have access to the same editing tools as a standard word processing solution. Of course, nothing is 100 percent accurate (yet), but the software will catch the majority of spelling and grammatical errors.

Speed – the software can capture your speech at a faster rate than you might normally type. So it is now possible to get your thoughts onto electronic paper without waiting for your fingers to catch up.


Set-up and Training can be a significant investment of time. Despite promises that you’ll be up and running a few minutes after installation, the reality of recording your voice commands is more complex. Capturing your tone and inflection accurately sometimes takes time. The software also pauses on some sentences as it tries to figure out what you said. Therefore, it all requires patience and clear enunciation.

Frequent Pauses can at times spoil your mood. Remember that the goal was to write faster than you could normally type. Changes in voice tone or speech clarity can cause glitches, such as unrecognized words or acronyms.

Limited Vocabulary – you should also be ready for lots of delays while the software stumbles on unusual words. The simple reason for this is that new industry-specific vocabulary is being added all the time these days.

Hope you guys liked this thread. If you have any suggestions or queries, please let me know in the comments below.
