Voice Recognition Software Vs Human Transcription Services…

… Which is the better choice?

Even with advancements in voice recognition software, human transcriptionists still provide the most accurate transcription services possible. After more than 30 years of development, most voice recognition software still struggles to turn speech into text, even with a pristine audio recording. But businesses do ask us how voice recognition software compares with human transcription services, so we thought we’d compare the two to help you make the right decision.

Avoid a compromised transcript

The trouble with using voice recognition software is that it can easily lead to a compromised transcript in terms of quality, accuracy and even time spent. Some of the largest and smartest artificial intelligence companies in the world (including Google, IBM, and Microsoft) have been working on improving automated speech-to-text transcription for years, but still reach only 88% accuracy at best. You’ll get 99-100% accuracy by using a professional human transcriptionist. In business, and in professions such as legal and medical, where accuracy is vital, a compromised transcript could leave an organisation exposed.
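The figures above are not tied to a particular measurement method. One widely used measure in speech recognition is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. The sketch below is a minimal illustration of that calculation, assuming "accuracy" is read as 1 minus WER; the function name `word_error_rate` and the example sentences are ours, for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between two texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits needed to turn the first j hypothesis words
    # into the reference words processed so far.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # reference word missing from transcript
                curr[j - 1] + 1,                  # extra word in transcript
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word in a five-word sentence.
wer = word_error_rate("the doctor lost his patience",
                      "the doctor lost his patients")
print(wer)  # prints 0.2
```

On this reading, a single wrong word in a five-word sentence already means 80% accuracy for that sentence, which shows how quickly small errors add up.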

Humans can handle background noise

Voice recognition software simply cannot cope with transcribing live audio, which typically involves all kinds of background noise. One of the biggest advantages human transcriptionists have over voice recognition software is their natural ability to filter through background noise and still deliver an accurate transcript. Of course, presenting the clearest quality audio file is always best practice. Automated transcription services, by contrast, have a difficult time handling background noise, which results in inaccurate transcripts or even the complete rejection of the audio file.

Humans can identify different speakers

Voice recognition software typically cannot identify individual speakers and deliver accurate transcripts when there are more than two speakers present. This can prove very troublesome for anyone who needs to transcribe audio with three or more speakers, such as a business meeting. Humans, however, can understand and identify multiple speakers, even when the speakers sound alike.

Humans understand different speaking styles

Everybody has their own style of speaking. Programming software to recognise human voices has proven very difficult due to the variations in how people speak a language. Despite being the world’s most widely spoken language (there are 1.5 billion of us), English sounds considerably different in each part of the world. Even if we all spoke English the same way, there would still be the added difficulty of programming a system for different voices – from young to old, male to female, hoarse to soft, fast to slow, fluent to stuttering. Even the same person tends to speak differently in different situations, for example, in a moment of excitement or when in a rush.

Humans can understand accents and dialects

Another obstacle for automated transcription software is accents and dialects. When speakers have accents or dialects the software has not been built to recognise, it struggles. But humans are constantly exposed to varying accents and dialects. This exposure, paired with human transcriptionists’ natural ability to adapt, gives humans the upper hand.

Humans can grasp the context

Interpreting speech requires a good understanding of the overall context. Humans possess the ability to interpret ambiguous data and automatically deduce the missing parts based on the context. Conversely, computers cannot always interpret the meaning of words and phrases as they lack the ability to comprehend the bigger picture.

Humans can differentiate between words that sound the same

One of the quickest ways to tell if a transcript was done by a machine is to look for errors in homophones. A homophone is a word that is pronounced the same way as another word but has a different meaning (e.g. sail vs sale). This can easily lead to mistakes. For example, let’s say that a doctor was frustrated and lost his patience with someone. A human transcriptionist would transcribe, "The doctor lost his patience," based on the context. An automated transcriber would likely transcribe, "The doctor lost his patients." Oh dear!

The challenge of continuous speech

Human speech tends to be continuous, with no natural pauses between words. This poses a difficult challenge: where should a waveform be split to form meaningful words? Given a sequence of sounds, realigning the sounds to form different word boundaries can produce vastly different sentences:

  • It’s not easy to wreck a nice beach; OR

  • It’s not easy to wreck an ice beach; OR

  • It’s not easy to recognise speech!
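To make the word-boundary problem concrete, here is a minimal sketch that lists every way a stream of sounds can be split into dictionary words. Everything in it is an illustrative assumption: the tiny `LEXICON`, the simplified phoneme spellings and the function name `segmentations` are made up for this example and are not how any particular product works. The toy dictionary only covers the first two readings above; "recognise speech" is a near-match rather than an exact one, so finding it would also require a model of which sounds are easily confused.

```python
from typing import Dict, List, Tuple

Phonemes = Tuple[str, ...]

# Hypothetical toy lexicon: word -> simplified phoneme sequence (illustrative only).
LEXICON: Dict[str, Phonemes] = {
    "wreck": ("r", "eh", "k"),
    "a": ("ax",),
    "an": ("ax", "n"),
    "nice": ("n", "ay", "s"),
    "ice": ("ay", "s"),
    "beach": ("b", "iy", "ch"),
}


def segmentations(sounds: Phonemes) -> List[List[str]]:
    """Return every way to split `sounds` into words from LEXICON."""
    if not sounds:
        return [[]]  # one valid way to split nothing: no words
    results: List[List[str]] = []
    for word, pronunciation in LEXICON.items():
        n = len(pronunciation)
        # If this word matches the start of the stream, split the remainder recursively.
        if sounds[:n] == pronunciation:
            for rest in segmentations(sounds[n:]):
                results.append([word] + rest)
    return results


# The same unbroken stream of sounds, with no pauses marked.
STREAM: Phonemes = ("r", "eh", "k", "ax", "n", "ay", "s", "b", "iy", "ch")

for words in segmentations(STREAM):
    print(" ".join(words))
# prints:
# wreck a nice beach
# wreck an ice beach
```

Both splits are equally valid as far as the sounds go. Choosing between them takes knowledge of context, which is exactly where a human listener has the advantage.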

Humans are your better choice

Currently, there is no software that is sophisticated enough to handle all these things or interpret the overall context of the audio. Only a human transcriptionist can. So make sure you choose the right path… choose Tasman Transcription.

 

What’s a mondegreen?

For a bit of fun – and to make a point…

The challenge of continuous speech sometimes results in a mondegreen. A mondegreen is a mishearing or misinterpretation of a phrase, caused by near-homophony, that gives the phrase a new meaning. Mondegreens are most often created by a person listening to a poem or a song. They give rise to some hilarious misheard song lyrics. Here are 5 of our favourites – try them when you next hear these songs:

“Saving his life from this warm sausage tea”. Correct lyric: “Spare him his life from this monstrosity” from Queen’s ‘Bohemian Rhapsody’.

“Excuse me while I kiss this guy”. Correct lyric: “Excuse me while I kiss the sky” from Jimi Hendrix’s ‘Purple Haze’.

“I’ll never leave your pizza burning”. Correct lyric: “I’ll never be your beast of burden” from The Rolling Stones’ ‘Beast of Burden’.

“Four-legged woman… four-legged woman two knees”. Correct lyric: “More than a woman… more than a woman to me” from the Bee Gees’ ‘More Than a Woman’.

“Slow walking Walter, the fire engine guy”. Correct lyric: “Smoke on the water… there’s fire in the sky” from Deep Purple’s ‘Smoke on the Water’.