Human Versus Software Audio Transcription: Cage Match!

In the ring today we have human-verified audio transcription versus automated software transcription…

Who will come out on top? Find out below:

98% Guaranteed: With our proprietary review process, audio transcriptions achieve 98% accuracy or higher every single time. This is thanks to the rigorous process we apply to every one of our clients' files. First, your audio file is broken into bite-sized, six-minute segments so our transcriptionists can make a first pass. Next, the text goes through a review process, which ends with timestamps and speaker tracking being added to the text. Finally, the work goes through proofreading and a final quality check, so we can stand behind our guarantee 100%!

Mumbling… background noise? Not a problem! (Usually): Another big benefit of using human transcriptionists instead of automated software has to do with audio quality. In a perfect (transcription) world, everyone would speak the same language with the same vocabulary, accent, and tone. Unfortunately, one of the biggest obstacles in audio transcription is the wide variation in audio files. Sometimes people forget to turn on their fancy microphone, and the important class lecture ends up recorded on a low-quality device instead.

This can lead to buzzing, background noise, and unclear audio… a huge problem if your transcription software is designed to work with a specific type of audio. (Hint: This is why transcription apps tell you to use one of their recommended recording devices and to speak very clearly and slowly.) The same issue comes up with accents and with multiple speakers talking over each other.

Humans, on the other hand, have a huge edge in this department. We can use context clues, our own professional experience and knowledge, and our superior brainpower to get the most out of each file. We can decipher audio files that software couldn't dream of handling! Humans aren't perfect, though: sometimes an audio file is so poor that even a professional transcriber can't salvage it.

What about my grammar and punctuation?!?!: Unfortunately, this is another weak spot for software transcription. How can a computer tell whether you paused mid-sentence to take a sip of water, or stopped at the end of a sentence (period) or a paragraph (period and line break)?

Slang Vocab… Yo!: Think of software transcription services as having the vocabulary of your nearest dictionary. A wealth of knowledge, that's for sure, but how long has that old Merriam-Webster been sitting in your closet? (Hint: If there's dust on your dictionary, it's probably outdated when it comes to colloquial terms.) The main difference between a human and an application is that humans adapt, grow, and learn over time, while a static dictionary starts going out of date the day it is published.

Homophones "They're there! Right in the audio.": Similarly, when your brain processes your friend's speech, you understand the difference between "to," "too," and "two." That's due to your complex understanding of language, not just knowing how words sound. For better or worse, unless your software can actually analyze and understand what is said in your audio file, you can't count on correct homophone usage.

We have a ways to go with speech recognition before a computer can decipher complex sentence structures, uncommon word usage, or mumbling. Until then, there are our transcriptionists.
