Re-inventing Audio Transcription

Last month our company completed four years. We launched CallGraph Skype Recorder in April of 2008 with the intention of offering services around it. The transcription service was one of those, and it quickly became the most popular. Over the past four years we have invested all our time and effort in developing a human-powered transcription system with a single goal in mind: deliver the best quality transcript with the least amount of effort. In this series of posts we are going to write about this system in depth.

So why build another human-powered transcription system? Why not just use an Automatic Speech Recognition system? Speech recognition has been in the limelight recently, most notably because of Siri, which uses Nuance’s technology. In Google Glass it’s a central component. Even Evernote recently added support for it. However, all of these systems employ keyword recognition, e.g. commands that are spoken aloud. Our requirement was conversational speech recognition, and the technology for that is still very immature. In fact, we tried out CMU Sphinx and the results were so poor that we ruled it out.

The big issue with human-powered systems is that they produce inconsistent results. Transcription is very labor intensive. And just like any labor-intensive workflow, if you do not have processes in place, you will not be able to control the quality. The typical transcription process involves one person doing the typing and, maybe, another person proofreading it. On average, it takes around four hours to type one hour of audio and around the same amount of time to edit it. This increases the cost of transcription. And even after all that, the transcript is bound to have mistakes, which affects its quality.
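To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the ratios quoted above (about four hours of typing and about four hours of editing per hour of audio); the function and constant names are made up for illustration and are not part of our system.

```python
# Rough labor estimate for a typical type-then-proofread workflow.
# Assumption: the ratios below are the approximate figures quoted in the post.
TYPING_HOURS_PER_AUDIO_HOUR = 4.0
EDITING_HOURS_PER_AUDIO_HOUR = 4.0


def labor_hours(audio_hours,
                typing_ratio=TYPING_HOURS_PER_AUDIO_HOUR,
                editing_ratio=EDITING_HOURS_PER_AUDIO_HOUR):
    """Return the estimated person-hours needed to transcribe the audio."""
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    return audio_hours * (typing_ratio + editing_ratio)


if __name__ == "__main__":
    # One hour of audio needs roughly a full working day of effort.
    print(labor_hours(1.0))  # 8.0
```

In other words, under these assumptions every hour of audio costs about eight person-hours before anyone has checked whether the result is actually accurate.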

So that was the starting point for us. Our system manages the transcription process end-to-end. It’s like a machine: you input the audio file and it outputs a high-quality transcript in one day. The system is powered by our certified transcriptionists, who do all the work. We have a well-defined workflow and a robust process in place. We use some Machine Learning and Information Retrieval tools as well, but for the most part, it is all done by hand.

With this system we have completed more than 3000 hours of audio transcription to date and managed to survive four years in a highly competitive market. The best part is that we have a high rate of returning customers. For a startup, it might not be a stellar achievement like Instagram’s, but we believe that we have built something substantial: a scalable and reliable transcription service. The next post will cover the first part of our system, the transcriber certification process. Till then, if you are in need of a transcription service, you should try ours out today. You will not be disappointed.

The next part of the series can be found here.
