Building a Custom Deep Learning Rig

Deep learning is a very exciting field to be part of right now. New model architectures, especially those trained with Graphics Processing Units (GPUs), have enabled machines to do everything from defeating the world’s best human Go players to composing “classical music”. We wanted to take advantage of its applications in speech and language modeling, and started with AWS G2 instances. We soon found that training even very simple models on a small portion of our data took days at a time, so we decided to build our own rig with specialized hardware.

Automatic Audio Transcription

Humans Are Better at Transcribing Than Robots

Audio transcription can be a long process, especially if you are new to the field. For many, automatic audio transcription offers an easy alternative. But is the shortcut worth taking? The statistics suggest not.

Express Scribe, an automatic transcription program, offers an accuracy of around 40-60% when integrated with Microsoft Speech Recognition. Google Voice, on the other hand, offers approximately 80% accuracy, but only when transcribing voicemails. That percentage drops significantly for conversational speech. The poor performance of automatic audio transcription and speech recognition software, even today, makes one wonder why. The reasons are plentiful.
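The figures above do not say how accuracy was measured. As a rough illustration only, one common way to score a transcript is word accuracy: one minus the word-level edit distance between a trusted reference transcript and the software's output, divided by the number of reference words. The sketch below assumes that definition; the function names and the sample sentences are hypothetical and are not taken from any of the products mentioned.

```python
def word_edit_distance(reference, hypothesis):
    """Count the word insertions, deletions and substitutions needed
    to turn the hypothesis into the reference (Levenshtein distance)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # previous[j] holds the distance between the first i-1 reference
    # words and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_accuracy(reference, hypothesis):
    """Assumed definition: 1 - (word errors / reference words), floored at 0."""
    reference_length = len(reference.split())
    if reference_length == 0:
        raise ValueError("reference transcript must not be empty")
    errors = word_edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / reference_length)


# Hypothetical example: two homophone mistakes in a seven-word sentence.
reference = "the boat is for sale this year"
hypothesis = "the boat is for sail this ear"
print(f"{word_accuracy(reference, hypothesis):.0%}")
```

Under this definition, two wrong words in a seven-word sentence already pull the score down to roughly 71%, which gives a feel for how quickly errors add up in longer recordings.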

The software fails to factor in the various styles of speaking

A language changes its character depending upon who speaks it. For instance, the way English is spoken in the US is different from how people in India speak it. Teaching a software program to recognize the variations in human intonation and accent can be very challenging. The problem multiplies when groups of speakers are involved. Analyzing voice quality can be equally frustrating for a program. The human ear easily deciphers words spoken in a hoarse, soft or deep voice; software does not. In an ideal world, the speaker would speak clearly and carefully in order to be accurately transcribed by an automatic audio transcription system. Unfortunately, we don’t get to work in an ideal world.

English can be a tricky language

Sale, sail. Year, ear. Feet, feat. You get the drift. Homophones can be quite tricky, and they sometimes become impossible to tell apart in spoken language without understanding the context. That is a lot to expect from software, and it naturally leads to mistakes.

The better alternative

Hiring a transcription service with a team of experienced transcribers is still the best option. Old is gold when it comes to accuracy, at least in this context. Scribie is completely powered by humans and hence is able to consistently maintain an accuracy level of 99% or higher.

Want to find out for yourself? Start uploading your files now.

Humans Are Better Than Machines For Transcription

Transcription services are known worldwide today. With growing awareness of the benefits they offer, transcription is going global. Today, companies want to reach every corner of the world, and transcription services can definitely help them do so.

We know that technology today is making anything possible. Many opine that with the advancement in technology, transcription can be completely automated and that no human intervention is required. In fact, speech recognition software is used by many transcription service providers…

Next Generation Technology Aiding Transcription

Transcription is an indispensable part of business. It helps in reporting, predictive analysis and much more. It also helps enhance a business's web presence. Many are using next generation technology to support the entire process and to make transcription more accurate.

At Scribie, the motivation behind our service is to deliver perfectly transcribed files in the most convenient and hassle-free way.

Re-inventing Audio Transcription

Last month our company completed four years. We launched CallGraph Skype Recorder in April of 2008 with the intention of offering services around it. The transcription service was one of those, and it quickly became the most popular. Over the past four years we have invested all our time and effort in developing a human-powered transcription system with a single goal in mind: deliver the best quality transcript with the lowest amount of effort. In this series of posts we are going to write about this system in depth.

So why build another human-powered transcription system? Why not just use an Automatic Speech Recognition system? Speech recognition has been in the limelight recently, most notably because of Siri, which uses Nuance’s technology. In Google Glass it is a central component. Even Evernote recently added support for it. However, all of these systems employ keyword recognition, e.g. commands that are spoken aloud. Our requirement was conversational speech recognition. The technology for that is still very immature. In fact, we tried out CMU Sphinx and the results were so poor that we ruled it out.

The big issue with human-powered systems is that they produce inconsistent results. Transcription is very labor intensive. And just like any labor intensive workflow, if you do not have processes in place, you will not be able to control the quality. The typical transcription process involves one person doing the typing and, maybe, another person proofreading it. On average it takes around four hours to type one hour of audio and around the same amount of time to edit it. This increases the cost of transcription. And even after that, the transcript is bound to have mistakes, which affects its quality.
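The effort figures above can be turned into a quick back-of-the-envelope calculation. The sketch below simply adds the typing and editing ratios quoted in the text (roughly four hours each per hour of audio); the function name and its parameters are hypothetical, and real ratios will vary with audio quality and transcriber experience.

```python
def labor_hours(audio_hours, typing_ratio=4.0, editing_ratio=4.0):
    """Estimate the human effort needed to transcribe a recording.

    typing_ratio and editing_ratio are hours of work per hour of audio;
    the defaults are the approximate figures quoted in the text.
    """
    if audio_hours < 0:
        raise ValueError("audio_hours must not be negative")
    return audio_hours * (typing_ratio + editing_ratio)


# One hour of audio: about 4 hours of typing plus about 4 hours of editing.
print(labor_hours(1))    # 8.0
# A 30-minute interview under the same assumptions.
print(labor_hours(0.5))  # 4.0
```

In other words, under these assumptions every hour of audio costs about a full working day of human effort before any quality checks are added.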

So that was the starting point for us. Our system manages the transcription process end-to-end. It’s like a machine where you input the audio file and it outputs a high quality transcript in one day. This system is powered by our certified transcriptionists, who do all the work. We have a well-defined workflow and a robust process in place. We use some Machine Learning and Information Retrieval tools as well, but for the most part, it is all done by hand.

With this system we have completed more than 3000 hours of audio transcription to date and managed to survive four years in a highly competitive market. The best part is that we have a high return rate of customers. For a startup, it might not be a stellar achievement like Instagram’s, but we believe that we have built something substantial: a scalable and reliable transcription service. The next post will cover the first part of our system, the transcriber certification process. Till then, if you are in need of a transcription service, you should try ours out today. You will not be disappointed.

The next part of the series can be found here.