Archive for the ‘General’ Category

Transcription System: Workflow

Sunday, April 15th, 2012

This is a series on our audio transcription system. The first part, which provides an overview, is here.

Our workflow consists of five steps.

File Splitting -> Transcription -> Review -> Proofreading -> Delivery

We start by splitting the file into smaller parts. The file is split at the 6 minute boundary, which produces one or more files of 6 minutes or shorter. This is the first little innovation of our transcription process. File splitting breaks down the work into smaller, manageable chunks. It helps in many ways. The file can be worked on in parallel by a number of transcribers. A huge amount of effort is not wasted if one part has to be re-done. Additionally, we can track the progress precisely.
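As a rough illustration of the splitting step, here is a minimal Python sketch that only computes where the cuts would fall. The function name `split_points` and the choice to work in seconds are assumptions made for this example; it does not show how our system actually cuts the audio.

```python
def split_points(duration_s: float, chunk_s: float = 360.0) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds for each part of a file.

    Every part is chunk_s long (6 minutes by default) except possibly
    the last one, which holds whatever audio is left over.
    """
    if duration_s <= 0:
        return []
    parts = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        parts.append((start, end))
        start = end
    return parts


# A 20 minute file gives three 6 minute parts and one 2 minute part.
for start, end in split_points(20 * 60):
    print(f"{start:7.1f}s -> {end:7.1f}s")
```

Each (start, end) pair can then be handed to a different transcriber, which is what makes the parallel work and the per-part progress tracking possible.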

Transcription is the typing part. On average it takes around 15-20 minutes to transcribe a 6 minute file. For a lot of our transcribers, who are mostly home-based freelancers, this is not a huge investment of time. Therefore splitting increases the likelihood that the file will be transcribed quickly. In fact, on average it takes around 1 to 1.5 hours to complete the transcription part of a one hour file!

The accuracy of the transcript is very low at this stage, typically around 50 to 80%. Therefore we do a review. The transcript is checked against the audio and all mistakes are corrected. Time-coding and speaker tracking are also added at this stage. Review usually takes 5 to 8 minutes of effort. But it takes longer for all the parts to get reviewed because we have fewer reviewers than transcribers. This is by design since we promote only our best transcribers to reviewers. The review drastically improves the accuracy.

Once all parts are transcribed and reviewed, we can combine them and prepare the final transcript. However, one more round of review is required here. Since different parts are worked on by different people, there are bound to be inconsistencies. Proofreading is done by one person who goes through all the parts together and corrects them. The proofreader is an employee of CGBiz LLC (our company). They are the best of the best we have. We train them and pay them a monthly salary rather than an hourly rate.

The transcript is almost done now. However things might not be perfect even now. The proofreader can make mistakes, some more research may be required for certain terms, etc. So before the delivery we do some random checks. We try to gauge whether the quality is indeed at the level we want it to be. We also use keyword analysis (tf-idf to be precise) to identify out-of-context terms and inconsistencies. We review it again if we are not happy with it. Over time we have found that a small percentage of files require re-review; around 2%. Those are generally the most difficult of files.
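To give an idea of what a tf-idf check can look like, here is a minimal Python sketch. It treats each part of the transcript as one document and scores every term by its frequency in that part, weighted down by how many parts contain it. The tokenizer, the function names and the idea of treating parts as documents are all assumptions made for this example, not a description of our actual implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(parts: list[str]) -> list[dict[str, float]]:
    """Score every term in every part as tf * idf."""
    docs = [Counter(tokenize(p)) for p in parts]
    n = len(docs)
    # Document frequency: the number of parts each term appears in.
    df = Counter(term for doc in docs for term in doc)
    scores = []
    for doc in docs:
        total = sum(doc.values())
        scores.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in doc.items()
        })
    return scores


def top_terms(parts: list[str], k: int = 3) -> list[list[str]]:
    """Return the k highest scoring terms of each part."""
    return [sorted(s, key=s.get, reverse=True)[:k] for s in tf_idf(parts)]


parts = [
    "the server runs nginx",
    "the server runs mysql",
    "the surfer runs nginx",
]
print(top_terms(parts, k=1))
```

Terms that appear in every part score zero, while a term that shows up in only one part (like "surfer" in the third part above) floats to the top, which makes it a candidate for a second look.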

Once we are satisfied that the transcript is perfect, as best as it can be, we deliver the file. The file is converted into MS Word, Adobe PDF, OpenOffice Text and plain text formats and we notify the customer that the transcript is available for download.

All of the above happens in 1 day and is managed by our transcription system. We charge only $0.99 per minute of the audio for it. So if you want to get a high quality transcript quickly, please do try out our transcription service today.

The next part of the series talks about the Certification Subsystem.

Introducing Profiles and Stats

Wednesday, February 1st, 2012

We recently launched profiles for our certified transcribers. Here’s a screenshot of my profile.

It contains some background information, relevant professional experience, and performance and work history on the site. It gives you an idea of who the transcriber is, how much work he or she has done and how good the work has been. The profile is very basic right now but we will be adding more functionality to it soon.

So how is this useful? Well for starters you can check who worked on your files and how much time they spent working on them.

These stats are broken down on a per file basis as well. The stats section has the link.

Additionally, if you’re happy with the result then you can choose to pay a bonus to our transcribers. These transcripts are prepared painstakingly and a bit of appreciation can go a long way! You can pay the bonus to the ones who have worked on your files or to an individual transcriber from their profile page. If you pay the group then it’s divided up equally amongst all of them. We do not keep anything for ourselves, except for a 5% charge to cover the fees. The rest of the money goes directly to the transcribers.
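Here is the arithmetic as a small Python sketch. Working in cents and handing any leftover cents to the first names in the list are assumptions made for this example; only the 5% charge and the equal split come from the description above.

```python
def split_bonus(amount_cents: int, transcribers: list[str],
                fee_rate: float = 0.05) -> dict[str, int]:
    """Split a group bonus equally after taking out the fee charge."""
    if not transcribers:
        raise ValueError("at least one transcriber is required")
    fee = round(amount_cents * fee_rate)
    pool = amount_cents - fee
    share, remainder = divmod(pool, len(transcribers))
    # Assumption: leftover cents go to the first few transcribers so
    # that the shares always add up to the pool exactly.
    return {
        name: share + (1 if i < remainder else 0)
        for i, name in enumerate(transcribers)
    }


# A $10.00 bonus split between three transcribers (hypothetical names).
print(split_bonus(1000, ["asha", "ben", "carla"]))
```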

You can also browse all the transcriber profiles from here, at least the ones which are public. We break them down into various lists: top 25 transcribers, reviewers, most active, by country etc. Have a look.

Moved From Slicehost to Linode

Friday, December 16th, 2011

We recently switched our hosting provider from Slicehost to Linode. Here’s how it went.

We had been on Slicehost since early 2009 and had a very good experience. We had few service disruptions and the performance was decent enough to suit our needs. But then there was this impending forced migration from Slicehost to Rackspace and after evaluating all options we decided to migrate to Linode instead. Linode and Slicehost are very closely matched, but on Linode you get more RAM for the same price and a 32 bit system. We wanted to wait till Christmas to make the move but the proverbial straw which broke the camel’s back was a day when it took more than an hour to compile Node.JS because of high load on the host machine.

Signing up for Linode was easy, except that after payment they asked us to send scanned copies of our identity and credit card. It seems that’s due to suspected fraud. But once we submitted the documents the account was activated quickly.

The first step was to set up the server. We could have just dumped the disk image over to the Linode box, but we didn’t because Linode was 32 bit. So we had to set up everything afresh. We found Blueprint while looking for tools to clone a server setup. Blueprint did most of the heavy lifting and copied the configuration files for Nginx, Postfix, OpenDKIM, MySQL, Redis and other essential packages. We left out the source builds intentionally because we had to recompile them anyway. After the server setup we did our application setup and ran the tests to verify everything was working.

The next step was the DNS migration. The first thing we did was to change the TTL on the NS and A records for our DNS Zone on Slicehost to 5 minutes. We used Slicehost2Linode to copy the DNS Zones to Linode and switched the NS servers on the registrar from Slicehost to Linode. We gave it a day for the DNS changes to propagate. The last thing to do was to change the IP Address of the A records to point to the Linode box. Since the TTL had already been decreased, we were back up and running within 5 minutes. However, moving the database and setting up the slave re-sync, which involved locking all tables, taking a mysqldump and importing it on Linode, took another couple of hours.

Post move, things have been pretty smooth. We found that the Linode server is performing much better than the Slicehost one; the RAM consumption is lower (presumably due to the 32 bit system) and it handles the same amount of traffic with much less system load. The server has been rock solid for the last 2 weeks. The only hitch we ran into was that our host had to be rebooted on the first two days, which caused a downtime of an hour or so.

Today we removed our Slicehost account once and for all. Overall we are much happier on Linode and wish we had moved earlier. We are getting much better performance at a lower cost and the migration was not as tough as it seemed.

Scheduled Maintenance Complete

Sunday, December 4th, 2011

We have completed the maintenance as scheduled today and all our services are back up and running. If you face any issues, please click the following link to log out and then log in again to your account.


Podcasts: Five Reasons To Have Them Transcribed

Monday, September 12th, 2011

Having your podcasts transcribed and publishing the text content on your website may sound unintuitive at first, but it has several advantages and is worth considering.

Reading vs Listening

Reading is a much faster process than listening. Many of your visitors would want to quickly scan through the transcript instead of listening to the podcast. Some of these visitors might even end up as subscribers to your podcast. This goes for your regular listeners too. Sometimes they might just want to scan the podcast quickly before deciding to listen to it.

Social Media Sharing

Having the complete text of the podcast online makes it easier for people to share it via Twitter, Facebook, Google+ and the myriad of social media sites which are there nowadays. Your listeners can quote a part of the text or highlight a particular section which they want to emphasize and share it with their own circle. Higher sharing rates mean more traffic for your website and the associated benefits.

Indexing & Search

The biggest advantage is that the search engines can now easily index your content since the text of the podcast is available. Better indexing will lead to more search traffic and more visitors to your site. You will also benefit from the long tail search traffic, those obscure terms for which your site appears in the search results. Your podcast might be linked to by others, which in turn will mean a higher PageRank and even more search traffic.

Contextual Advertisements

If you are using Google AdSense then the transcripts will lead to better contextual advertisements being displayed on your site and higher earnings from AdSense for you. This happens because the Google AdSense crawler first mines the text content on your website and matches it to the ads. Since the text content will now be closely related to your niche or topic, your visitors will get to see more relevant advertisements, which means a higher click-through rate for you.

E-book Packaging

Selling e-books based on the content of your podcasts is a direct way of monetizing your podcast. Once you have the transcripts it becomes an order of magnitude easier to create an e-book. There are various sites which help you sell digital information content, ClickBank being one of the popular ones. You can also sell these e-books to your listeners and visitors. An e-book is also a perfect freebie to give away if you want visitors to sign up for your newsletter.

There are several ways you can get your podcasts transcribed. If you are a good typist you can try transcribing them yourself, or you can outsource it to our transcription service for $0.99 per minute of audio. It is ultimately an investment which will pay off very handsomely in the long run.

CallGraph Browser Failure

Wednesday, June 29th, 2011

If CallGraph Browser fails to start or if you get an error message similar to the following

Error: Platform 5.0 is not compatible with minversion >= 1.8 maxversion <= 2.0

Then please download and install the following setup to fix it.

After the install finishes, please restart CallGraph. To restart CallGraph, please right click on the CallGraph System Tray icon and choose Exit from the popup menu.

Please note that CallGraph will still work fine without the CallGraph Browser. Skype calls will still be recorded and saved in your PC’s My Documents\My Call Graphs folder (or wherever you have set it to). The CallGraph Browser is the user interface which allows you to manage your recorded calls. It is a Mozilla XULRunner application and depends on the XUL runtime. If you update or uninstall Firefox, CallGraph Browser might stop working.

How to Transcribe Audio File

Wednesday, May 11th, 2011

So, you have an audio file which needs to be transcribed to text and have no idea how to go about it. Here’s how to do it.

The first thing you would need is ExpressScribe. It’s a free tool which lets you quickly start, stop and pause playback with hotkeys (or foot pedals) so that your hands are freed up for typing. To set up the hotkeys go to Control -> Hotkeys setup and enter your preferred keys. We suggest the following.

  • F7 -> rewind
  • F8 -> forward
  • F9 -> play
  • F10 -> stop

The useful thing is that these hotkeys are global. So you can control ExpressScribe even if it’s not the active application, which means you can type, play, stop, rewind, and forward without leaving your text editor. Very useful. Load up a test file and play around with the setup.

The second thing you would need is a good headset. Transcription is time consuming and when you wear a headset for long periods of time you may start feeling a bit of pain around your ears. Any headset with some padding around the earpiece will do. Do not play the audio on your speakers because then you will make lots of mistakes.

The third thing you would need is a text editor. You can use Word, OpenOffice.org or any other editor of your choice. At a minimum it should have word completion and spelling auto-correction. While typing you’ll find that you misspell a lot of words and to correct them you’ll have to stop and go back constantly. Auto-correction will save you time. Similarly auto-completion will save you typing, at least for common words.

Once everything is set up, you’re good to go. Bring up your editor, play the file, pause and type whatever you understood. Play-pause-type. Rinse and repeat till the file is complete. One pro-tip is to rewind as few times as possible. Better still, instead of rewinding just mark the inaudible portion with a blank or make a guess, and after you’re finished go back and review the file. You’ll finish the file faster this way.

On average it takes around 4 to 6 hours to finish 1 hour of audio. It varies with your typing speed, the audio file quality and the diction of the speaker. Difficult files take longer. Plus, you’ll notice that sometimes you cannot catch a few words, no matter how many times you rewind and play them back. Ask for a second opinion if you can.

As you would have guessed by now, it’s a painstaking task. It takes a lot of time and effort. That is exactly why we have the Audio Transcription Service. We do all the work for you and deliver a high quality transcript after 1 business day. For $75 you can get an hour of audio transcribed. We have a rigorous process and have transcribed thousands of hours of audio to date. Try it out and check out the results for yourself.

On the other hand, if you like transcribing then check out our Freelance Transcription Program. You can work as a home-based freelance transcriber and get paid on an hourly basis for the work done.

Hiring a Transcriber: audio hour vs man hours

Monday, May 2nd, 2011

If you’re looking to sell your own e-book then have a look at Jared’s post at Startups Open Sourced. He’s had great success with it and has written up a detailed how-to of the steps involved. One of his tips is to get your interviews recorded and hire a transcriber to transcribe everything, so that you have text to work from.

One thing to keep in mind while hiring a transcriptionist is the difference between audio hours and man hours, which Jared mentions in his post. An hour of audio can take anywhere from 4 to 6 hours to transcribe. If you’re paying by the hour then your cost will basically be multiplied by that factor. On the other hand, if you’re paying by the audio hour, then the amount of effort spent does not matter. You’ll pay for the amount of audio transcribed and not for the time taken to transcribe it.
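The difference is easy to see with a small calculation. The Python sketch below uses a made-up rate of $15 per man hour purely for illustration; only the 4 to 6 hour factor comes from the paragraph above.

```python
def cost_by_man_hour(audio_hours: float, hourly_rate: float,
                     hours_per_audio_hour: float) -> float:
    """Cost when the transcriber bills for the time actually worked."""
    return audio_hours * hours_per_audio_hour * hourly_rate


def cost_by_audio_hour(audio_hours: float, rate_per_audio_hour: float) -> float:
    """Cost when the price is fixed per hour of audio."""
    return audio_hours * rate_per_audio_hour


# Hypothetical rate: $15 per man hour, for one hour of audio.
low = cost_by_man_hour(1, 15, 4)
high = cost_by_man_hour(1, 15, 6)
print(f"Paying by the man hour: ${low} to ${high}")
```

With man hour billing the final cost is a range that you only learn after the work is done. With audio hour billing it is known up front.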

We always charge by the audio hour. You don’t have to worry about the amount of actual time taken. Even if the audio is hard to transcribe and takes more effort, you won’t have to pay extra. At the end of the day you’ll still get a high quality transcript of your audio file.

Invalid Certificate Issue Resolved

Monday, May 2nd, 2011

Due to an incorrectly configured SSL certificate on our server, browsers were showing a security warning whenever any secure pages on our site were accessed. We have corrected the configuration now and the security warning should go away. If you had faced the issue then please try now. Technical details follow.

The problem was that the intermediate certificates supplied by our CA were not specified. When we moved to our new domain we also changed over from Apache to Nginx. In Apache the intermediate certificates were specified by the SSLCertificateChainFile directive. But Nginx does not have a corresponding directive. In Nginx the intermediate certificates have to be combined into the server certificate PEM file. Once we did that, the security warning went away.
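Combining the certificates is just a concatenation, with the server certificate first and the intermediate certificates after it. Here is a minimal Python sketch of that step. The function names and the file names in the usage comment are hypothetical; this is not the exact procedure we ran.

```python
from pathlib import Path


def build_chain(server_cert: str, intermediates: list[str]) -> str:
    """Join PEM texts: server certificate first, then the intermediates.

    Each block is stripped and terminated with exactly one newline so
    that an END line and the next BEGIN line never share a line.
    """
    blocks = [server_cert, *intermediates]
    return "".join(block.strip() + "\n" for block in blocks)


def write_chain(server_path: str, intermediate_paths: list[str],
                out_path: str) -> None:
    """Read the individual PEM files and write the combined file."""
    text = build_chain(
        Path(server_path).read_text(),
        [Path(p).read_text() for p in intermediate_paths],
    )
    Path(out_path).write_text(text)


# Hypothetical file names:
# write_chain("server.crt", ["intermediate.crt"], "server-chained.crt")
```

The combined file is then the one that the ssl_certificate directive in the Nginx configuration should point to.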

The odd part was we had run into the same error when we did the domain transition and were testing it out. But that error went away when we did force reloads of the page. So we thought it was an intermittent error which would go away eventually. But it did not and one of our users complained about it yesterday. In the end it turned out to be a simple fix.

iamstarting Podcast

Tuesday, March 29th, 2011

Thanks to Chirag at iamstarting for the podcast on CallGraph and Scribie. We go into the back story of CallGraph and some general startup stuff. Check it out.

There’s some amazing stuff on this blog. If you’re interested in the Indian startup scene then this is a blog you must subscribe to.