OpenAI recently released Whisper, its latest foundation model for Automatic Speech Recognition (ASR). The release marks a major milestone for ASR: Whisper was trained on one of the largest datasets of its kind, 680K hours of audio paired with transcripts, and is therefore quite robust. The model is also open source and free for commercial use. We are pleased to announce that we have integrated Whisper into our AI pipeline and that it is now generally available to all our customers.

Our initial assessment is that Whisper is indeed as robust as claimed. We are seeing accuracies of around 95% for most of our files. However, Whisper also has a few shortcomings: it does not provide any speaker tracking, and it does not support strict verbatim transcription (i.e., it omits filler utterances such as ah, uhm, and hmm). Speaker tracking is one of the most challenging problems for AI, yet it is the most important feature for our customers. Our in-house AI for speaker tracking achieves an accuracy of 60-75%, and we depend on our human transcribers to correct the rest. Strict verbatim likewise requires human intervention.

Even so, Whisper is a landmark for the transcription industry. It substantially reduces the load on our transcribers, who can now be more efficient and productive and focus on correcting speaker tracking and fixing mistakes in the transcript. Those mistakes will be fewer, and fixing them requires a special skill that AI currently lacks: interpreting the conversation and applying context to identify errors. Our transcribers have that skill, and their work will complement the AI-generated transcript, resulting in higher accuracy. We also have checks and balances in place to ensure that our transcribers correct mistakes properly.

Our mission at Scribie is to reduce the pain of transcription for both customers and transcribers. We were one of the first transcription services to adopt AI, back in 2018. We built our own AI to assist our transcribers, along with a hybrid process in which the AI-generated transcript was corrected by our human transcribers before being delivered to the customer. With Whisper, we are taking a big step forward in that mission, and we are confident that we will be able to pass on the benefits of higher efficiency to both our customers and our transcribers in the future.
