How to Use AI to Transcribe Your Interviews Easily
Several apps now offer AI-powered transcription. Here's what to look for when choosing one.
It’s been a few years since my mind was blown by Otter, one of the original apps to do AI-powered transcription. Not only was the text reasonably accurate — the UI was inspired, letting the user tap on any word to start playback at that point. As a journalist, I instantly saw how it could save me tons of time.
Now, AI transcription is everywhere. A bevy of apps, and features within apps, are up to the task, but which ones have standout features, and what are the best ways to make use of them? It’s a lot to sort through, so that’s why I’m excited to read Christopher Allbritton’s summary of the go-to transcription apps for busy media professionals.
Before you jump down, though, a quick reminder that this is the last free how-to post. After this week, The Media Copilot’s practical, very specific guidance on how to use AI tools to level up your journalism or PR work will be going behind the paywall. Subscribe now to ensure you don’t miss any, plus you’ll lock in the original subscription price of $5 a month or $50 a year (as of Tuesday, it’ll be $10/$100). Hit the button to join! —PP
This week, we’re looking at using AI tools to ease that most tedious of tasks: transcribing notes and interviews. I know, I know, isn’t that what interns are for? But as the youths say, “In this economy?”
Thankfully, automatic transcription has become a lot easier and more accurate. Modern transcription tools combine Automatic Speech Recognition (ASR), Natural Language Processing (NLP) and machine learning to convert spoken words into written text. ASR converts the audio into text, while NLP helps interpret and structure the content. These technologies work together to recognize speech patterns, handle different accents, and understand context, resulting in more accurate transcripts.
If you need highlights or summaries of a recorded meeting or an interview, NLP algorithms — sometimes enhanced by Large Language Models (LLMs) — can provide concise overviews. Parsing and understanding language is at the core of these technologies, and there are many, many tools available, from free software you can run on your machine to enterprise solutions for newsrooms and major firms.
Where to Start?
There are two ways to get started transcribing files:
Upload the audio or video file to a cloud service.
Have your computer do it.
The first will give you better and faster results, with many extra benefits like AI summaries, translations, and easy editing. However, consider transcribing on your computer if you want or need to keep the sensitive data in the interview private. The output will be messier, but you don’t have to worry about data breaches or having your background interview used to train the next iteration of ChatGPT or whatever.
Let’s tackle the cloud services first. There’s a hefty list, which includes, in no particular order: Otter.ai, Trint, Rev, Descript, Sonix and Alice.
They all work more or less the same: You take the audio or video file, upload it and then wait for the transcript. Aside from Alice, which is more focused on privacy than most, all services offer some form of AI interaction, ranging from summaries to sentiment analysis to Otter’s ability to chat with an AI bot about your transcript.
These services typically offer a free trial before they ask you to pay up, but sadly, relieving the drudgery of transcribing is not inexpensive.
Otter.ai: $16.99/month for 1,200 transcription minutes, with a cap of 90 minutes per transcript, plus the ability to import and transcribe 10 audio or video files per month. Otter also offers a $30/month tier with 6,000 transcription minutes and up to four hours per transcript.
Trint: $80/month for 300 minutes and three translations, or $100/month for 1,200 transcription minutes and 20 translations
Rev: $14.99/month for 1,200 transcription minutes, 90 minutes per conversation; or $34.99/month for 6,000 transcription minutes, 4 hours per conversation
Descript: $19/month for 600 transcription minutes, $35/month for 1,800 minutes, or $50/month for 2,400 minutes.
Sonix: Pay as you go at $10/hour with discounts for bulk purchases.
Alice: Pay as you go, with packages ranging from $299.99 for 100+ transcribing hours ($2.99 an hour) to $9.99 for a single hour of transcription.
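To compare these plans on equal footing, it can help to convert each one to a cost per transcribed hour. The sketch below uses only the prices quoted above. It assumes you use every included minute each month and ignores per-file caps and translation allowances, so treat the numbers as a best case rather than a real bill. The plan labels and function names are my own.

```python
# Monthly plans quoted above: (label, monthly price in USD, included minutes).
MONTHLY_PLANS = [
    ("Otter.ai $16.99", 16.99, 1200),
    ("Otter.ai $30", 30.00, 6000),
    ("Trint $80", 80.00, 300),
    ("Trint $100", 100.00, 1200),
    ("Rev $14.99", 14.99, 1200),
    ("Rev $34.99", 34.99, 6000),
    ("Descript $19", 19.00, 600),
    ("Descript $35", 35.00, 1800),
    ("Descript $50", 50.00, 2400),
]

# Pay-as-you-go services are already priced per hour: (label, USD per hour).
HOURLY_RATES = [
    ("Sonix pay-as-you-go", 10.00),
    ("Alice single hour", 9.99),
    ("Alice 100-hour package", 2.99),
]


def cost_per_hour(monthly_price, included_minutes):
    """Best-case hourly cost, assuming every included minute gets used."""
    return monthly_price / included_minutes * 60


def rank_by_hourly_cost():
    """Return (label, cost per hour) pairs, cheapest first."""
    rows = [(name, cost_per_hour(price, minutes))
            for name, price, minutes in MONTHLY_PLANS]
    rows += HOURLY_RATES
    return sorted(rows, key=lambda row: row[1])


for name, cost in rank_by_hourly_cost():
    print(f"{name:<24} ${cost:6.2f} per transcribed hour")
```

If you only transcribe an interview or two a month, the ranking flips: a subscription you barely use costs far more per hour than a pay-as-you-go rate.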
Of course, no automatic transcription is perfect, so you will need to review it for mistakes. Thankfully, except for Alice, these services also provide editing functions. Every service syncs the transcript with the audio, so if you need to skip all the small talk at the start of an interview and get to the meat of the transcript, just select the text you need, and the audio will jump to that point. They’re all pretty good about marking different speakers in the transcript so that you can rename them quickly, and even lousy audio is handled well, thanks to the predictive functions of the underlying tech.
The services are all trying to differentiate themselves, and it shows. Otter and Rev love meeting notes and want to summarize and schedule your meetings like a try-too-hard intern. Trint and Alice are very journalist-focused and great for interviews with several speakers. Sonix is fast and bare-bones but can transcribe over 50 languages. Descript is almost a full-featured video production site with templates for clips, subtitles, and the works. It's overkill if you just need plain transcripts.
When you’re done, these services all allow you to export the transcript in many formats, from PDFs and Word docs to subtitle files.
Homegrown Solutions
If you don’t wish to spend money on these online solutions, you probably already have some basic transcription tools on your smartphone. Google Recorder will create a live transcript using your phone’s built-in microphone. You can download the app from the Google Play store, and it comes preinstalled on Pixel phones.
Apple released iOS 18 on Sept. 16, and it has similar functionality for iPhone users in the Notes and Voice Memos apps. You can even record a phone call (while obeying local laws for which parties in the conversation need to consent) and get both a transcript and summary with the new Apple Intelligence features, which only work on iPhone 15 Pros and newer phones.
Apple’s transcripts are pretty lean, however. You won’t get separated speakers or easy editing, but you can jump around the transcript with synced audio.
Finally, there’s OpenAI’s Whisper. OpenAI says its ASR system has been trained on 680,000 hours of audio in multiple languages and environments, giving it high accuracy. In my experience, it works pretty well and is accurate word-for-word, but it’s tricky to use directly. You can use an app like MacWhisper (if you’re on a Mac) or code your own command-line interface, but I’ve found it can’t yet distinguish between speakers or even paragraphs.
Whichever of these you choose, you’ll have to grab the raw transcript and thoroughly scrub it in Word or some other text editor if you want typo-free, word-accurate text. Depending on the audio, that may not save you much time over manual transcription.
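For the "code your own command-line interface" route with Whisper, here is a minimal sketch of what that could look like. It assumes the open-source openai-whisper Python package and ffmpeg are installed on your machine. The function names, the default "base" model size and the timestamp layout are my own choices for illustration, not anything the article's author or OpenAI prescribes.

```python
import argparse


def format_timestamp(seconds):
    """Turn a number of seconds into HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_text(segments):
    """Put each transcribed segment on its own timestamped line."""
    lines = []
    for segment in segments:
        stamp = format_timestamp(segment["start"])
        lines.append(f"[{stamp}] {segment['text'].strip()}")
    return "\n".join(lines)


def transcribe(path, model_name="base"):
    """Transcribe one audio file locally; nothing is uploaded."""
    # Imported here so the helpers above work without Whisper installed.
    import whisper  # assumption: installed via `pip install openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    return segments_to_text(result["segments"])


def main(argv=None):
    parser = argparse.ArgumentParser(description="Transcribe audio locally.")
    parser.add_argument("audio", help="path to an audio or video file")
    parser.add_argument("--model", default="base", help="Whisper model size")
    args = parser.parse_args(argv)
    print(transcribe(args.audio, args.model))
```

Calling main(["interview.mp3"]) prints one timestamped line per segment. The timestamps give you rough landmarks for finding a quote in the audio, but the output still has no speaker labels, so the cleanup described above still applies.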
Tips for Using AI Transcription
There are a few tips for using any of these services and tools that can make transcribing more accurate and efficient for busy journalists.
Make sure you have clean audio. Of course, this isn’t always possible, given the nature of the news business, but the clearer the audio, the more accurate the transcription. Try to use a directional or lapel mic, and avoid recording in echoey spaces, in high winds, or near traffic and other sources of noise.
While the file is being transcribed, start writing the story. I can usually remember the great quotes and distill what I can from my notes, but once the transcript is finished, I use search and highlight to double-check the quotes. Most reporters on deadline don’t need the whole back and forth in an interview, so get the quote, double-check it, and file your story.
Speaking of double-checking, if you have time, listen to the audio while reading the generated transcript. This can help you catch nuances of tone, inflection, and emphasis that might not come through in text. It will also make editing the transcript a bit easier.
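The quote check from the tips above can be partly automated once you have the transcript as plain text. This is a rough sketch, not a feature of any of the services: it lowercases both texts, strips punctuation and then looks for the quote's exact word sequence. The function names are hypothetical, and a miss only means the wording differs, so go back to the audio before deciding who is wrong.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, straighten apostrophes, drop punctuation, collapse spaces."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    # Keep letters, digits, whitespace and apostrophes; blank out the rest.
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())


def quote_in_transcript(quote, transcript):
    """True if the quote's words appear, in order, in the transcript."""
    # Padding with spaces keeps matches aligned to whole words.
    return f" {normalize(quote)} " in f" {normalize(transcript)} "


transcript = "Well, um, the budget was cut, by half, frankly. Nobody told us."
print(quote_in_transcript("The budget was cut by half", transcript))
print(quote_in_transcript("The budget was doubled", transcript))
```

Because the match is exact on words, a filler word the transcript kept and your notes dropped will register as a miss, which is a useful prompt to re-listen.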
How to Choose the Right Transcription App
To test these services, I used an interview I conducted last year in Kyiv. The audio wasn’t great, the accents were thick, and a fair amount of Ukrainian was sprinkled into the conversation. All of the services generated accurate transcripts with only slight variations. Not one got anything really wrong, but crosstalk was challenging for them. I would use Trint since I’m most familiar with it and find its editor intuitive. But it’s also the most expensive.
So, ultimately, try out the free trials and see which tools work best for you by balancing your specific needs, budget and privacy concerns. Check the transcripts to make sure the LLMs didn’t get anything important wrong. And accept the gratitude of interns all over the newsroom.
We conduct independent reviews of all the products and services we recommend. If you choose to make a purchase through one of the links on our site, we may earn a small commission at no additional cost to you.