What is Transcription?
Transcription is the process of listening to an audio or video recording and typing the spoken words into text that matches the recording as closely as possible.
The recording can be an interview, focus group, podcast, oral history, lecture, documentary, etc. The transcription can be in a single language or from one language to another.
In this post we’ll talk about the different styles of transcription, steps involved in transcription, and how long it takes.
How detailed should a transcript be? Should it include every word on the recording? Should it be edited to remove grammatical errors? Should ambient sounds be included? All these elements depend on the transcription style.
There are three basic styles of transcription: intelligent verbatim, verbatim, and true verbatim.
Intelligent Verbatim Transcription
The intelligent verbatim style of transcription (a.k.a. clean read or clean verbatim) involves transcription with detailed editing and some paraphrasing to create an easy-to-read transcript.
Original: …’Cause I mean…I think there are so many different needs er… different requirements within each one of those you know, those many and different segments that er… if you can hone in on each one of those segments um…
Intelligent Verbatim: Because I think there are so many different requirements within each one of those many and different segments that if you can hone in on each one of those segments…
First, a transcriptionist types everything on the recording and proofreads it for accuracy. An editor then reads through the transcript to remove any false starts (incomplete sentences) and repetitions, correct grammatical errors, and arrange the text in short, readable paragraphs.
This is the preferred style of transcription for business-related recordings and is also called business transcription.
What’s included: Everything that is said on the recording with minor paraphrasing (if needed).
What’s not: Fillers (ums, uhs, you knows, etc.), false starts (incomplete sentences), repetitions, grammatical errors, sentence structure errors.
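For readers who like to see this kind of cleanup spelled out, here is a minimal Python sketch of the most mechanical part of it: stripping fillers. The filler list and the function name `strip_fillers` are made up for illustration, and this is not how an editor actually works. It only shows what "removing fillers" means on the sample above.

```python
import re

# Hypothetical filler list, for illustration only; a real style guide defines its own.
FILLERS = ["you know", "i mean", "um", "uh", "er"]

def strip_fillers(text: str) -> str:
    """Remove whole-word fillers (plus any comma or ellipsis right after them)."""
    pattern = r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b[,…]*"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse the extra spaces left behind by the removals.
    return re.sub(r"\s+", " ", cleaned).strip()

original = (
    "…’Cause I mean…I think there are so many different needs er… "
    "different requirements within each one of those you know, those many "
    "and different segments that er… if you can hone in on each one of "
    "those segments um…"
)
print(strip_fillers(original))
```

The result still contains the false start ("different needs different requirements") and the repetition ("those those"). Spotting and fixing these, along with grammar and paraphrasing, is the part that still needs a human editor.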
Verbatim Transcription
In verbatim transcription, every word on the recording is transcribed as is, excluding false starts and irrelevant repetitions.
Original: …’Cause I mean…I think there are so many different needs er… different requirements within each one of those you know, those many and different segments that er… if you can hone in on each one of those segments um…
Verbatim: …’Cause I think there are so many different requirements within each one of those many and different segments that if you can hone in on each one of those segments…
The recording is first transcribed and then proofread by a transcriptionist. It is then taken through a second pass by another transcriptionist to ensure that every detail is captured correctly. Very little editing is done and almost everything is transcribed as is.
Verbatim transcription is the most popular style of transcription for research.
What’s included: Every word on the recording including relevant repetitions and grammatical errors.
What’s not: False starts (incomplete sentences), irrelevant repetitions.
True Verbatim Transcription
True Verbatim is the most detailed account of a recording, including every word, sound and non-verbal communication (like laughter and pauses).
Original: …’Cause I mean…I think there are so many different needs er… different requirements within each one of those you know, those many and different segments that er… if you can hone in on each one of those segments um…
True Verbatim: …’Cause I mean…I think there are so many different needs er… different requirements within each one of those you know [pause], those many and different segments that er… if you can hone in on each one of those segments um… [shuffling papers].
First, a transcriptionist types everything on the recording accurately and proofreads the transcript to ensure every little detail is captured. An editor then reads through the transcript to ensure all that detail doesn’t render the transcript unreadable (e.g., by adding punctuation and paragraphs and removing irrelevant detail).
True verbatim transcription is used for research and analysis when every little detail is required.
What’s included: Every word and sound on the recording, including non-verbal communication and ambient sounds.
What’s not: Excessive stutters and pauses that are irrelevant to the transcript and break the flow of reading.
Steps of Transcription
Transcription is a complex process involving many steps, from first analyzing the recording to creating the finished product.
The steps can differ slightly based on who is doing the transcription – you, a freelance transcriptionist, or a transcription service – but the basic process remains the same:
- Analysis
- Transcription
- Proofreading
- Editing & Formatting
1. Analysis
The first step is to analyze the complexity of the recording. What is the length of the recording? How many speakers are there? Do they have strong accents? Is there technical terminology used? Is there background noise on the recording?
All these questions need to be answered in order to calculate how long the transcription will take.
2. Transcription
The next step is the actual transcription of the recording, which involves playback, research, and typing. This is the most time-consuming part of the transcription process and can take anywhere between 4-9 hours for a single hour of recording, depending on the analysis in Step 1. More on this later.
3. Proofreading
It is impossible to get everything right in the first round. Even the most experienced transcriptionists can achieve a maximum of 94-95% accuracy in the first round of transcription. Proofreading is essential to achieve higher accuracy.
So in this step, a second transcriptionist listens to the recording and proofreads the transcript.
4. Editing & Formatting
The final step is to edit the transcript so that it makes sense while reading.
Editing can be very detailed or minor depending on the transcription style. For example, in intelligent verbatim transcription, the editing will include everything from correcting grammatical errors and paraphrasing to removing fillers and false starts. In true verbatim transcription, on the other hand, editing would be limited to adding punctuation.
Formatting involves adding paragraphs, labels & headings, headers and footers, etc. This can vary depending on project specifications.
How long does it take?
How long does it take to transcribe a recording? This is a question asked even more frequently than ‘What is transcription?’
There are several factors that affect transcription time including the subject of the recording, number of speakers, audio quality, and typing speed.
Subject of the recording
As a rule of thumb, general topics can be transcribed faster than technical ones. For example, an interview with Barack Obama on his thoughts on healthcare will be transcribed faster than an interview with a scientist on the geology of Birmingham.
This is because the words used in the first interview can be transcribed from memory, but those in the second interview would require research to get the spelling and context right.
Number of speakers
Lectures and interviews can be transcribed faster than focus groups because the latter have multiple speakers. Why is that?
Have you ever noticed how it becomes easier to understand the lyrics of a song when you listen to it again and again? That’s partly because you come to know the lyrics by heart, but also because you learn how the singer pronounces certain words. The same is true for transcription.
While transcribing, the transcriptionist gradually becomes familiar with the accent and rate of speech of the speaker by listening and re-listening to the recording.
When the speaker repeats a word, the transcriptionist doesn’t have to strain her ears each time to understand it. She is also able to ‘tune in’ better to the speaker’s rate of speech, so understanding what they are saying becomes progressively easier.
In a group recording, speakers change quickly and the transcriptionist has to adjust to a different style of speaking every minute or so. This naturally slows down the transcription process (particularly if the speakers are talking over each other).
That’s why the more speakers there are on a recording, the longer it takes to transcribe.
Audio quality
This one is easy to understand. A noisy recording takes longer to transcribe than a clear one because it is so much harder to make out what is being said.
Noise is not the only factor that makes recordings difficult to understand. Voices can also be distorted by echo and low volume. That is why we advise our clients to use good voice recorders or professional webinar services.
While the exact time may vary, a simple recording (general content, 1-2 speakers, good audio quality) can take 3-4 hours.
A complex recording (technical content, 3-5 speakers, good audio quality) can take 5-6 hours.
With below-average audio quality, the transcription time can go up by 2-3 hours.
Note: These estimates are for the intelligent verbatim style of transcription. Transcription time for verbatim and true verbatim styles is higher.
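To make these numbers concrete, here is a rough Python sketch of the arithmetic. It rests on two simplifications made for this example: that the ranges above apply to one hour of recording and scale linearly with length, and that either technical content or three or more speakers makes a recording "complex". The function name `estimate_hours` and its parameters are illustrative only.

```python
def estimate_hours(recording_hours, technical=False, speakers=1, poor_audio=False):
    """Return a (low, high) estimate of transcription hours.

    Uses the intelligent verbatim figures from this post and assumes they
    apply per hour of recording (an assumption of this sketch).
    """
    # Simple recording: 3-4 hours; complex recording: 5-6 hours.
    if technical or speakers >= 3:
        low, high = 5, 6
    else:
        low, high = 3, 4
    # Below-average audio quality adds 2-3 hours.
    if poor_audio:
        low, high = low + 2, high + 3
    return low * recording_hours, high * recording_hours

# A two-hour technical focus group with four speakers and noisy audio.
low, high = estimate_hours(2, technical=True, speakers=4, poor_audio=True)
print(f"Estimated transcription time: {low}-{high} hours")
```

Treat the result as a ballpark only. As the note above says, verbatim and true verbatim transcripts take longer than this.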
So there it is. You now know what transcription is all about.
Do you have any other questions about transcription? Leave your question in the comments below and we’ll answer as best we can.