Interview Transcription Check-list
Here’s a check-list of things to provide the transcription company when sending an interview for transcription:
- List of uncommon words used on the recording.
- Names of the speakers.
- Transcription style – verbatim or clean read.
- Editing instructions.
- Formatting instructions.
- Preferred file format for the transcript (.doc, .docx, .txt, etc.).
- Spelling style (UK or US).
- Time-coding and Time-stamping instructions.
It is possible to get a transcript that doesn’t need hours of re-working and editing – just give the right instructions!
Factors that Affect Transcription Rates
Transcription rates are based on Audio Minutes.
1 Audio Minute = 60 seconds of audio/video recording
This means that no matter how long it takes to transcribe a recording, you would still pay by length of the recording and not the actual hours worked.
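As a rough illustration of per-audio-minute billing, the sketch below works out a charge from the length of a recording. The function name, the rounding of partial minutes up to a whole audio minute, and the $1.00 rate are all assumptions made for the example; individual companies may bill partial minutes differently.

```python
import math


def transcription_cost(duration_seconds, rate_per_audio_minute):
    """Return the charge for a recording billed per audio minute.

    Assumption: a partial minute is billed as a whole audio minute.
    """
    audio_minutes = math.ceil(duration_seconds / 60)
    return round(audio_minutes * rate_per_audio_minute, 2)


# A 45-minute recording at a hypothetical $1.00 per audio minute
print(transcription_cost(45 * 60, 1.00))  # 45.0
```

The charge depends only on the length of the recording, not on how many hours the transcriptionist spends on it.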

Here are a few factors that affect transcription rates:
Audio Quality
Professionally recorded high quality audio is the easiest to transcribe and therefore costs the least.
Audio with background noise or recording issues is difficult to understand and may require several rounds of proofreading to ensure accuracy; such audio therefore costs more.
Quick Tip: For those who record regularly, it may be a good idea to invest in a DVR or some other form of recording equipment. This will help generate high quality audio that will bring down transcription costs.
Accents
Recordings involving heavy accents generally require a native speaker of the accent (or a very experienced transcriptionist) to transcribe them accurately. For this reason, accented audio can cost considerably more depending on the accents involved.
The Australian, Irish and Scottish accents are considered to be some of the most difficult ones to transcribe and cost the most. More neutralized accents such as the US accent are easier to understand and cost less.
Quick Tip: When recording with speakers who have heavy accents, it helps to lay down some ground rules to ensure that the recording is clearly understandable both for listeners as well as transcribers.
Subject of the Recording
A recording on a technical subject like finance, real estate or healthcare would likely have terms and phrases that are not commonly known. These terms need to be researched to ensure correct spelling and context. For this reason, recordings on technical subjects take longer to transcribe and cost more.
Quick Tip: To improve the accuracy of transcripts on technical subjects, it is advisable to provide a list of commonly used terms to the transcription company. This will reduce transcription time and improve accuracy.
Number of Speakers
A single speaker talking at a steady rate of speech is easy to understand and transcribe. However, if a recording involves multiple speakers talking simultaneously (at a dinner meeting for example), it becomes difficult to catch all that is being said. Identifying each speaker by name also becomes a challenge in such cases. Multi-speaker audio therefore costs more to get transcribed and yet does not always meet accuracy requirements.
Quick Tip: When planning to record a multi-speaker audio, spend some time choosing a good microphone and DVR. Also instruct the participants to speak one at a time and mention their names before speaking.
Proofreading
Transcription is a 3-step process:
- Preparing a draft
- Proofreading
- Editing & formatting
The draft is an 85-90% accurate transcript prepared during the first round of listening to the audio. This draft is then taken through multiple rounds of proofreading to fill in the blanks and correct errors. The cost of transcription goes up with every round of proofreading.
Quick Tip: If you intend to proof and edit the transcript yourself, you can ask your transcription company to simply provide a draft, which costs much less than a fully proofed and edited transcript. Just make sure they duly time stamp all the blanks or put periodic time codes in the draft to help you while editing.
Editing & Formatting
The final step in transcription is editing the text for grammar, punctuation, sentence structure etc. along with formatting the layout for easy reading. This step generally adds 1-2 hours to the total transcription time (depending on the length of the transcript). Transcription companies generally factor the cost of this work into the audio-hour rate rather than charging for it separately.
Quick Tip: Editing & formatting costs can be saved if the transcript is for personal use and doesn’t need to look perfect.
All factors considered, a single hour of clearly recorded audio/video can take 3-4 hours to transcribe, 1-2 hours to proofread, and another 1 hour to edit & format. In effect, one audio hour can take 5-7 hours of work. This means that if you’re paying $60.00 for the transcription of a one-hour recording ($1.00 per audio minute), you’re actually paying roughly $8.50 to $12.00 per hour of work.
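Using the figures in the paragraph above (3-4 hours to transcribe, 1-2 hours to proofread, 1 hour to edit & format), the effective pay per hour of work can be checked with a few lines of arithmetic. The function name is hypothetical and the $60.00 price is the example figure from the text.

```python
def effective_hourly_rate(price_paid, work_hours):
    """Price paid for the job divided by the hours of work it took."""
    return price_paid / work_hours


price = 60.00           # one audio hour at $1.00 per audio minute
low_hours = 3 + 1 + 1   # fastest case: transcribe + proofread + edit
high_hours = 4 + 2 + 1  # slowest case

print(round(effective_hourly_rate(price, high_hours), 2))  # 8.57
print(round(effective_hourly_rate(price, low_hours), 2))   # 12.0
```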
A Note on Economies
It’s a known fact that hourly wages differ widely in different economies of the world and naturally that affects transcription services as well. Transcription rates can be anywhere from $0.25 to $3.00 per audio minute depending on the location of the transcription company and the experience level of the transcriptionists. The quality of transcripts can also be drastically different depending on how much you pay.
Quick Tip: While hunting for the best deal in the global market, it may be a good idea to ask for a quote from several companies, sample their work and check credentials before making a final decision.
4 Rules of Verbatim Transcription
Verbatim transcription is the art of converting spoken word into text such that a message is captured exactly the way it has been spoken.
This requires a keen ear and attention to detail. Verbatim transcripts cannot be created by mindlessly listening and typing. One has to pay close attention to every sound, tone, word and make intelligent use of punctuation to convey the correct message.
Here are 4 important rules of verbatim transcription:
1. Capture EVERY word (don’t paraphrase)
Many transcriptionists have the habit of paraphrasing statements to convey the general idea of what is being said rather than typing out the exact words. This process is called clean read transcription and is much preferred in business transcription because of the easy-to-read transcripts it produces. But it’s not very popular amongst researchers and analysts who need to know exactly what was said. Here are a couple of examples to illustrate the difference between the two styles -
Paraphrased sentence: “I was screaming for my mother and she was maybe 30 yards away in the house, she couldn’t have even heard me even if she was outside.”
Verbatim sentence: “And I’m screaming. You know, I’m screaming. I’m screaming for my mother. And she was uh maybe 30 yards away in the house. I mean she could have never heard me. Even if she was outside she probably wouldn’t have heard me.”
While the meaning conveyed in both sentences is the same, the emotion is far more pronounced in the second one. Depending on what the transcript is going to be used for, this may make a world of difference. So in verbatim transcription, it’s important to type each and every word that is said.
2. Don’t leave out non-verbal communication
Communication has a lot of components other than words – such as laughter, pauses, hand gestures, etc. Verbatim transcription captures all these in order to give a true account of what’s being said.
For example,
K: What does your mother think?
N: .. Not much. . She agrees with me . yeah.
K: Really?! [Laughs] Are you sure?!
[N laughs]
Here are a few more rules for transcribing non-verbal communication:
When two speakers speak at the same time, indicate this with /, as in:
N: Yes, I have been /living here
K: /Oh you have?
N: for three years.
I.e. ‘living here’ and ‘oh you have’ were said at the same time and N continued on his sentence without stopping.
Use = when two lines come directly after one another without a gap e.g.
K: Did you like her? =
N: = Yes!
That is a very fast reply.
For short pauses add a full stop, each one representing a second. For pauses longer than 4 seconds, put the duration in brackets and in italics, e.g. [6 second pause]
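The pause rule above can be sketched as a small helper. The function name is hypothetical, pauses are assumed to be given in whole seconds, and a pause of exactly 4 seconds is treated as short because the rule says "longer than 4 seconds". Italics are left to the word processor.

```python
def pause_marker(seconds):
    """Render a pause using the convention described above.

    Up to 4 seconds: one full stop per second.
    Longer than 4 seconds: the duration in square brackets.
    """
    if seconds <= 0:
        return ""
    if seconds <= 4:
        return "." * seconds
    return f"[{seconds} second pause]"


print(pause_marker(2))  # ..
print(pause_marker(6))  # [6 second pause]
```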
3. Catch those fillers and false starts
Fillers are the ums, ahs, you knows, that are often used by speakers to buy time to think.
False starts are sentences that are started but never completed, such as:
“I would say that’s not such a… I mean that may not be… it’s best to check with an expert before proceeding in such matters.”
Fillers and false starts may break the flow of speaking but often provide insights into the thinking process of a speaker. The process of verbatim transcription therefore includes these components in the transcript rather than editing them out.
4. Note external sounds
Qualitative research and even market research often require knowing what’s happening in the surroundings while the subject or interviewee is speaking. Some examples of external sounds can be sounds of doors opening, people walking in, a side conversation between fellow participants, etc. These sounds/events should be duly noted on the transcript in brackets and with time stamps if required.
The main idea of verbatim transcription is to capture both the ‘what’ and ‘how’ of speech. Not everyone requires the same level of detail – for example, someone may need the non-verbal communication transcribed but may not want any external sounds/events noted on the transcript. It’s always a good idea to thoroughly discuss the specific requirements with your client before beginning a transcription project so that you know exactly what to transcribe and what to leave out.
Outsourcing Transcription of Research Interviews
When planning to outsource the transcription of your research interviews it may be a good idea to invest some time in writing out a clear set of instructions for your transcriptionist because research transcription is quite different from regular transcription. Here are a few points to get you started -
Identify the Transcription Style
In research interviews the HOW of what’s being said is almost as important as the WHAT. For this reason the preferred style of transcription for these interviews is Verbatim Transcription. This style involves typing out everything that’s recorded on the interview including -
- Fillers (the ums, ahs, you knows, etc.)
- False starts, i.e. sentences that are started but then changed to something else (e.g., “I think that would be…I’d say that’s something important”).
- Repeated words/phrases (e.g., “in that case, in that case the methodology would differ”)
- Non-verbal communication (such as laughter, long pauses, coughing, etc.)
- Other observations from the recording such as side conversations, over talking, interruptions, people walking in or out, etc.
The alternative style of transcription is Clean Read. This style is not used in research transcription as it involves editing out some part of the text.
Provide Formatting Guidelines
If everything on a recording is typed out in a single chunk of text, it would be impossible to decipher anything of value from the transcript. And you as the end-user of the transcript would end up spending hours trying to figure out where one speaker stopped and the next one began speaking! To avoid this, provide some basic guidelines to your transcriptionist, such as -
Paragraphing
The entire transcript should be broken down into small paragraphs for ease of reading. This of course doesn’t mean unnecessarily changing paragraphs even if it’s contextually incorrect – but common sense should be used to break down long monologues.
Speaker Identification
The initials/first name/full name should be mentioned each time the speaker changes. This can also be marked in bold for easy identification. The speech of the interviewer and interviewee/s can also be differentiated by using italicized text for one of them.
Using italics or brackets for emphasis
Italics can be used to mark text that is spoken with particular emphasis. Some people prefer to use [brackets] or text in bold for this purpose.
Time stamping
A 100% accurate transcript is a myth. No matter how skilled a transcriptionist, there would always be some words that are unclear or inaudible on a recording. These should be time stamped [hh:mm] or [hh:mm:ss] and highlighted for easy identification while editing. You can then quickly play just that portion of the recording and make the necessary corrections when reviewing the transcript.
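A time stamp in either of the two formats mentioned above can be produced from an offset in seconds. This is only a sketch: the function name and the `with_seconds` flag are hypothetical, and the offset is assumed to be measured from the start of the recording.

```python
def time_stamp(offset_seconds, with_seconds=True):
    """Format an offset into the recording as [hh:mm:ss] or [hh:mm]."""
    hours, remainder = divmod(int(offset_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    if with_seconds:
        return f"[{hours:02d}:{minutes:02d}:{seconds:02d}]"
    return f"[{hours:02d}:{minutes:02d}]"


# An unclear word heard 1 hour, 2 minutes and 5 seconds into the recording
print(time_stamp(3725))                      # [01:02:05]
print(time_stamp(3725, with_seconds=False))  # [01:02]
```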
Other customized formatting
Some people like to add customized formatting such as-
‘…’ for short pauses
[duration] for long pauses
/ for one speaker finishing off a sentence started by another, etc.
These instructions should be clearly documented and shared with the transcriptionist before beginning transcription.
Test Accuracy
Accuracy of course is crucial in research transcription (a minimum of 98.5%). To find out how accurate your transcripts are going to be, it’s a good idea to ask your transcriptionist to complete one interview as a test. Most transcription companies charge for samples and you may have to invest a small amount in testing the skills of several service providers. But in the long run this would pay off in terms of both time and money spent on getting the transcripts proofed by someone else.
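The text does not say how accuracy is measured. One simple way to score a test transcript, assumed here purely for illustration, is to count the words that had to be corrected during review and express the rest as a percentage of the total word count. The function name and the sample numbers are hypothetical.

```python
def word_accuracy(total_words, error_count):
    """Percentage of words in the transcript that needed no correction.

    Assumption: accuracy is measured per word; other definitions exist.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * (total_words - error_count) / total_words


# Hypothetical test interview: 8,000 words, 96 corrections during review
accuracy = word_accuracy(8000, 96)
print(round(accuracy, 1))  # 98.8
print(accuracy >= 98.5)    # True
```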
Research Prices
Most PhD students and Research Associates work on small budgets. The expenses for transcription are either paid out of their own pockets or through limited-amount grants from their college/university/institute. Keeping this in mind, most transcription companies offer discounts on research transcription that can be availed by providing a copy of a college ID card or other documents that prove that you’re a student.
That said, research transcription does require considerably more effort as compared to other types of transcription (such as business transcription). As such transcription rates for research interviews normally range between $25.00 – $55.00 per audio hour. The cost can vary depending on several factors (including where you’re outsourcing to). It’s a good idea to ask for a quote from several service providers to compare prices.
Discuss Confidentiality
Be sure to ask the service provider to provide a signed NDA that clearly states that the material (interviews as well as transcripts) will be kept confidential and deleted at the end of the project. Most good transcription companies provide these options proactively, but it’s still a good idea to outline (and document) your requirements before beginning work.
Review Often
If you choose the services of a new transcription company (or one you’re hiring for the first time), it would be a good idea to review the transcripts periodically rather than waiting till all the interviews have been transcribed.
Transcription Turnaround Time
Transcription turnaround time depends on several factors –
Duration of the audio or video recording
A clear 60-minute recording can take anywhere between 2-4 hours to transcribe and another 1-2 hours to proof. First the transcriptionist types out the entire recording without rewinding any part. Then she proofs the transcript by listening to the entire recording once again and simultaneously reading the text. While doing this she corrects errors and fills in the blanks left out in the first round. If there are too many blanks or mistakes in the text, a second round of proofreading may be needed, consequently increasing the transcription turnaround time.
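The figures above (2-4 hours to transcribe and 1-2 hours to proof a clear 60-minute recording) can be turned into a rough estimate for other lengths. The sketch assumes that the work scales linearly with the length of the recording and that each extra round of proofreading takes as long as the first. Both are simplifying assumptions, and the function name is hypothetical.

```python
def turnaround_hours(audio_minutes, extra_proof_rounds=0):
    """Return a (low, high) estimate in working hours for a clear recording."""
    audio_hours = audio_minutes / 60
    proof_rounds = 1 + extra_proof_rounds
    low = 2 * audio_hours + 1 * audio_hours * proof_rounds
    high = 4 * audio_hours + 2 * audio_hours * proof_rounds
    return (low, high)


print(turnaround_hours(60))                        # (3.0, 6.0)
print(turnaround_hours(90, extra_proof_rounds=1))  # (6.0, 12.0)
```

The factors discussed in the rest of this section (speakers, accents, technical content, audio quality) would push a real estimate above these figures.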
Number of speakers
Speaker identification in recordings with multiple speakers can take time (depending on the rate of speech of the speakers and how disciplined the conversation is).
The transcriptionist has to carefully identify each speaker’s voice and mark their name correctly each time they speak on a recording. For this the transcriptionist has to go slow and may have to re-listen to parts of the recording more than once.
Accents
Transcribing recordings in strong accents – such as Irish or Australian – requires an understanding not only of the accents but also of the colloquialisms. A transcriptionist must have experience in working with difficult accents and has to carefully listen/re-listen to the recording to ensure accuracy.
Accents almost invariably add to the turnaround time for transcription.
Technical Content
Transcribing interviews, seminars, and other recordings on technical subjects requires research. A medical interview may involve medical terminology or a board meeting may involve financial terms that are not commonly known and must be researched.
Generally a transcriptionist would mark these terms as blanks while creating the first draft. Then he’ll go back and research each term to fill in the blanks at the time of proofreading. This obviously adds to the transcription turnaround time.
Audio Quality
This is a big one. A clear recording, free of background noise and recording issues is the easiest to transcribe. But many audio files (especially those created outdoors or created using inadequate equipment) are not well recorded. For example, an interview conducted over dinner without using lavaliere microphones may have the sound of cutlery, background music and side conversations recorded along with the actual voices of the participants. This makes transcription difficult and naturally slows down the process.
Transcription turnaround time is also impacted if the volume of the speakers is not high enough. It is always advisable to use microphones and (when possible) conduct the recording in a quiet room to minimize noise.
Transcription Style
There are 2 main styles of transcription used by most people: Verbatim Transcription and Clean Read Transcription. Verbatim transcription normally takes longer.
Editing
Editing a transcript for grammar, punctuation, sentence structure, etc. requires an editor to go through the entire text. This adds to the time taken to produce a finished product.
Formatting
Adding formatting such as headings, subheadings, italicizing or highlighting text, paragraphing, adding margins, etc. also adds to the transcription turnaround time.
In essence, transcription turnaround time depends on the recording quality, complexity of the subject, and what you need the final document to look like. The best way to find out how long your recording will take is to send a sample to your transcriptionist and ask them to give you an estimate.
