Feature

Striking the Right Balance Between Human and AI Transcription for Qualitative Research

By now we all know AI transcription of interviews is anything but flawless, especially when cultural references are critical to insights, as they can easily be missed by machines. But, if used properly, AI transcription can add efficiency to projects. In this article, multilinguistic researcher Jill Kushner Bishop outlines a decision-making framework to help researchers decide when to consider human or blended human and AI solutions to optimize resources and maximize the quality of insights.

 

By Jill Kushner Bishop, PhD
Founder & CEO
Multilingual Connections
Evanston, Illinois
jill@mlconnections.com

 

The shoemaker’s children go barefoot. The potter drinks from a broken jar. The blacksmith’s home has wooden knives. Yes, the transcription agency owner has 240 hours of largely untranscribed fieldwork on Hi8 cassettes in a box in her basement.

Let me explain that last one. Once upon a time, I was a graduate student in linguistic anthropology at UCLA. I was conducting ethnographic research in Jerusalem among speakers of a dying dialect of Spanish spoken by descendants of the Jews who were exiled from Spain in 1492. My days were filled with recording informal conversations, interviews, stories, and songs, and my evenings were filled with the sounds of the trilingual audio (Judeo-Spanish/Ladino, Hebrew, and English) and the scratching of my pencil on a yellow legal pad. After transcribing what was needed for my dissertation, I stashed the rest in a box with the best of intentions for “later.” Despite my plan to stay in academia, I returned to Chicago to work in user research and then launched Multilingual Connections, a language and culture consultancy in Chicago that supports global research through bilingual moderation, translation, and transcription.

Why Transcription Quality Matters in Qualitative Research

For my research, every conversational detail mattered. Discourse markers like “well” and “um” mattered. Pauses, overlapping speech, and language choice mattered. At the heart of it, I wanted to understand people’s stories, perspectives, and emotions, but I also wanted to understand how people came together to create meaning and connection through language. For me, it was essential to hear it and transcribe it directly. Qualitative researchers working for corporations and nonprofits don’t typically need that level of specificity, but accuracy? Absolutely. Transcripts allow researchers to dig into the details, identify patterns, and understand not just what was said but the full meaning and nuance of participants’ lived experiences. The importance of quality transcription cannot be overstated.

The transcription industry has changed significantly over the years. In recent decades, it’s been commonplace to send audio recordings of interviews and focus groups to transcription agencies, where skilled, experienced transcribers listen and transcribe every utterance. New agencies then started cropping up that offered lower-cost, crowd-sourced options, where audio would be broken up into segments and claimed by freelancers who didn’t necessarily have the same experience as professional transcribers.

Now we’re fully into the era of increasingly sophisticated AI tools like automatic speech recognition (ASR) that can transcribe audio in minutes. While AI can provide speed, its accuracy varies based on numerous factors, including audio quality, subject matter, and language. Then there’s the issue of hallucinations. A recent AP article, produced in partnership with the Pulitzer Center’s AI Accountability Network, noted that machine learning experts and developers continue to see frequent hallucinations: made-up content that is at times racial or violent and often irrelevant (as seen in Example 1). If not detected, these fabrications can undermine your research.

What if your research brings you to multilingual communities? In the U.S., nearly 70 million people speak a language other than English at home,¹ while globally, only 5 percent of people speak English as their first language.² Whether you’re conducting research locally or internationally, there’s a good chance you’ll need to work in languages other than English. Depending on your project scope, this may involve translating surveys and open ends, or it may mean working with bilingual moderators who can connect with your participants, both linguistically and culturally. If you’re conducting multilingual focus groups or in-depth interviews, what should you do when it comes to transcribing and translating these conversations?

What are the tradeoffs between hiring a human transcriber and simply uploading your recordings to an automatic transcription and translation platform? It’s an important question, and the answer is: it depends.

At Multilingual Connections, we’ve spent years refining the balance between experienced, skilled humans and AI transcription tools to deliver the highest quality outputs. But we understand that quality requirements vary, as do budgets and timeframes. In this article, I will outline key considerations and share best practices for researchers to decide when to use AI tools, when to involve human transcriptionists, and how to blend both for optimal results.

Different Languages, Different Outputs

Before diving into the AI-human discussion, let’s take a moment to address a few key considerations about multilingual transcription. Whether you’re a native speaker of your participants’ language or working with an interpreter or a bilingual moderator, you have a few options:

  • Monolingual Transcription: Your transcript will be in the same language as your source audio (e.g., transcribing French audio into French text).
  • Interpretive Transcription: Your transcript will be in English (e.g., transcribing a French interview directly into English).
  • Double Column: Your final transcript will have both the monolingual transcription output plus a side-by-side translation (e.g., French transcript plus English translation).

How do you know which option to choose? If you’re a native speaker and the only one analyzing transcripts, monolingual transcription may suffice. As the market for global research continues to grow, however, it’s likely that you’ll be working in languages that you don’t speak at a professional level (you may, in fact, be fielding a study in seven countries simultaneously). Having English-language transcripts allows you to understand what was said, as well as to collaborate with others who might be reviewing them, so interpretive transcription would be the best option. In some cases, you may decide that it’s important to have both the original language transcript for reference plus an English translation, so a double-column format would make sense. Thinking this through from the get-go will help ensure efficiency and allow you to dig into the data as quickly as possible.

Key Considerations for Choosing the Right Approach

When deciding whether to use AI transcription, human transcription, or a blended approach, there are numerous factors to consider and evaluate relative to your goals, timing, and budget.

  1. The Importance of Nuance and Context
    In many areas of qualitative research, nuance and context are critical, and these are the elements that often carry the most important insights.
    • Pain Point: AI transcription tools may get many of the words right, but they struggle with interpreting tone and emotion and may miss culturally specific expressions or intentions that a human transcriber would catch.
    • Recommendation: For projects where cultural or emotional nuances are central, human transcription is invaluable. AI can be a time-saver for more straightforward tasks, but for high-stakes research, human review ensures accuracy.
  2. Complexity of the Audio
    Qualitative research doesn’t always happen in perfect environments, yet AI transcription tools work best with clear audio.
    • Pain Point: Researchers often deal with multi-speaker conversations, overlap, ambient noise, connection issues, or code-switching between languages, all of which can lead to inaccurate or incomplete transcripts.
    • Recommendation: For clean, clear audio (a straightforward in-depth interview, for example), AI transcription can offer a quick, cost-effective solution. For more complex recordings, like focus groups involving multiple speakers or mixed languages, human transcription is necessary to ensure clarity and completeness.
  3. Time and Budget Constraints
    Researchers often work under tight deadlines and limited budgets, and AI transcription can offer a major advantage in terms of speed.
    • Pain Point: AI can generate a rough transcript within minutes, which can be useful for getting a quick overview or pulling out initial insights. However, the initial time saved may be offset if the AI transcript requires significant human editing.
    • Recommendation: Use AI transcription as a first pass when time and budget are constrained, but follow up with human review to ensure that the final product is reliable. Blended workflows, where AI provides the draft and human transcribers refine it, can offer the best balance of speed and quality.
  4. Confidentiality and Data Sensitivity
    In research involving sensitive topics (such as healthcare, social justice, or personal identity), data security is paramount.
    • Pain Point: Not all AI transcription platforms (especially free versions) offer data protection, and researchers need to be mindful of confidentiality and legal risks when working with participant data.
    • Recommendation: For projects involving sensitive information, use transcription tools or services that prioritize data security and confidentiality. In many cases, the safest option is to work with professional transcription agencies with strong data privacy protocols and trusted transcriptionists who are bound by NDAs.
  5. Language Complexity and Translation Needs
    For monolingual transcription, AI-powered ASR tools have made significant strides, especially with clear, high-quality audio in high-resource languages like English or Spanish. With lower-resource languages and dialect variation, quality can vary greatly.
    • Pain Point: Many languages have significant regional variations or insufficient data sets, and as a result, AI output is poor. When you’re working from one language directly into another—as you do with interpretive transcription—you introduce an additional layer of complexity that significantly multiplies the potential for errors, particularly when the conversation involves cultural nuance or emotional depth.
    • Recommendation: For projects requiring translation as well as transcription, human expertise is essential. AI tools might offer a rough draft, but human transcribers fluent in both languages and familiar with both linguistic and cultural nuance are needed to ensure accurate, culturally informed translation. For example, as a native English-speaking Chicagoan in my 50s, would I trust myself to transcribe audio with young speakers of multicultural London English? No chance!

Practical Toolkit: How to Approach Transcription Based on Your Needs

After considering the factors outlined above, let’s now move to some real-world scenarios to give you an idea of how they may apply to your research.

Scenario 1: You’re conducting usability testing for a mobile app with English-speaking users, and the audio is clear. You need quick transcripts to identify key feedback.

  • Approach: AI transcription can provide an initial draft quickly, allowing you to focus on key points. A light human review can catch any minor errors.

Scenario 2: You’re leading a focus group in which participants are switching between English and Arabic, and the discussion involves complex cultural references.

  • Approach: A blended approach is recommended. AI can help with the initial transcription, but a human transcriber familiar with both languages and cultural contexts is necessary to ensure accurate translation and insights capture.

Scenario 3: You’re conducting ethnographic research on healthcare experiences among marginalized communities, and participants are sharing deeply personal stories that overlap with one another and involve informal language and code-switching.

  • Approach: In many communities, it’s common to speak multiple languages, and speakers may default to one language for some topics and another for more personal and emotional topics. Human transcriptionists attuned to these issues, the sensitive nature of the data, and the need for cultural understanding will help ensure no meaning is lost in the transcription process.

Scenario 4: You’re analyzing customer service call recordings for a global company, with participants speaking in English but with strong regional accents (e.g., Scottish or Indian English).

  • Approach: AI transcription may struggle with nonstandard accents and regional idioms, so a blended approach is recommended. Use AI for speed but have human transcriptionists familiar with regional accents review and correct the output for clarity and accuracy.
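
For readers who like to see a framework written down as rules, the considerations and scenarios above can be sketched as a short triage function in Python. This is an illustration only, not a tool described in this article: the `Project` flags, the `Approach` labels, and the `recommend` function are hypothetical names, and the order of the rules is just one possible reading of the guidance above (for example, it assumes that any sensitive project goes to human transcriptionists).

```python
from dataclasses import dataclass
from enum import Enum


class Approach(Enum):
    AI_LIGHT_REVIEW = "AI draft with a light human review"
    BLENDED = "AI draft corrected by a specialist human transcriber"
    HUMAN = "human transcription from start to finish"


@dataclass
class Project:
    # Hypothetical yes/no flags drawn from the considerations above.
    sensitive_data: bool = False
    complex_audio: bool = False  # overlap, noise, many speakers
    code_switching: bool = False
    cultural_nuance: bool = False
    strong_accents: bool = False
    needs_translation: bool = False


def recommend(project: Project) -> Approach:
    """Return one plausible starting point, not a definitive answer."""
    # Assumption: sensitive material stays with trusted human transcriptionists.
    if project.sensitive_data:
        return Approach.HUMAN
    # Any factor that AI tends to handle poorly calls for expert human correction.
    if (
        project.complex_audio
        or project.code_switching
        or project.cultural_nuance
        or project.strong_accents
        or project.needs_translation
    ):
        return Approach.BLENDED
    # Clear, single-language, low-stakes audio.
    return Approach.AI_LIGHT_REVIEW


# The four scenarios above, expressed with the hypothetical flags.
scenarios = {
    1: Project(),
    2: Project(code_switching=True, cultural_nuance=True, needs_translation=True),
    3: Project(sensitive_data=True, complex_audio=True, code_switching=True),
    4: Project(strong_accents=True),
}

for number, project in scenarios.items():
    print(f"Scenario {number}: {recommend(project).value}")
```

Real projects rarely reduce to six yes/no flags, so a result like this is best treated as a starting point for a conversation about goals, timing, and budget rather than as a verdict.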

Conclusion: Finding the Right Balance

As a qualitative researcher, rarely do you take a one-size-fits-all approach to your research. Similarly, your transcription needs will vary based on the language, scope, complexity, and context of each project. While I’ve highlighted scenarios and suggested approaches above, you’re certain to encounter dozens of unique situations where there’s no perfect answer. AI transcription offers speed and cost savings, but it has many limitations, especially when nuance, cultural sensitivity, and complex audio are involved. As noted, there are times when a human transcription process from start to finish seems appropriate. Then there are the many situations where a combination of the efficiency of AI and the insight of humans seems to be the best choice.

What about the future of transcription? AI has radically changed the way so many of us work, and in a short amount of time. As technology continues to advance, it’s possible that the role of expert humans will be even smaller in the future. But for where we are today, these expert humans still play an essential role in ensuring linguistic and cultural nuance. It’s important to focus on blending human expertise with technology to provide transcription solutions tailored to your unique needs. Whether you’re conducting research in the U.S. or across the globe, our expert team can help you navigate the complexities of transcription and translation to get the most accurate and insightful results.

Resources:

  1. U.S. Census Bureau, American Community Survey Press Release, 2023
  2. Crystal, David. English as a Global Language, Cambridge University Press, 2003