How to use thematic coding in qualitative research

Every qualitative researcher eventually faces the same challenge: a mountain of interview transcripts, field notes, and open-ended survey responses with no clear path from raw data to meaningful findings. Thematic coding is the systematic process that bridges that gap — transforming unstructured qualitative data into organized patterns and actionable insights. Yet despite being one of the most widely used methods in qualitative research, many researchers struggle with how to actually do it well.

Whether you are conducting a small-scale interview study or managing a multi-site qualitative project with dozens of collaborators, this guide walks you through thematic coding from start to finish. You will learn the foundational framework developed by Braun and Clarke, practical steps for each phase, strategies for team-based coding, and how to keep your data organized as your research scales.

What is thematic coding?

Thematic coding is a method of qualitative data analysis that involves identifying, labeling, and organizing passages of text around recurring patterns — known as themes. Rather than simply summarizing what participants said, thematic coding requires you to interpret the meaning behind the data and group related findings into categories that address your research questions.

As Gibbs (2007) defines it, thematic coding allows you to "index the text into categories and therefore establish a framework of thematic ideas about it." The method applies across disciplines — from psychology and education to public health and organizational research — and works with interview transcripts, focus group recordings, survey responses, field notes, and even social media posts.

Thematic coding vs thematic analysis: what is the difference?

This is one of the most common points of confusion for qualitative researchers, so it is worth addressing directly.

Thematic coding refers specifically to the act of assigning labels (codes) to segments of data that represent a particular idea, concept, or pattern. It is the hands-on, line-by-line work of reading your data, identifying meaningful units, and tagging them with descriptive or interpretive codes.

Thematic analysis is the broader methodological framework that includes coding as one of its phases, but also encompasses familiarization with the data, theme development, theme review, and the final write-up. Braun and Clarke's (2006) six-phase framework — published in Qualitative Research in Psychology and now cited over 190,000 times — outlines this full process.

In practice, when researchers say they are "doing thematic coding," they typically mean they are conducting a full thematic analysis. This guide covers both the specific coding techniques and the complete analytical process.

When should you use thematic coding?

Thematic coding is particularly effective in these situations:

Exploratory research where you want to understand experiences, perceptions, or processes without testing a predefined hypothesis
Applied research where you need actionable insights — for example, evaluating a program, understanding user needs, or informing policy decisions
Multi-participant studies where you want to compare experiences across individuals, groups, or sites
Mixed-methods research where qualitative findings need to complement quantitative data
Systematic or scoping reviews that synthesize qualitative evidence and report it using guidelines such as PRISMA (or its PRISMA-ScR extension for scoping reviews)

Thematic coding works with both inductive approaches (where codes emerge from the data) and deductive approaches (where codes are predetermined based on theory). This flexibility is a major reason it remains one of the most widely used qualitative analysis methods across the social sciences.

The six phases of thematic coding: a step-by-step guide

The most widely adopted framework for thematic coding comes from Virginia Braun and Victoria Clarke, first published in their landmark 2006 paper. Their six-phase model provides a structured yet flexible roadmap. Here is how to apply each phase in practice.

Phase 1: Familiarize yourself with the data

Before you code a single line, you need to know your data deeply. This means reading and re-reading every transcript, set of field notes, or response in your dataset. If you are working with audio or video recordings, transcribe them first — the act of transcription itself is a valuable familiarization exercise.

During this phase, take notes on your initial impressions. What topics come up repeatedly? What surprised you? What contradictions do you notice? These early observations are not codes yet, but they form the foundation for your coding work.

Practical tip: For a typical study with 10 one-hour semi-structured interviews, experienced qualitative researchers estimate 40 to 60 hours for coding alone, plus an additional 20 to 30 hours for theme development and quotation selection. Build this into your project timeline from the very start.
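To adapt the rule of thumb above to a different study size, a rough sketch in Python follows. It assumes effort scales linearly with interview hours, which is a simplification the tip itself does not claim, and the function name estimate_hours is purely illustrative.

```python
def estimate_hours(interview_hours: float) -> dict[str, tuple[float, float]]:
    """Scale the 10-interview-hour rule of thumb to another study size.

    Assumption: effort grows linearly with interview hours.
    Returns (low, high) hour ranges for each activity.
    """
    scale = interview_hours / 10
    coding = (40 * scale, 60 * scale)      # 40 to 60 h per 10 interview hours
    theme_work = (20 * scale, 30 * scale)  # 20 to 30 h per 10 interview hours
    total = (coding[0] + theme_work[0], coding[1] + theme_work[1])
    return {"coding": coding, "theme_development": theme_work, "total": total}


baseline = estimate_hours(10)  # the study size used in the tip
larger = estimate_hours(25)    # hypothetical study with 25 interview hours
```

For the 10-interview study in the tip, the combined range comes to 60 to 90 hours. Treat any scaled figure as a planning floor, since larger datasets usually add coordination overhead.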

For research teams working on collaborative qualitative projects, keeping all transcripts, notes, and initial observations in a single shared workspace is critical. ScholarDock, a research project and reference management platform, allows teams to centralize qualitative data alongside source materials, project notes, and task assignments — so nothing gets lost between the familiarization phase and the final write-up.

Phase 2: Generate initial codes

With a solid understanding of your data, begin the systematic process of coding qualitative data. Go through each data item line by line or paragraph by paragraph, assigning short labels to segments that capture something meaningful in relation to your research questions.

Codes can take several forms:

Descriptive codes that summarize the surface content (e.g., "time pressure," "funding constraints")
Interpretive codes that capture an underlying meaning (e.g., "feeling unsupported," "institutional distrust")
In vivo codes that use the participant's own words (e.g., "drowning in paperwork")

Guidelines for effective initial coding:

Code inclusively. It is better to over-code at this stage than to miss something important. You can always consolidate later.
Code for individual meaning. Each coded segment should capture a single idea. If a passage contains multiple ideas, apply multiple codes.
Stay close to the data. Resist the urge to jump to themes too early. Let the codes represent what is actually in the data, not what you expect to find.
Keep a codebook. Document each code with a name, a brief description, and an example data extract. This is essential for consistency — especially when multiple researchers are coding the same dataset.
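
A codebook of the kind described above can live in a spreadsheet, a document, or code. As a minimal sketch, assuming a Python representation (the class CodebookEntry, its fields, and the helper add_code are illustrative names, not part of any QDA tool, and the example extracts are invented), each entry stores a name, a brief description, and an example extract, and duplicate code names are rejected:

```python
from dataclasses import dataclass


@dataclass
class CodebookEntry:
    """One code: name, brief description, and an example data extract."""
    name: str
    description: str
    example: str
    kind: str = "descriptive"  # or "interpretive", "in vivo"


def add_code(codebook: dict[str, CodebookEntry], entry: CodebookEntry) -> None:
    """Add an entry, refusing duplicate names (compared case-insensitively)."""
    key = entry.name.strip().lower()
    if key in codebook:
        raise ValueError(f"code already defined: {entry.name!r}")
    codebook[key] = entry


codebook: dict[str, CodebookEntry] = {}
add_code(codebook, CodebookEntry(
    name="time pressure",
    description="Participant describes not having enough time for a task.",
    example="(hypothetical) 'There are never enough hours in the week.'",
))
add_code(codebook, CodebookEntry(
    name="drowning in paperwork",
    description="Participant's own phrase for administrative overload.",
    example="(hypothetical) 'Honestly, I'm drowning in paperwork.'",
    kind="in vivo",
))
```

Rejecting duplicate names is one simple guard against two coders defining the same code slightly differently. Inclusion and exclusion criteria could be added as further fields.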

Phase 3: Search for themes

Once your dataset is fully coded, step back and look at your codes at a higher level. Begin grouping related codes into potential themes — broader patterns that capture something significant about your data in relation to the research questions.

This is where the analytical work intensifies. You are not just sorting codes into buckets. You are making interpretive decisions about which codes connect to each other and what larger story they tell together.

Create a thematic map — a visual representation of how your codes cluster into candidate themes and how those themes relate to each other. This can be as simple as sticky notes on a whiteboard, a mind map, or a digital diagram. The goal is to see the big picture.

Not every code will fit neatly into a theme. Some codes may form their own themes, some may become sub-themes within larger themes, and some may turn out to be irrelevant to your research questions. This is normal and expected.
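
The grouping step can also be tracked in a simple data structure. The sketch below assumes a thematic map stored as a Python dictionary from candidate theme to its codes. The theme names and the code "commute length" are hypothetical. It lists codes that have not been placed in any theme and codes that sit in more than one:

```python
from collections import Counter

# Codes produced in Phase 2 (the last one is a hypothetical addition).
all_codes = {
    "time pressure", "funding constraints", "feeling unsupported",
    "institutional distrust", "drowning in paperwork", "commute length",
}

# Candidate themes (hypothetical names) and the codes grouped under each.
thematic_map = {
    "resource scarcity": {"time pressure", "funding constraints"},
    "strained relationship with the institution": {
        "feeling unsupported", "institutional distrust", "drowning in paperwork",
    },
}

# Codes not yet placed in any candidate theme.
assigned = set().union(*thematic_map.values())
unassigned = sorted(all_codes - assigned)

# Codes placed in more than one theme, which may signal overlapping themes.
placements = Counter(code for codes in thematic_map.values() for code in codes)
overlapping = sorted(code for code, n in placements.items() if n > 1)
```

An unassigned code is not an error. It is a prompt to decide whether the code forms its own theme, belongs as a sub-theme, or falls outside the research questions.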

Phase 4: Review your themes

Now pressure-test your candidate themes against the data. This phase has two levels:

Check themes against coded extracts. Re-read all the data segments coded under each theme. Do they form a coherent pattern? If a theme feels too broad, consider splitting it. If two themes overlap significantly, consider merging them.
Check themes against the full dataset. Re-read your entire dataset with your thematic framework in mind. Does the thematic map accurately represent the data as a whole? Are there important aspects of the data that your themes miss?
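
Simple counts can help decide where to re-read first, although they cannot judge whether a theme is coherent. The sketch below assumes coded extracts stored as (participant, theme, text) tuples. The data, the function name theme_support, and the two-participant threshold are all illustrative assumptions:

```python
from collections import defaultdict

# Hypothetical coded extracts: (participant_id, theme, extract text).
coded_extracts = [
    ("P01", "resource scarcity", "There is never enough time."),
    ("P02", "resource scarcity", "The grant ran out in March."),
    ("P03", "resource scarcity", "We share one licence between six people."),
    ("P01", "institutional distrust", "Nobody upstairs listens."),
    ("P01", "institutional distrust", "They change the rules every term."),
]


def theme_support(extracts):
    """Return {theme: (number of extracts, number of distinct participants)}."""
    n_extracts = defaultdict(int)
    participants = defaultdict(set)
    for participant, theme, _text in extracts:
        n_extracts[theme] += 1
        participants[theme].add(participant)
    return {t: (n_extracts[t], len(participants[t])) for t in n_extracts}


support = theme_support(coded_extracts)

# Flag themes voiced by fewer than two participants for a closer re-read.
thin_themes = [t for t, (_, n_people) in support.items() if n_people < 2]
```

A flagged theme may still be analytically important. The count only indicates that it currently rests on a single voice and deserves a second look against the full dataset.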

This is an iterative process. You may need to return to Phase 2 and recode portions of your data, or restructure your thematic map entirely. Braun and Clarke emphasize that thematic analysis is not a linear process: it requires moving back and forth between phases as your understanding of the data deepens.

Phase 5: Define and name your themes

At this point, you should have a working set of themes you are confident in. Now refine each one by writing a detailed definition that captures:

What the theme is (and what it is not)
The scope of the theme — what aspects of the data it covers
The story the theme tells — how it contributes to answering your research questions

Give each theme a concise, informative name. Avoid single-word theme names (too vague) and overly long descriptions (too unwieldy). Aim for short phrases that are both descriptive and analytically meaningful — for example, "navigating institutional gatekeeping" rather than just "barriers."

If you have sub-themes, clearly define the relationship between the overarching theme and its components. A well-defined thematic structure makes the write-up phase significantly easier.

Phase 6: Write the report

The final phase transforms your thematic analysis into a coherent narrative. This is not just a summary of themes — it is an analytical account that tells the story of your data.

For each theme, present:

A clear description of what the theme captures
Representative data extracts (quotations) that illustrate the theme
Your analytical interpretation of what the data means in context

The best thematic analysis reports go beyond describing what participants said and explain why it matters. Connect your findings to the broader scholarly conversation, highlight what is new or surprising about your results, and address how disconfirming cases fit into your analysis.

Include enough methodological detail for readers to evaluate the rigor of your analysis: your coding approach (inductive, deductive, or hybrid), the number of iterations, how you handled disagreements in team coding, and any reflexive considerations.

Inductive vs deductive thematic coding

One of the first methodological decisions you will make is whether to take an inductive or deductive approach to coding — or a combination of both.

Inductive coding starts from the data itself. You read your transcripts without a predefined coding framework and let codes emerge organically from what you observe. This approach is ideal for exploratory research where you want the data to speak for itself and is closely associated with grounded theory traditions.

Deductive coding starts from theory. You develop a coding framework based on existing literature, theoretical models, or specific research questions, and apply those predefined codes to your data. This works well when you are testing a hypothesis or building on an established body of research.

Hybrid approaches combine both strategies — starting with a deductive framework while allowing additional inductive codes to emerge during analysis. Many experienced researchers find this the most practical approach, especially in applied research contexts where prior theory exists but the phenomena under study may reveal unexpected dimensions.
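
A hybrid approach can be sketched as a bookkeeping rule: a code counts as deductive if it was in the framework before coding began, and as inductive otherwise. The framework contents and the function name tag_origin below are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical deductive framework, fixed before coding begins.
deductive_framework = frozenset({"funding constraints", "time pressure"})


def tag_origin(applied_codes, framework):
    """Label each applied code as 'deductive' (predefined) or 'inductive' (emerged)."""
    return {
        code: "deductive" if code in framework else "inductive"
        for code in applied_codes
    }


# Codes actually applied while reading one (hypothetical) transcript.
applied = ["time pressure", "institutional distrust", "funding constraints"]
origins = tag_origin(applied, deductive_framework)
new_codes = sorted(code for code, origin in origins.items() if origin == "inductive")
```

Recording the origin of each code makes it straightforward to report in the write-up which codes came from theory and which emerged during analysis.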

How to manage thematic coding across a research team

Thematic coding becomes significantly more complex when multiple researchers are involved. Inconsistent coding between team members — known as low inter-coder reliability — can undermine the credibility of your entire analysis. Here are strategies for maintaining quality and consistency in collaborative qualitative data analysis:

Develop a shared codebook early. Before anyone begins coding, agree on initial code definitions with clear inclusion and exclusion criteria. Document every code and update the codebook as new codes emerge.
Conduct calibration rounds. Have all team members independently code the same data extract, then compare results. Discuss disagreements and refine definitions until you reach acceptable agreement. Repeat this process at regular intervals.
Assign clear roles. In larger teams, designate a lead coder who reviews all coding decisions and maintains the master codebook. Other team members code independently, but the lead ensures consistency across the dataset.
Centralize your materials. Files scattered across email threads, personal drives, and different software tools are one of the biggest threats to coding consistency. Your team needs a shared workspace where transcripts, codebooks, analytical memos, and reference materials are all accessible in one place.
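
For calibration rounds, agreement between two coders is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The guide above does not prescribe a particular statistic, so treat this as one common option. The sketch assumes each coder assigns exactly one code per segment, and the example codings are invented:

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one code per segment."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of segments")
    n = len(coder_a)
    # Observed agreement: share of segments given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: from each coder's own code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)


T, F = "time pressure", "funding constraints"
coder_a = [T, T, T, T, T, F, F, F, F, F]  # hypothetical codings of 10 segments
coder_b = [T, T, T, T, F, F, F, F, F, T]
kappa = cohens_kappa(coder_a, coder_b)
```

Here the coders agree on 8 of 10 segments and chance agreement is 0.5, so kappa is about 0.6. What counts as acceptable agreement is a team decision and should be stated in the methods section.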

ScholarDock is built for exactly this kind of collaborative research workflow. By keeping your qualitative data, project documentation, source references, and team task assignments in one connected workspace, you reduce the risk of version conflicts, lost context, and fragmented codebooks that plague multi-researcher coding projects. When your codebook lives alongside your transcripts, your literature review, and your team's annotations, every coding decision can be traced back to its source — and every collaborator stays aligned.

Common mistakes in thematic coding and how to avoid them

Even experienced researchers fall into these traps during thematic coding. Recognizing them early will strengthen the rigor and credibility of your qualitative analysis.

Using interview questions as themes. Your interview guide is not a thematic framework. Themes should emerge from the patterns in participant responses, not mirror the structure of your questions.
Generating themes that are too broad. A theme like "challenges" tells you very little. Push for specificity — what kind of challenges, for whom, and under what conditions?
Treating coding as a one-pass process. Effective thematic coding is iterative. Plan to revisit and revise your codes at least two or three times as your understanding deepens.
Ignoring disconfirming data. Data that contradicts your themes actually strengthens your analysis when you address it directly. Include and discuss cases that do not fit neatly into your thematic framework.
Neglecting reflexivity. Your own perspectives, assumptions, and disciplinary training shape what you see in the data. Maintain a reflexive journal throughout the analysis to document how your positionality influences your coding decisions. This is a core component of methodological rigor in qualitative research.
Coding in isolation. Even in solo projects, peer debriefing — discussing your codes and themes with a colleague — helps catch blind spots and strengthens the trustworthiness of your findings.

Tools and software for thematic coding in qualitative research

While thematic coding can be done manually with printed transcripts and colored highlighters, most researchers today use digital tools to manage the volume and complexity of qualitative data. Here is how the main options compare:

Dedicated QDA software such as NVivo, MAXQDA, and ATLAS.ti offers powerful coding, query, and visualization features but comes with a steep learning curve and significant licensing costs
Spreadsheet-based approaches using Excel or Google Sheets are accessible and inexpensive (Google Sheets is free) but limited in functionality for large or complex datasets
Lightweight coding tools like Delve provide simpler interfaces focused specifically on the coding process, ideal for smaller projects
Research management platforms like ScholarDock let you organize your qualitative data alongside your full project workflow, reference library, and team collaboration — eliminating the friction of switching between disconnected tools for coding, literature review, and project management

The best tool is the one that fits your team's workflow and keeps your data organized throughout the entire research lifecycle. For teams managing multiple qualitative studies simultaneously, a platform that connects your coded data to your source library and project management structure is the most efficient path from raw transcripts to published findings. ScholarDock brings all of these elements together in a single connected workspace designed specifically for research teams.

Start coding with confidence

Thematic coding is both a science and a craft. The six-phase framework gives you a clear structure to follow, but the quality of your analysis ultimately depends on how deeply you engage with your data, how rigorously you test your interpretations, and how transparently you document your process.

Whether you are a PhD candidate conducting your first interview study or a principal investigator overseeing a multi-site qualitative project, the principles remain the same: immerse yourself in the data, code systematically, build themes that tell a meaningful story, and keep everything organized so your analysis is traceable and defensible.

If your research team is tired of scattered transcripts, inconsistent codebooks, and disconnected project files, ScholarDock brings your entire qualitative research workflow — data, sources, annotations, and collaborators — into one connected workspace. Start organizing your research the way it was meant to be organized.