Qualitative Coding: Description and Process
Table of Contents
- What is Qualitative Coding?
- Why Qualitative Coding?
- XML: From Marking-Up to Publishing
- Schema Creation and Process
- Qualitative Codebook
Qualitative coding is a method used in the humanities and social sciences to analyze textual data, such as interviews. Researchers create a set of codes, which are themes and patterns that appear across their textual data; these codes usually reflect the particular heuristics or theories with which coders are working. Using these codes (together, a schema), researchers mark up the textual data, count how often particular themes appear, and make comparisons across texts. Each researcher who chooses to incorporate qualitative coding may design their schema creation, coding process, and validation methods differently depending on their discipline, the theoretical framework they use, and their overall goals. Because I aim to be transparent about my research methods and to provide a guide for researchers hoping to develop their own research toolboxes, I will describe my qualitative coding process and why I chose to use XML.
What is Qualitative Coding?
In the above paragraph, I describe the overall process of qualitative coding, from data collection to analyzing the codes. However, envisioning the qualitative coding process through text alone can be difficult, so I will provide an example using a portion of one fanfiction writer’s interview: Kittya Cullen’s.
Qualitative coding requires textual data on which to build and apply a schema. Kittya Cullen’s original interview transcription is the textual data that I will use for this example. To provide some context, during the interview, I asked her to describe a choice she made in her fanfic in which she linked Asami’s trauma to her relationship with Korra. Kittya says:
I think I was both going for an understanding of how Asami herself is affected by this really terrible thing we see happen to Korra. Because at that point, everyone has seen someone who they think is all-powerful and infallible, invulnerable to an extent, be ... I don't want to say broken, but be injured in a really drastic way and them not being able to do anything much to help with her recovery in concrete ways.
In just a few short sentences, Kittya Cullen unravels a complex choice, and I wanted to capture the importance of this moment. However, this short paragraph is just a small percentage of the entire interview, and merely underlining or highlighting the quote would erase the complexities in Kittya’s words.
In its most basic sense, qualitative coding is a form of highlighting or underlining. However, instead of merely underlining text, I “annotate” text that captures a particular pattern or theme. The schema I created, which I describe below, incorporates language and theories from critical fan practices and rhetorical genre studies (RGS); visit the framework section of the CFT to read more. Using this schema, I began to mark up specific moments in Kittya Cullen’s interview that captured particular critical fan practices or RGS theories. If you are familiar with HTML, you may recognize the pointy-bracket structure. My codes are the bolded words between the pointy brackets, and the text being encoded sits between the opening <code> tag and the closing </code> tag.
<code writing-agency="reflection">I think I was both going for an understanding of how Asami herself is affected by this really terrible thing we see happen to Korra.</code> Because at that point, <code canon="canon-relation">everyone has seen someone who they think is all-powerful and infallible, invulnerable to an extent, be ... I don't want to say broken, but be injured in a really drastic way and them not being able to do anything much to help with her recovery in concrete ways.</code>
With this excerpt encoded, the sentence “I think I was both going for an understanding of how Asami herself is affected by this really terrible thing we see happen to Korra” is now labeled “writing agency: reflection,” which is one of the codes used to mark up these interviews. I use the “writing agency: reflection” code to indicate when writers reflect on a specific choice they made while writing. I then use “canon: relation” to encode the last section of this excerpt. “Canon: relation” highlights when interviewees discuss how they identify with or relate to the canonical text. I continued this process throughout all the interviews based on the schema I created.
I then conducted further analysis across the different interviews, comparing and contrasting interviewees’ perspectives. For example, I analyzed all the times interviewees reflected on specific writing choices. What were the choices they made? Why might they have made these choices? What do these choices suggest about writing fanfiction and critical fan genre practices? I frame my arguments and answers to these questions by pulling information from the structured, qualitatively coded interview documents. These results can be found in the “Interviews” portion of the CFT.
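To make the "count how often particular themes appear" step concrete, here is a minimal Python sketch that reads the excerpt above and lists each coded passage with its attribute and value. It is only an illustration, not the workflow used in this project (which relies on Oxygen and XSLT). The `<interview>` wrapper element and the function names `list_codes` and `count_codes` are assumptions introduced for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The coded excerpt from above, wrapped in a hypothetical <interview> root
# element so that it parses as one well-formed XML document.
EXCERPT = """<interview><code writing-agency="reflection">I think I was both going for an understanding of how Asami herself is affected by this really terrible thing we see happen to Korra.</code> Because at that point, <code canon="canon-relation">everyone has seen someone who they think is all-powerful and infallible, invulnerable to an extent, be ... I don't want to say broken, but be injured in a really drastic way and them not being able to do anything much to help with her recovery in concrete ways.</code></interview>"""


def list_codes(xml_text):
    """Return (attribute, value, passage) triples for every <code> element."""
    root = ET.fromstring(xml_text)
    triples = []
    for element in root.iter("code"):
        # itertext() also collects text inside nested child elements.
        passage = "".join(element.itertext()).strip()
        for attribute, value in element.attrib.items():
            triples.append((attribute, value, passage))
    return triples


def count_codes(xml_text):
    """Count how often each attribute/value pair is applied."""
    return Counter((attribute, value) for attribute, value, _ in list_codes(xml_text))


if __name__ == "__main__":
    for (attribute, value), total in count_codes(EXCERPT).items():
        print(f"{attribute}: {value} -> {total}")
```

Run over a full set of interview files, the same kind of loop would give per-code totals that can be compared between interviewees.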
Why Qualitative Coding?
Qualitative coding, according to Johnny Saldaña (2015), is a heuristic: a way in which knowledge is created, modeled, shaped, and circulated. Understanding qualitative coding as a heuristic emphasizes that the creation of schemas, the choices made when marking up the textual data, and the methods for analyzing the marked-up texts are all framed through the particular theoretical lens and discipline in which the researcher works. As I created and refined the schema with which I marked up the interviews, I made deliberate decisions about the thematic patterns I wanted to highlight, based on both critical fan studies and rhetorical genre studies, which I will describe later in the “Schema Creation and Process” section. The process of qualitative coding (from data collection, to schema creation, to coding the document, to analyzing the codes) models and makes transparent researchers’ understandings of data, and therefore unravels and reveals their knowledge.
Qualitative coding models both researchers’ knowledge and the modeling process itself. In her description of “humanities computing,” Julia Flanders (2009) argues, “it is rather about modeling that knowledge and even in some cases about modeling the modeling process. It is an inquiry into how we know things and how we present them to ourselves for study, realized through a variety of tools which make the consequences of that inquiry palpable.” Flanders is focusing on data modeling as found in the digital humanities, not on qualitative coding. Data modeling in the digital humanities may include marking up archival documents with forms of XML to make them digitally readable: not just readable on computer screens, but readable by the computer.
Abbie Levesque DeCamp (2020) explicitly connects the process Flanders describes to qualitative coding, arguing it mirrors the types of data modeling described in the digital humanities: “The process of encoding forces an incredibly close reading - one must read and process all parts of a document, thinking deeply about each portion, sometimes down to the word, to accurately tag a document. That is, building in itself is a knowledge-making process.” Building schemas, using and creating qualitative coding tools, and coding textual documents are knowledge-making processes, processes that simultaneously construct and expound knowledge. As Flanders argues, this process “require[s] that one distance oneself from one’s own representational strategies and turn them about in one’s hands like a complex and alien bauble.”
The process of qualitative coding, then, makes transparent “representational strategies,” forcing researchers to explicitly define their theoretical frameworks and how these frameworks are reflected within and by the data. In order to build my schema, I had to explicitly choose both the frameworks and disciplinary discourse I worked within and how I would qualitatively code the interviews.
XML: From Marking-Up to Publishing
One of the most popular software packages for qualitative coding is NVivo. NVivo allows researchers to create codes, mark up multimodal files with these codes, take notes, and visualize results. However, I wanted to take a step further with my interviews: I wanted to publish the transcripts along with the qualitative codes. NVivo, unfortunately, does not offer an easy way to export encoded documents for publication. XML, however, can help researchers both analyze and publish their data.
XML is a grammar for publishing large-scale digital data to then be transformed and/or analyzed. Unlike NVivo, XML is not software; it is a set of grammatical rules within which users create their own language and vocabulary. Levesque DeCamp (2020) has an excellent explanation of XML and the steps of working with it, which I will paraphrase here. XML is a tree-structured data format, unlike comma-separated values (CSV) data, which is read into spreadsheets. Above, I showed an excerpt from my qualitative coding and mentioned that the “pointy-bracket structure” is similar to HTML. In case you are not as familiar with HTML, each HTML document has an opening and a closing tag, with a bunch of layered tags, or “elements,” in between. Here is a very simplified sample of HTML:
<html> <body> <p>Text here</p> </body> </html>
HTML and XML have similar overall formats, except that HTML already has an established vocabulary that browsers know how to display. If I use the <p> element, the browser knows that everything between the opening and closing tags is a paragraph. When I open an HTML file in my browser, the “Text here” is rendered as a paragraph. XML, however, does not transform documents; it merely marks them up. XML has the same structure, with elements and textual data in between tags. It also has attributes and attribute values, which provide further information and differentiate data. In the example above, I use this tag: <code writing-agency="reflection">. The element is “code,” while “writing-agency” is the attribute and “reflection” is the attribute value. Because XML is just a structure without a set vocabulary, researchers have to create their own XML schemas, choosing their elements, attributes, and attribute values, in order to mark up documents and then transform them.
I used RelaxNG to create my own XML schema and then validate the transcribed interviews that I marked up. If I either did not properly follow XML’s grammar or used an incorrect word as an element, Oxygen (an XML editor) let me know immediately. Validation is XML’s version of spell check. After I finished transcribing and marking up the interviews, though, I needed to be able to transform the XML documents and do something with them. In particular, I wanted to transform my XML documents into HTML so that I could publish my transcriptions and my codes. I used XSL Transformations (XSLT), which can transform an XML document into another XML document, an HTML document, or another output. With XSLT, I could both analyze the qualitatively coded interviews and publish them.
While NVivo is useful software for qualitative coding and analysis, especially for counting how often codes appear and alongside which other codes, XML offers a sustainability that NVivo does not. XML files can be stored and opened on pretty much every computer. Plus, because the XML documents can be transformed into HTML documents, I can actually publish the transcriptions along with the codes, although the codes are changed into HTML classes in the process.
XML will not work for every researcher interested in qualitative coding, and there are also a ton of XML uses that lie outside of qualitative research. For example, the Text Encoding Initiative (TEI) is an XML schema specifically for publishing digital editions of texts. However, because I needed a tool that both analyzes and publishes, XML was an easy choice.
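The "spell check" idea can be sketched in a few lines of Python. This is a simplified stand-in for illustration only: it is not RelaxNG and not what Oxygen does internally. The `ALLOWED` table is a hypothetical, partial vocabulary built from the attribute and value pairs that appear in this section's examples, and the sketch assumes one value per attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical, partial vocabulary: only the attribute/value pairs that appear
# in this section's examples. The actual RelaxNG schema defines many more.
ALLOWED = {
    "writing-agency": {"reflection"},
    "canon": {"canon-relation"},
    "fan-community": {"fan-politics"},
    "rgs-genre": {"identity-bending"},
    "important": {"important-quote"},
    "rgs-uptake": {"critical-uptake"},
}


def check_codes(xml_text):
    """Return a list of problems; an empty list means none were found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        # The text breaks XML's grammar (for example, an unclosed tag).
        return [f"not well-formed: {error}"]
    problems = []
    for element in root.iter("code"):
        for attribute, value in element.attrib.items():
            if attribute not in ALLOWED:
                problems.append(f"unknown attribute: {attribute}")
            elif value not in ALLOWED[attribute]:
                problems.append(f"unknown value for {attribute}: {value}")
    return problems
```

For example, `check_codes('<interview><code writing-agency="reflecting">text</code></interview>')` reports the misspelled value, much as a validating editor would flag it while coding.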
Schema Creation and Process
Before I began the qualitative coding process, I interviewed the fanfiction writers. I went into each interview with a set of questions and general expectations for how interviewees might answer them, and I was already thinking through the theoretical framework in which I was working. I wanted interviewees to share their writing practices and genre choices; the interview questions also encouraged them to think through the complicated relationship among their identities, their positionalities, the cultural texts they loved, and the fandom communities in which they participate. I also chose the interview subjects based on the fanfics they wrote and whether their fanfics and tagging choices reflected critical fan practices. Because I use rhetorical genre studies and critical fan studies as theoretical frameworks, I was interested in themes around genre, uptake, ideology, power, and identity representation. A literacy scholar or a media studies scholar might use a different approach. My theoretical frameworks determine how I structure my interview questions, my coding schema, and my analysis.
Rhetorical genre studies and fan studies shape the vocabulary of my XML schema so that I can mark up patterns that tie back to my overall research. I listened to the interviews multiple times, created a codebook, and reassessed the schema until I settled on a final version. A codebook is the documentation for a schema; it describes when particular codes will be used and why. I provide the codebook for my interviews below, and I also include it in the actual RelaxNG schema as a reminder while I code. Even as I continued qualitative coding, I revised the schema when it did not fit. For example, I originally had a “canon analysis” attribute value, which I wound up using for almost every interview participant’s answers; the overwhelming number of instances made the output unhelpful for my analysis and did not align with my research goals. As another example, after I finished coding, I realized I needed to give each “code” element an individual identification number to make the HTML transformation easier.
Many of the attributes I chose reflect specific vocabulary terms found in RGS or fan studies, as well as terms that resonate with some of my findings. For example, one attribute is “uptakes” while the values are “critical uptakes,” “implicit-explicit-uptakes,” and more. Uptake is an RGS term, while the values are based on terms I created to define different fan uptakes. When I was coding, I realized the “uptake” attribute lacked a value that captured other types of fan uptakes, so I added one, which I named the “fan-practices-uptake.” The creation and revision of the schema demonstrate that schemas are not stable and, as the “eXtensible” in XML suggests, are easy to revise. All qualitative coding research requires researchers to keep reimagining their coding structure as they become more intimate with their data and discover new patterns.
Besides the meta-elements marking up information about the interview, like who is speaking, the only elements used to mark up the interviewees’ words are “code” and “power-identity.” I kept the “code” element general and created multiple attributes so that a single passage can carry many attributes. Each attribute, too, can have multiple values. Here is an example from Aria’s transcribed interview:
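Numbering every “code” element by hand would be tedious, so this is the kind of step that is usually scripted. The Python sketch below shows one way it could be done; it is an illustration, not the method used for this project. The attribute name `id`, the `code-` prefix, and the `<interview>` wrapper in the usage line are all hypothetical choices.

```python
import xml.etree.ElementTree as ET


def number_codes(xml_text, attribute="id", prefix="code-"):
    """Give each <code> element a sequential identifier, in document order."""
    root = ET.fromstring(xml_text)
    for number, element in enumerate(root.iter("code"), start=1):
        # Existing attributes such as writing-agency are left untouched.
        element.set(attribute, f"{prefix}{number}")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "a", "b", and "c" are placeholder passages, not interview text.
    sample = (
        '<interview><code canon="canon-relation">a</code> b '
        '<code writing-agency="reflection">c</code></interview>'
    )
    print(number_codes(sample))
```

Stable identifiers of this kind give a later transformation step something unambiguous to point at, for instance when linking a displayed passage to its list of codes.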
<code fan-community="fan-politics" rgs-genre="identity-bending">I don't want to be like, "Ah, but I'm writing the version of this character that's a woman." I hate that. I know that that's permissible with the meta text but I don't want to be part of that.</code>
Within the “code” element are two attributes, fan-community and rgs-genre, and each attribute has a value.
The “power-identity” element is nested within the “code” element, which means it can only be used within the “code” element. I wanted to emphasize in my schema how systems of power and positionality are entangled in writing practices. Here is another example from Aria in which the “power-identity” element is used multiple times within the parent “code” element. I underlined the “code” element, which has multiple attributes and attribute values, and bolded the “power-identity” elements, which appear twice in this excerpt to capture themes around cultural difference and class.
<code important="important-quote" writing-agency="reflection" rgs-uptake="critical-uptake"> <power-identity describe="cultural-difference"> Whereas Korra grows up in this institution, Asami grows up with a relatively politically progressive father.</power-identity> He's an abuser, but he's also a very, very politically progressive man. <power-identity describe="class">He's a radical, he tries to overthrow his government, make the world a more equitable and fair place. It's self-interested, he's a bourgeois radical, but I wouldn't have said that at the time.</power-identity></code>
When I use XSLT to transform this into an HTML document for purposes of both publishing and preserving Aria’s interview and the codes I marked up, the transformation looks like this:
Whereas Korra grows up in this institution, Asami grows up with a relatively politically progressive father. [Systems of Power: cultural-difference] He's an abuser, but he's also a very, very politically progressive man. He's a radical, he tries to overthrow his government, make the world a more equitable and fair place. It's self-interested, he's a bourgeois radical, but I wouldn't have said that at the time. [Systems of Power: class] [Code: important-quote, critical-uptake, reflection]
The entire bolded section is what appears between the <code> and </code> tags, while the underlined portions are the passages that appear between the <power-identity> and </power-identity> tags. These are now all clickable, making the interviews interactive. Readers can decide whether they want to simply read the transcriptions or see how I qualitatively coded them.
Qualitative Codebook
This final section presents the qualitative codebook that I use to mark up and analyze the interviews. The RelaxNG schema and the XSLT transformations are available on my GitHub page. Each code is an attribute, while its child codes are attribute values. I have also provided a definition for each code and how it is meant to be used. I use this codebook directly in the “Explore the Interviews” definitions, so the codebook can be accessed either here or there.
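The idea of codes becoming HTML classes can be illustrated with a short Python sketch. The project itself uses XSLT for this step; the code below is a simplified stand-in, and the output structure (`<span>` elements inside a `<div>`, with the element name followed by its attribute values as class names) is an assumption made for the example, not the published markup.

```python
import xml.etree.ElementTree as ET


def to_html(xml_text):
    """Turn <code> and <power-identity> elements into <span> elements whose
    class names are the original element name plus its attribute values."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.tag in ("code", "power-identity"):
            classes = [element.tag] + sorted(element.attrib.values())
            element.attrib.clear()
            element.set("class", " ".join(classes))
            element.tag = "span"
    # The hypothetical wrapper element becomes a plain <div>.
    root.tag = "div"
    return ET.tostring(root, encoding="unicode", method="html")


if __name__ == "__main__":
    # A shortened portion of Aria's coded excerpt, wrapped in a hypothetical root.
    sample = (
        '<interview><code important="important-quote" writing-agency="reflection" '
        'rgs-uptake="critical-uptake"><power-identity describe="class">He\'s a radical, '
        "he tries to overthrow his government, make the world a more equitable and "
        "fair place.</power-identity></code></interview>"
    )
    print(to_html(sample))
```

Once the codes are class names, a stylesheet or a small script can show, hide, or highlight them, which is what lets readers choose between plain transcripts and coded ones.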
- Important: An overall interesting quote to highlight an important moment.
- Fan Politics: When fan authors mention fan community politics, including particular practices, theories, ideas, or tensions. These may not directly relate to politics, but may reveal underlying ideologies in fan communities.
- Uptake: These codes demonstrate different fanfic uptakes and define basic conventions of these genres based on the fan authors' responses. Uptakes are the ways in which a writer responds to one genre by engaging in another genre, following particular conventions or expectations for how to engage with the second genre. In terms of fan uptakes, these may be fanfics or fan art that respond to the original source material but follow genre expectations in the fan community.
  - Critical Uptake: Demonstration of a critical uptake. A critical uptake occurs “when writers...resist harmful and exclusive cultural ideologies in their uptake” (Messina, 2019). These ideologies may be embedded in the canonical show or even fan genre conventions. This may be an uptake of the canonical text, an uptake of a fan genre convention, or an uptake of a prompt.
  - Canon Compliant: When fan interviewees discuss a canon compliant uptake, which is when a fan celebrates and follows a particular event or character in the original source material (Messina, 2019). This may include writing a canonical relationship or continuing a canonical plot point.
  - Canon Resistant: When fan authors discuss a canon resistant uptake, which is when a fan actively “resists both the implicit and explicit canonical choices made in the original cultural material” (Messina, 2019). This may include writing about a non-canonical relationship, identity bending, or reimagining an alternative plot point.
  - Implicit-Explicit: When fan authors discuss an implicit-explicit uptake, which is when a fan “analyzes the subtext of the show and makes the subtext explicit in their fanfictions” (Messina, 2019). This may include writing from another character’s perspective or exploring a romantic relationship that may be inferred.
  - Fan Practices Uptake: When fan authors discuss a fan practice uptake, which is when a fan takes up fan genre practices in the fan community, such as engaging with fan genre conventions or implementing specific tagging practices.
- Genre: These codes examine different fanfic generic forms and define basic conventions of these generic forms based on the fan authors’ responses. Because Rhetorical Genre Studies defines genre as social actions that are recurring and re-acted in particular communities, engaging in a genre means taking into account the conventions of that genre, choosing to resist or follow these conventions, the ideologies embedded within those conventions, who has say over these conventions, and how generic engagement helps community formation. These codes include angst, fluff, fix-it fics, and more.
  - Angst: The fan author describes the "Angst" genre. Angst generic forms capture feelings of depression, anxiety, hurt/comfort, and other general forms of angst.
  - Fluff: The fan author describes the “Fluff” genre. Fluff captures a happy, feel-good story, referred to as fluff because of the warm-and-fuzzy affect the writer attempts to capture. It almost always centers around romance and everyday moments of joy or cuteness.
  - Vignette: Vignettes are intentionally shorter fanfics, such as 100 words to 1000 words, that capture an entire story in a small word count; 100-word fics are referred to as “drabbles.” Vignettes are more of a medium than a generic form, as they rely on a limited amount of written text to tell a story, so there may be multiple generic forms and conventions appearing across different vignettes.
  - Fix-it: The fan author describes the “fix-it” genre. Fix-it is usually a canon-resistant uptake and "fixes" something that happened in the canon, like a particular writing choice. These fics usually come out of disappointment from the canonical narrative arc.
  - Author’s Note: The fan author describes the “Author’s Note” genre. Authors’ notes appear at the beginning or end of fanfic chapters. The purpose of these notes is to provide content warnings, reflect on particular choices made in the fanfic, or speak back to readers or specific comments. For this code, I am also choosing to include supplemental, explanatory material written along with fanfics that may not just appear at the beginning or end of a fic.
  - Identity-Bending: The fan author describes the “Identity-bending” genre. Identity-bending is when the fic writer transforms an aspect of a character's identity, such as their sexuality/race/gender, in their fic.
  - Modern Setting: The fan author describes the “Modern Setting” genre. Modern settings, a subgenre of alternate universes, are when fans reimagine characters from their canonical universe into the fans’ modern-day universe, such as high school, college, or a coffee shop.
  - Other: Mentions another type of fan genre, including roleplaying, fanart, etc.
- Writing Agency: This set of codes examines fan authors’ agency in their genre practices, writing choices, and uptakes. These codes include writing practices such as drafting, audience awareness, motivation, research, and more.
  - Reflection: When fan authors reflect on a particular writing process or piece of their writing. This may be the subject explaining specific choices they made that are not about their audience or motivation for writing.
  - Motivation: When fan authors mention their motivation for writing their fanfic or another piece. This may be their own motivation, but also their assumed motivations in the fan community.
  - Audience: When fan authors demonstrate an awareness of their audience and explain how they were thinking about their audience.
  - Research: When fan authors discuss forms of research they conducted while they were writing, including fan community research, reading about a theory, learning about language or history, and more.
  - Drafting: When fan authors discuss the act of drafting, or early stages in the writing process such as brainstorming, outlining, etc.
  - Revising: When fan authors discuss the act of revising, or later stages in the writing process.
  - Reception: When fan authors directly mention reader feedback and how their writing was received.
- Canon Commentary: These codes examine how fan authors respond to or analyze the canonical material (TLOK or GOT). These codes include complimenting, relating to, or critiquing the canon.
  - Canon Compliment: When fan interviewees compliment the canonical material, including its writing, ideologies, identity representation, and more.
  - Canon Relation: When fan interviewees describe how they relate to the canonical material or moments they relate to.
  - Canon Critique: When fan interviewees critique the canonical text, including critiquing the writing, ideologies, identity representation (or lack thereof), and more.
- Power and Identity: When fan authors mention aspects of identities and positionalities that tie into systems of power and/or directly mention these systems of power, whether describing them or challenging them.
  - LGBTQ+: When fan authors mention LGBTQ+ (lesbian, gay, bisexual, transgender) or any queer representation in fan communities, fanfiction, and the canonical materials.
  - Homophobia/Transphobia: When fan authors mention homophobia or transphobia, either homophobic/transphobic statements or references to homophobia/transphobia, in fan communities, fanfiction, and the canonical materials. Homophobia refers to discriminating against gay people (bi, lesbian, gay, pansexual), while transphobia refers to discriminating against transgender or gender-nonconforming folks.
  - Heteronormativity: When fan authors mention heteronormativity, either heteronormative statements or references to heteronormativity, in fan communities, fanfiction, and the canonical materials. Heteronormativity refers to the notion of heterosexuality and hetero-gender roles as the assumed norm, and how this assumption is embedded in everyday conversations, politics, and social interactions.
  - Cultural Difference: When fan authors mention cultural differences and how these differences play out in fan communities, fanfiction, and the canonical materials. Cultural difference may be tied to racism/anti-racism, language, cultural practices, religions, and more.
  - Anti-racism: When fan authors mention resisting, challenging, or critiquing racism in fan communities, fanfiction, and the canonical materials. Anti-racism refers specifically to deliberate actions. These actions may include forms of critical representations of diverse races and ethnicities, challenging racial inequality both ideologically and systemically, and more.
  - Racism: When fan authors mention racism or racist statements in fan communities, fanfiction, and the canonical materials. Racism here specifically refers to power differentials between different races and how race is a culturally constructed way of identifying, marking, and reading bodies. Forms of racism include white supremacy, the erasure of diverse races/ethnicities, discriminatory behaviors towards a particular race (specifically people of color), and assuming whiteness as the norm.
  - Sexism: When fan authors mention sexism or sexist statements in fan communities, fanfiction, and the canonical materials. Forms of sexism include patriarchal views of gender roles, power dynamics among genders, discrimination against women, and more.
  - Feminism: When fan authors mention resisting, challenging, or critiquing sexism in fan communities, fanfiction, and the canonical materials. Feminism refers to anti-sexist representations of genders, challenging gender inequality both ideologically and systemically, and more.
  - Class: When fan authors mention representations and understandings of socioeconomic class in fan communities, fanfiction, and the canonical materials. This includes mentions of classism, class imbalances, and forms of economic structures.
  - Disability: When fan interviewees mention representations and understandings of disability (physical, emotional, or mental) in fan communities, fanfiction, and the canonical materials.
  - Ableism: When fan interviewees mention ableism, either ableist statements or references to ableism, in fan communities, fanfiction, and the canonical materials. Ableism includes discrimination against people with disabilities, the erasure of people with disabilities, assuming able-bodied and neurotypical as the norm, etc.
  - Other: When fan interviewees mention other forms of identity representations or systems of power, including around religion, age, and location.