8 Content Analysis Introduction
Content analysis is the study of cultural artifacts—materials humans have produced that carry meaning. Suppose you want to study how many movie characters smoked in 1930s movies and compare that to how many movie characters smoke today. Or consider any of the following questions:
- How do television newscasts portray other countries (England versus India, for example)?
- Do female characters in strategy video games have unrealistic bodies (i.e. overdeveloped breasts and overly narrow waists)?
- How do the lyrics in rap music videos treat drinking alcohol and using drugs?
To answer each of the above questions, the researcher should look at the content of (respectively) television news coverage of England and India, female characters in strategy video games, and rap music lyrics—all texts that some person or group has produced.
While there are different definitions of content analysis (see Definition Box 8.1), they all share the basic focus on analyzing products that humans have produced, rather than asking people what they think or feel about human-produced material. This particular distinction is absolute. Content analysis is always looking at content, not perceptions of content. Content analysis cannot determine impact—how the text will affect you (or any other reader), or what you or any other reader will think or feel about the text.
Definition Box 8.1: Definition of Content Analysis
Berelson (1952, p. 18): Content analysis is a research technique for the objective, systematic, and quantitative description of the manifest content of communication.[1]
Krippendorff (1980, p. 21): Content analysis is a research technique for making replicable and valid inferences from data to their context.[2]
Neuendorf (2002, p. 10): Content analysis is a summarizing, quantitative analysis of messages that relies on the scientific method (including attention to objectivity, a priori design, reliability, validity, generalizability, replicability, and hypothesis testing) and is not limited as to the types of variables that may be measured or the context in which the messages are created or presented.[3]
Content analysis is further divided into qualitative and quantitative analysis. Qualitative analysis is generally focused on developing theory (induction) rather than just testing it (deduction); it requires skilled social researchers who develop their analysis through careful (sometimes called “deep”) reading of the text; and it often looks for the hidden meaning of the text rather than the open or surface reading. Commonly, the researcher makes assumptions about how the message was produced and how audiences read the content. In other words, the researcher is willing to assume that they can tell (from reading the text) what the producer intends to say and what the audience takes away from the message.
Quantitative content analysis, which is the primary focus of this chapter, concentrates on manifest content. The quantitative researcher is concerned with understanding what everyone sees—the surface meaning. It is extremely important in quantitative content analysis to count the same thing in the same way.
To roughly illustrate the difference between the two approaches, consider a qualitative researcher looking at the four pictures of wedding couples in Figures 8.1A-D. That researcher could argue that the modern western portrayal of a wedding couple emphasizes both heteronormativity and the “couple” as the most fundamentally important unit, to the exclusion of the group or the community. This theoretical argument uncovers hidden assumptions about what is important in weddings—the couples. The emphasis on couples then becomes a fundamental building block for a heteronormative view of relationships that excludes extended family and community. Usually, the researcher will point out specific illustrative examples of “coupledom” but will seldom review or summarize their entire database. The reader then assumes, but does not know, that the illustrative examples are typical of the entire database.
A quantitative researcher, on the other hand, would first analyze the wedding pictures by coding specifics of each picture, such as gender (number of males and number of females), setting (natural, built, or a combination), and focus of attention (whether the couple is more focused on the group or on each other). Quantitative researchers would count the number of females (three pictures have one female, and one picture has seven) and males (three pictures have one male, and one picture has seven). Two pictures have natural (outdoor) settings, two have indoor settings, and in every picture the center of attention is the male-female couple. From those findings, the researchers could develop a theory that the pictures emphasize heteronormativity and that the couple is the most fundamental unit for this type of picture, but the findings themselves would be the quantitative description derived from the coding scheme.

Figures 8.1A-D: Four photographs of wedding couples.
That is, to do a quantitative content analysis, researchers select a sample of what they want to study (bits of TV shows, videos, games, film), develop a set of counting rules based on what they are going to look for (see Example 8.1), and develop a numerical description of the data.
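To make the idea of a “numerical description of the data” concrete, here is a minimal sketch (in Python, purely illustrative) of what coded records for the four wedding pictures might look like and how the counts reported above follow from them; the record format and variable names are assumptions made for this sketch, not part of any published coding scheme.

```python
from collections import Counter

# Hypothetical coding records for the four wedding pictures (Figures 8.1A-D),
# using the variables discussed above: number of females, number of males,
# setting, and focus of attention. Values mirror the counts in the text.
pictures = [
    {"females": 1, "males": 1, "setting": "natural", "focus": "couple"},
    {"females": 1, "males": 1, "setting": "built", "focus": "couple"},
    {"females": 1, "males": 1, "setting": "natural", "focus": "couple"},
    {"females": 7, "males": 7, "setting": "built", "focus": "couple"},
]

# The "numerical description" is simply a set of counts across the sample.
setting_counts = Counter(p["setting"] for p in pictures)
focus_counts = Counter(p["focus"] for p in pictures)
pictures_with_one_female = sum(1 for p in pictures if p["females"] == 1)

print(setting_counts)            # Counter({'natural': 2, 'built': 2})
print(focus_counts)              # Counter({'couple': 4})
print(pictures_with_one_female)  # 3
```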
Example 8.1
What can you study with content analysis?
Virtually any kind of cultural artifact, including: maps, movies, videos, video games, dolls, toys, rap videos, magazine advertisements, YouTube videos, records, signs, television advertisements, pictures, cartoons, romance novels, pop-up ads, reality shows, crime dramas, plays, mystery stories, works of art, postcards, country music lyrics, cereal package covers, pesticide labels, and many others.
Since content analysis involves meaning, part of the difficulty in doing content analysis is the degree to which audience members (those who receive the message) uniformly interpret the meaning of a text or picture. Quantitative researchers are extremely concerned that they minimize any ambiguity inherent in alternative interpretations of what the audience is seeing. That is, they want to make sure that they can precisely define what is counted.
“Counting rules” simply refers to how researchers will instruct coders to count. For example, let’s say you were doing a study of how many statues in Norway were trolls (see Figure 8.2). Your first task would be to figure out what characteristics “counted” as “troll” versus something else—an elf, for example. You would develop a series of coding rules that distinguish “trolls” from “elves”: Do trolls have big noses (yes); do elves (no)? The same goes for large feet (trolls yes, elves no), wide mouths (trolls yes, elves no), potbellies (trolls yes, elves no), and tufted hair (trolls yes, elves no). Given these counting rules, then, would you code the statue in Figure 8.2 as a troll or an elf?
Presumably, the vast majority would agree that (given the coding rules above) Figure 8.2 shows a troll, not an elf.
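As a rough illustration, the troll-versus-elf counting rules can be written out as an explicit decision procedure. The sketch below (Python) is hypothetical: the feature names, the rule that every troll feature must be present, and the “unclear” fallback are assumptions made for illustration, not part of an actual codebook.

```python
# Features that, under the counting rules above, mark a statue as a troll.
TROLL_FEATURES = {"big_nose", "large_feet", "wide_mouth", "potbelly", "tufted_hair"}

def code_statue(observed_features):
    """Code a statue as 'troll' if it shows every troll feature, 'elf' if it
    shows none of them, and 'unclear' otherwise (a case the codebook would
    have to resolve with an additional rule)."""
    matches = set(observed_features) & TROLL_FEATURES
    if matches == TROLL_FEATURES:
        return "troll"
    if not matches:
        return "elf"
    return "unclear"

# A hypothetical coder's feature-by-feature reading of the statue in Figure 8.2:
figure_8_2 = {"big_nose", "large_feet", "wide_mouth", "potbelly", "tufted_hair"}
print(code_statue(figure_8_2))  # troll
```

Writing the rules out this explicitly is exactly what allows different coders to reach the same answer for the same statue.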

First, coding rules for personal judgments are far harder to define. For example, it is much more difficult to say whether the troll is cute or fun than it is to code the troll as having a large or a small nose.
Second, the coding rules themselves may not adequately capture the essence of what needs to be coded. Do the coding rules for “troll” given above really distinguish a troll from a non-troll, or does the definition come too close to the edges of what can be considered “trolldom”?
Finally, what calling someone a troll means is deeply culturally embedded. A tourist from the United States visiting Norway, with a relatively limited knowledge of Norse folk mythology and Norwegian history, will probably have a simpler view of trolls than a Norwegian who understands the trolls’ place in Norwegian history and the culturally sensitive way that Norwegians themselves have been disparaged as “trolls.”[4]
As a reader, what is important is to figure out how the researcher instructed coders to code and whether those coding instructions are sufficient for the purpose at hand. At one end, researchers can be very specific about their codebook, as in the following example on newspaper coverage of climate change (see Table 8.1).[5]
Table 8.1: Codebook Used for the Analysis of Sampled Newspaper Articles

| Code | Subcode | Description |
| --- | --- | --- |
| Climate | Explicit Reference | Must include one or more of the following phrases: climate change, global warming, global change, changing climate, or warming climate |
| Climate | Implicit Reference | Does not include any of the explicit phrases listed above, but does discuss the changing frequency and/or intensity of hazardous weather, and/or changing temperatures |
| Climate Perspectives | Denial | Any reference to climate change denial perspectives, such as media figures claiming climate change is a government conspiracy, or that climate change does not exist |
| Climate Perspectives | Consensus | Explicit statements that climate change is happening, that climate change is a fact, or that a consensus of scientists agree that it is happening |
| Spatial Proximity Cues | Proximal | The effects of climate change are nearby to a reader in the U.S.; references to climate change impacts that occur in the continental U.S. |
| Spatial Proximity Cues | Distal | The effects of climate change are far away from a reader in the U.S.; references to climate change impacts that occur outside the continental U.S. |
| Temporal Proximity Cues | Proximal | The effects of climate change are happening now; includes any present- and past-tense descriptions of climate change |
| Temporal Proximity Cues | Distal | The effects of climate change will happen in the future |
In this example, the researchers have explicitly outlined the code for each climate change variable. Was climate change mentioned (climate)? Did a news article take a stand on whether climate change was a fact, or did the news article mention sources that claimed climate change was a hoax (climate perspectives)? Did the article talk about climate change as happening here (spatial proximity cues) and now (temporal proximity cues)? For each of the main variables, the researchers also included subcategories that show differences within each category. So, for the variable “climate perspectives,” the researchers/coders would code the articles as either denying or confirming climate change. Codebooks also generally list cues that signal each subcategory for the coder; for instance, “claiming that climate change is a government conspiracy or that climate change does not exist” should be coded with the subcode “denial,” in the category “climate perspectives.”
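To show just how mechanical an explicit rule can be, the sketch below (Python) applies the “explicit reference” rule from Table 8.1 as a simple phrase check. The phrase list comes from the codebook; everything else (the function name, the lowercasing, and the example sentence) is an illustrative assumption and is not taken from the original study.

```python
# Phrases that count as an "explicit reference" under the Table 8.1 codebook.
EXPLICIT_PHRASES = [
    "climate change",
    "global warming",
    "global change",
    "changing climate",
    "warming climate",
]

def has_explicit_reference(article_text):
    """Return True if the article contains at least one explicit phrase."""
    text = article_text.lower()
    return any(phrase in text for phrase in EXPLICIT_PHRASES)

sample = "Forecasters warn that a warming climate will intensify hurricane seasons."
print(has_explicit_reference(sample))  # True
```

The implicit-reference, proximity, and perspective subcodes, by contrast, require judgments about what an article is discussing, which a simple phrase check like this cannot make on its own.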
Researchers do not always provide such a clear roadmap. Commonly, however, research articles include enough information in their results section to determine how the coders categorized their major variables (for example, see Table 8.2). In the study of my personal use of social media, the variables are implied by the results table: type of social media and purpose (productive versus unproductive use).
Table 8.2: Implied Codebook in a Results Table

| Type of social media | Productive use of social media (in minutes), presented as % of n | Unproductive use of social media (in minutes), presented as % of n |
| --- | --- | --- |
| Twitter (before Musk) | 75 | 34 |
| Twitter (after Musk) | 18 | 39 |
| Snapchat | 6 | 26 |
| — | 1 |  |
| Other | 1 | 1 |
| n | 174 | 176 |
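One way to read Table 8.2, assuming n refers to the total minutes of use logged in each column (an interpretation of the header rather than something the table states outright): the 75 in the first row means that roughly 75% of the 174 productive minutes, or about 130 minutes, were spent on Twitter before the Musk takeover.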
All research articles, however, should include enough information, whether in the methods section, an appendix, or the results section, for a reader to clearly distinguish the variables and their subcodes well enough to be confident that they could code samples similarly to the author.
Content Analysis
Content analysis researchers (and readers) implicitly agree to the same general contract discussed earlier for social science—a study is only as good as the method, the method should be clearly described, and findings should be accepted only to the degree that the method warrants. Generally speaking, this means focusing your attention on three specific areas—the sampling method, the coding scheme, and the intercoder reliability. The remainder of this section goes through each in turn, starting with the most basic questions:
- What is the research question?
- Is content analysis appropriate to answer that question?
- Bernard Berelson, Content Analysis in Communication Research (Glencoe, IL: Free Press, 1952). ↵
- Klaus Krippendorff, Content Analysis: An Introduction to its Methodology (Beverly Hills: Sage Publications, 1980). ↵
- Kimberly A. Neuendorf, The Content Analysis Guidebook (Thousand Oaks, CA: Sage Publications, 2002). ↵
- Norway, which was a colony outpost of other Scandinavians (including the Danes) through much of its history, has a complex relationship with trolls. Trolls in Scandinavian folklore are commonly seen as big, hairy, and slow to act, but fierce when roused. Norway’s various colonizers commonly referred to Norwegians as “trolls” as a way to highlight stereotypes of Norwegians as “slow” but “potentially violent” (and therefore needing the “guidance” of other peoples.) On the other hand, trolls are a distinct part of Scandinavian folk tradition, of which Norway, as a culture, is extremely proud. ↵
- Roberta Weiner et al., “Climate Change Coverage in the United States Media during the 2017 Hurricane Season: Implications for Climate Change Communication,” Climatic Change 164, no. 3-4 (February 2021), https://doi.org/10.1007/s10584-021-03032-0. ↵