16 Judgment Rule 2 for Experimental Analysis
Judgment Rule: The experimental design should rule out other potential explanations for any changes observed during the experiment.
Key Takeaways
The two most basic experimental designs are: (1) determining if an experimental treatment will produce observable differences between two otherwise identical groups (between group design), or (2) determining if a treatment changes an individual from time 1 to time 2 (within-subject design, also known as repeated measurement design).
Between Group Design
For between group designs, researchers equalize the experimental and control groups by randomly assigning subjects to groups. As a reader, look for whether the subjects were randomly assigned to groups. (Typically, research papers have the following sections: Introduction, Review of Literature, Methods, Findings, and Discussion. Information on how the subjects were assigned to experimental and control groups is in the methods section of the research paper.)
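Random assignment is mechanically simple. The Python sketch below shows one minimal way to do it. The subject IDs are hypothetical, an odd subject out goes to the control group, and `randomly_assign` is an illustrative helper rather than a procedure taken from any particular study. Fixing `seed` makes the assignment reproducible, so a reader or auditor could re-derive which subject went where.

```python
import random

def randomly_assign(subject_ids, seed=None):
    """Shuffle the subjects and split them into two groups.

    With an odd number of subjects, the extra subject goes to the control group.
    """
    rng = random.Random(seed)  # seeded generator so the split can be reproduced
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}

# Hypothetical pool of 20 subjects.
groups = randomly_assign([f"S{i:02d}" for i in range(1, 21)], seed=42)
print(len(groups["experimental"]), len(groups["control"]))  # 10 10
```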
Theoretically, randomization should even out differences between groups, but researchers are often interested in understanding specific characteristics of the population rather than just randomizing these differences away. Going back to the research questions discussed earlier, each question in Example 16.1 has subgroups about which we might want more specific information. For example, we know from a large body of research that people with a poor body image handle food pressures differently than people with a good body image, so we might reasonably expect kids with a poor body image to be more affected by television commercials for high-density food (see Example 16.1 for additional subgroups).
Example 16.1
Question 1: Does watching television commercials for high-density, high-calorie food increase preferences for high-density, high-calorie food in pre-teens?
Potentially important subgroups:
Subgroup condition 1: Obese subjects versus normal weight subjects versus underweight subjects.
Subgroup condition 2: Low body image versus normal body image.
Question 2: Does using highly sexualized female avatars decrease playing competence in first-person shooter games?
Potentially important subgroups:
Subgroup condition 1: Males versus females.
Subgroup condition 2: Self-objectifying versus non-objectifying subjects.
Question 3: Does using Facebook increase motivated performance on exams?
Potentially important subgroups:
Subgroup condition: Self-presenters versus self-disguisers.
And, of course, kids with a poor body image are already at risk for eating disorders, so there might be important policy implications in finding out if this specific group reacts differently. We also know, from previous research, that girls who objectify their own bodies are less likely to perceive themselves as competent at various physical and cognitive skills. Would they also be more affected by using a highly sexualized avatar? And finally, we know from recent research that people have different ways of presenting themselves on Facebook. Some people present a highly positive (even self-congratulatory) self-image—one that consistently shows themselves at parties and fun places, doing fun things with good-looking, smiling people. These people tend not to self-disclose negative information, which means that their Facebook “support” for mutual disclosure (and emotional support) is fairly weak. Others choose a much more open strategy in which they balance the good with the bad, and these Facebook presenters are much more likely to receive emotional support from their Facebook friends when they need it. Given that students might need some emotional support during exams, the researchers reasoned that presentation style might affect exam performance, leading to the research question “Does the Facebook strategy that people use make a difference in determining motivated performance on exams?”
How, then, can researchers best determine whether specific subgroups respond to the experimental treatment differently than the group as a whole? To estimate treatment effects for these subgroups, researchers add a step before they divide the subjects into control and experimental groups (see Figure 16.1). In this step, the researchers give all subjects a preliminary test for the characteristic of interest: body image for the research question on preferences for high-density, high-calorie food; self-sexualization for the research question on the effects of using highly sexualized images; and self-presentation strategies for the question on motivated performance on exams.

The researcher then divides the original subjects into two groups based on the results of the assessment, and then randomly divides each group into a control condition and an experimental condition.
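This pretest-then-randomize procedure (often called stratified or blocked random assignment) can be sketched in Python. The sketch assumes a hypothetical coding of the body image pretest described in the footnote: seven figures numbered 1 to 7, with dissatisfaction scored as ideal size minus perceived size. It also assumes a hypothetical cutoff in which any nonzero score counts as dissatisfied. The children, scores, and helper names are illustrative only.

```python
import random

def body_dissatisfaction(perceived, ideal):
    # Hypothetical coding: figures numbered 1 (very thin) to 7 (obese).
    # Score = ideal size minus perceived size; 0 means no discrepancy.
    return ideal - perceived

def stratified_assignment(subjects, classify, seed=None):
    """Group subjects by `classify`, then randomly split each group
    into experimental and control conditions."""
    rng = random.Random(seed)
    strata = {}
    for subject in subjects:
        strata.setdefault(classify(subject), []).append(subject)
    cells = {}
    for stratum, members in strata.items():
        shuffled = list(members)
        rng.shuffle(shuffled)  # randomize separately within each stratum
        half = len(shuffled) // 2
        cells[(stratum, "experimental")] = shuffled[:half]
        cells[(stratum, "control")] = shuffled[half:]
    return cells

def classify(child):
    # Hypothetical cutoff: any nonzero score counts as dissatisfied.
    score = body_dissatisfaction(child["perceived"], child["ideal"])
    return "normal" if score == 0 else "dissatisfied"

# Hypothetical pretest results for eight children.
children = [
    {"id": "C1", "perceived": 4, "ideal": 4},
    {"id": "C2", "perceived": 5, "ideal": 3},
    {"id": "C3", "perceived": 3, "ideal": 3},
    {"id": "C4", "perceived": 6, "ideal": 4},
    {"id": "C5", "perceived": 4, "ideal": 4},
    {"id": "C6", "perceived": 2, "ideal": 3},
    {"id": "C7", "perceived": 5, "ideal": 5},
    {"id": "C8", "perceived": 5, "ideal": 4},
]

cells = stratified_assignment(children, classify, seed=7)
for stratum, condition in sorted(cells):
    print(stratum, condition, [c["id"] for c in cells[(stratum, condition)]])
```

Randomizing within each stratum guarantees that both body image groups are represented equally in the control and experimental conditions, instead of leaving that balance to chance.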
Turning to a specific example, a researcher who wants to test whether commercials for high-density, high-calorie food trigger a desire for high-calorie food in pre-teens would first gather a group of pre-teens and give them a test for body image.[1] Once the children had taken the test, the researcher would be able to distinguish children who had a good body image from those who were dissatisfied with their bodies. Next, the researcher would randomly divide the normal body image group into a normal body image control group and a normal body image experimental group, and the dissatisfied body image group into a dissatisfied control group and a dissatisfied experimental group, so that the changes in all four conditions can be compared with each other. The researcher will then be able to tell whether the treatment had an effect by comparing the results of both experimental groups with both control groups. The researcher will also be able to tell whether subjects in the low body image group were systematically more (or less) influenced by the television commercials than subjects in the normal body image group.
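The two comparisons described above can be illustrated with a few lines of Python. The preference scores are invented for illustration, and the code computes only descriptive differences between cell means. A real study would also test whether those differences are statistically reliable (for example, with a two-way analysis of variance), which this sketch does not do.

```python
from statistics import mean

def subgroup_effects(scores):
    """`scores` maps (stratum, condition) to a list of outcome scores.
    Returns the experimental-minus-control difference for each stratum."""
    strata = sorted({stratum for stratum, _ in scores})
    return {
        s: mean(scores[(s, "experimental")]) - mean(scores[(s, "control")])
        for s in strata
    }

# Hypothetical preference-for-high-calorie-food scores in each of the four cells.
scores = {
    ("normal", "control"): [3, 4, 3, 4],
    ("normal", "experimental"): [4, 5, 4, 5],
    ("dissatisfied", "control"): [3, 4, 4, 3],
    ("dissatisfied", "experimental"): [6, 7, 6, 7],
}

effects = subgroup_effects(scores)
overall = mean(effects.values())  # treatment effect averaged over both strata
difference = effects["dissatisfied"] - effects["normal"]  # do the strata respond differently?
print(effects)     # {'dissatisfied': 3.0, 'normal': 1.0}
print(overall)     # 2.0
print(difference)  # 2.0
```

In these made-up numbers, the commercials raise preference scores in both strata, but the rise is larger for the dissatisfied group. That gap between strata is what the subgroup comparison is designed to detect.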
Within-Subject Design
The within-subject design is the other major type of experimental design. In this design, the experimenter compares the scores of a single individual over time. Each participant becomes his or her own control, and all participants are exposed to every treatment. In within-subject designs, researchers are interested in the change in each subject between time zero (before the treatment starts) and subsequent times (after the subject has received a treatment).
Within-subject designs are considered quite sensitive (that is, able to detect relatively small treatment effects) because the researcher is tracking the change in each individual rather than comparing different people.
An advantage of within-subjects design is that individual differences in subjects’ overall levels of performance are controlled. This is important because subjects invariably will differ greatly from one another. In an experiment on problem solving, some subjects will be better than others regardless of the condition they are in. Similarly, in a study of blood pressure some subjects will have higher blood pressure than others regardless of the condition. Within-subjects designs control these individual differences by comparing the scores of a subject in one condition to the scores of the same subject in other conditions. In this sense each subject serves as his or her own control.
David M. Lane, “Experimental Designs,” Online Statistics Education: A Multimedia Course of Study (Rice University, University of Houston Clear Lake, and Tufts University, last accessed June 2, 2023), http://onlinestatbook.com/2/research_design/designs.html.
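A small numeric sketch shows why comparing each subject with himself or herself controls individual differences. The readings are hypothetical, blood-pressure-style numbers for four subjects. Their starting levels differ widely, but their changes are nearly identical, and the within-subject comparison uses only the changes.

```python
from statistics import mean, stdev

# Hypothetical readings before (time 0) and after (time 1) a treatment.
before = {"P1": 150, "P2": 120, "P3": 135, "P4": 110}
after = {"P1": 145, "P2": 114, "P3": 131, "P4": 104}

# Each subject is compared only with his or her own earlier score.
changes = {p: after[p] - before[p] for p in before}

print(changes)                            # {'P1': -5, 'P2': -6, 'P3': -4, 'P4': -6}
print(mean(changes.values()))             # -5.25
print(round(stdev(before.values()), 2))   # 17.5  (spread between subjects)
print(round(stdev(changes.values()), 2))  # 0.96  (spread of the changes)
```

The subjects differ from one another by about 17.5 points at baseline, but that spread drops out of the analysis because only each subject's own change is examined.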
But again, both the researcher and the reader need to consider whether the treatment made the difference or whether an observed change was due to something else that happened during the experiment. Did the subject’s score decrease because of the experiment or because he or she got tired? Or bored? Did the subject’s score improve because he or she learned how to perform a task more effectively? Changes in performance due to fatigue, boredom, or learning would not be caused by the treatment; they would occur because the test subject changed for some other reason.
To control for these order effects, researchers generally counterbalance (switch) the order of the treatments. In an experiment with two treatments, for example, half of the subjects would receive treatment 1 followed by treatment 2, and the other half would receive treatment 2 first and treatment 1 second. Because each treatment comes first for one half of the subjects and second for the other half, effects of fatigue, boredom, or practice are spread across both treatments instead of favoring one of them. Differences between the treatments can then be attributed to the treatments themselves.
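Assigning treatment orders for a two-treatment counterbalanced design can be sketched as follows. The sketch randomly splits subjects into two equal halves, with any odd subject out receiving the reversed order. `counterbalance` is a hypothetical helper. Designs with more than two treatments typically need more orders (for example, a Latin square), which this sketch does not cover.

```python
import random
from collections import Counter

def counterbalance(subject_ids, treatments, seed=None):
    """Randomly give half of the subjects one treatment order and
    the other half the reverse order (two-treatment case)."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)  # random assignment to orders
    forward = tuple(treatments)
    backward = forward[::-1]
    half = len(shuffled) // 2
    return {s: (forward if i < half else backward) for i, s in enumerate(shuffled)}

# Hypothetical pool of eight subjects.
orders = counterbalance(range(1, 9), ("treatment 1", "treatment 2"), seed=3)
print(Counter(orders.values()))  # four subjects in each order
```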
Let’s say that a researcher is interested in comparing whether violent movies increase viewer aggression more or less than violent video games do. In a within-subjects design, the subjects would be shown a violent movie, tested for aggression, and then allowed to play a violent video game, followed by another test for aggression. It is possible that the first treatment would raise each subject’s overall aggression level, making it more likely that the second treatment would show an aggressive response (as suggested by excitation transfer). Researchers check for this carry-over effect by randomly assigning subjects to different treatment orders. Half of the subjects would watch the movie first and then play the video game, while the other half would play the video game first, followed by the movie. If violent video games increased aggression, then subjects should have higher aggression scores after playing the game regardless of whether they played it first or second.
Alternating the order of treatments in this way is called counterbalancing. The aggression scores of the group that watched the movie first are balanced against those of the group that played the violent video game first: any order effect that favors the game in the first group favors the movie in the second group.
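The logic of counterbalancing can be checked with hypothetical numbers. The sketch assumes that going second adds the same carry-over boost to whichever treatment comes second. Under that assumption, averaging the game-minus-movie difference across the two order groups cancels the boost, and half the difference between the groups estimates its size. If the carry-over were lopsided (for example, if only the movie raised later aggression), counterbalancing would not fully remove it.

```python
from statistics import mean

# Hypothetical aggression scores: (order, score after movie, score after game).
data = [
    ("movie_first", 5, 8),
    ("movie_first", 4, 7),
    ("game_first", 6, 7),
    ("game_first", 5, 6),
]

def game_minus_movie(rows, order):
    """Mean within-subject difference (game minus movie) for one order group."""
    return mean(game - movie for o, movie, game in rows if o == order)

d_movie_first = game_minus_movie(data, "movie_first")  # game came second
d_game_first = game_minus_movie(data, "game_first")    # movie came second

treatment_effect = (d_movie_first + d_game_first) / 2  # carry-over cancels out
order_effect = (d_movie_first - d_game_first) / 2      # size of the boost from going second

print(treatment_effect, order_effect)  # 2.0 1.0
```

In these invented data, the game adds 2 points of aggression relative to the movie, and being the second treatment adds 1 point. Neither group alone shows the true 2-point difference, but the counterbalanced average does.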
Judgment Rule 2B
Key Takeaways
Randomization and Experimental Design
Randomization in experimental methods is, and is not, like randomization in surveys. For surveys, researchers are concerned with whether bias has been introduced in the selection of the sample from the overall population. However, the bias that experimenters (and readers) are concerned about in experiments is whether there is a difference between the experimental and the control conditions (between groups), or between counterbalanced groups (within-subject conditions). That is, the experimenters are concerned with whether they introduced bias when assigning subjects to groups; they are typically not concerned with whether the original experimental sample is like a larger population. In a survey, for example, researchers would care if everyone in their sample were left-handed, Welsh vegans (who were either highly aggressive or not aggressive at all). An experimenter wouldn’t care as long as all groups had equal numbers of highly aggressive subjects.
For both surveys and experiments, the randomization process reduces the chance of bias; it doesn’t eliminate the possibility. You could end up with ten pieces of your favorite dark chocolate by picking randomly from a bag of mixed chocolates, but it is highly unlikely. More likely, you picked out the dark chocolate you liked over the milk chocolate you didn’t. Randomization guards against this kind of bias in the selection process.
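To put a number on "reduces, but doesn’t eliminate," consider an experimenter who wants equal numbers of highly aggressive subjects in each group. The sketch below computes the exact probability that random assignment puts every highly aggressive subject in the same group. The sample sizes are hypothetical, and `prob_all_in_one_group` is an illustrative helper.

```python
from math import comb

def prob_all_in_one_group(n_total, n_trait, group_size):
    """Probability that randomly splitting n_total subjects into a group of
    group_size and a group of n_total - group_size puts all n_trait subjects
    who share a trait into the same group."""
    if n_trait == 0:
        return 1.0  # nothing to separate
    n_other = n_total - n_trait
    ways = comb(n_total, group_size)
    # All trait subjects in the first group: fill its remaining seats
    # with subjects who lack the trait.
    all_in_first = comb(n_other, group_size - n_trait) if group_size >= n_trait else 0
    # All trait subjects in the second group: the first group is drawn
    # entirely from subjects without the trait (comb returns 0 if impossible).
    all_in_second = comb(n_other, group_size)
    return (all_in_first + all_in_second) / ways

# Hypothetical experiment: 20 subjects, 6 highly aggressive, two groups of 10.
p = prob_all_in_one_group(20, 6, 10)
print(round(p, 4))  # 0.0108
```

With these numbers, the chance of the most extreme imbalance is about 1 percent. That is small, but not zero, which is why readers should still check whether the groups turned out comparable.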
While it is the responsibility of the researcher to take steps to guard against introducing bias, it is the reader’s responsibility to check whether the researcher took those steps (for example, assigned subjects randomly).
- An example of this type of scale is the Children’s Body Image Scale. Helen Truby and Susan J. Paxton, “The Children’s Body Image Scale: Reliability and Use with International Standards for Body Mass Index,” British Journal of Clinical Psychology 47, no. 1 (March 2008): 119-124, https://doi.org/10.1348/014466507X251261. For this test, children are given seven photographs of children whose weight ranges from very thin to obese. The children are told to select which picture they believe looks most like their size (perceived weight) and which picture represents the size they would like to be (ideal weight). The measure of body dissatisfaction was the difference between the size the child said he or she would like to be and the size the child thought he or she was.