22 Conclusions
Social science—for that matter, physical science as well—is above all a way of knowing. And scientific knowing is based on methods.
Each method—whether it focuses on learning what people are willing to tell you about themselves (surveys) or on examining the artifacts that people produce (content analysis)—has both strengths and weaknesses. The strengths, of course, lie in the kinds of knowledge a method can produce. That is, can the method tell you what impact something has on people (experiment), or is the method collecting a range of statements from a specific population (survey)? The weaknesses are the specific ways that the findings are limited. (For example, research on the experience of sitting in a movie theater would be limited if the researcher only talked to short people; the troubles that tall people have would not be addressed because no tall people were interviewed.)
The rules for each method discussed in the main body of this book essentially allow you to assess the generalizability and reliability of each individual research effort. With each different method, the rules differ for determining how closely the people or objects studied resemble the overall population (generalizability) and how likely it is that another study using the same method on the same population will deliver the same results (reliability).
For example, the strength and the weakness of experiments both stem from the same fact: the experimenter controls exactly who the subjects are and what the subjects do. The researchers can rule out other potential explanations, leaving the one thing the experiment manipulated (the treatment) as the only remaining explanation.
At the same time, the experimenter is also creating a highly artificial situation. Because the experiment is artificial—a laboratory situation in most cases—the experimental results may have low ecological (real-world) validity. In general, the more artificial the experiment is, the greater the likelihood that other factors—not present in the experiment—could change what happens in the world. That does not mean that you can “throw out” the results that you do not like because the experiment is artificial (throwing out research whose findings you do not like is an example of motivated reasoning), but it does suggest that you interpret the results with some degree of caution.
Take, for example, an experiment in which the researchers are trying to check whether using a smartphone while crossing a busy street is riskier than not using one. The researcher considers two different methods. In the first, the researcher sets up a virtual street-crossing scene in the laboratory. He has his experimental subjects use their smartphones while deciding when to cross the street, while the control subjects simply carry their smartphones in their hands. The researchers described the experimental setting as follows:
“Traffic flowed in a bidirectional manner on three computer monitors arranged in a semi-circular manner in front of participants. Participants stood on a wooden “curb,” watched the traffic in a first-person point of view, and stepped down off of the curb when they deemed it safe to cross. The perspective then changed to third person point of view, allowing participants to see whether they made it across the street safely. The speed at which a participant crossed matched that of their average walking speed recorded earlier in the session. Ambient and traffic noise was delivered through speakers.”
Schwebel et al.[1]
In the second experiment, the researcher could videotape people naturally crossing a street in the course of their daily lives. The researcher could then use the video record to compare how safely smartphone users crossed under real-world conditions, compared to nonusers.
The first experiment, in the laboratory, has fairly low ecological validity—stepping off of a wooden block while looking at three computer screens isn’t the same as crossing a real street. It’s possible that people in the laboratory knew (either consciously or at some low, barely noticeable level of awareness) that they were a lot safer in a laboratory than they would be sharing a walking space with two to eight thousand pounds of moving plastic and steel. Perhaps that knowledge relaxed them and made them a little bit less careful. It’s possible. But people in the second experiment are sharing the space with trucks, cars, and SUVs. Because the second experiment was recording real people really crossing the road, this experiment has high ecological validity.
But the first experiment still showed that people with smartphones were more distracted and crossed the (albeit virtual) street less safely than people without smartphones. Those results do not go away just because the ecological validity is lower. The research has shown that there is some potential for concern, and that concern needs to be explored and dealt with. Future research will either continue to show a safety risk for using smartphones while crossing roads, or it will not. But at some point, a decision will need to be made that the evidence is “good enough” to warrant some policy decision.
Reading and critically evaluating scientific literature gives readers the power to evaluate research when that research is important to them, whether the importance lies in understanding a subject or in making a decision. One of those powers is understanding how to interpret results in light of population differences. Suppose you, or someone you care deeply about, were diagnosed with stage four liver cirrhosis from Non-Alcoholic SteatoHepatitis (NASH). The doctors and the web tell you that the survival rate for people with a stage four diagnosis is one to five years from diagnosis. However, the population from which the doctors are drawing includes all patients diagnosed with stage four cirrhosis, alcoholics and non-alcoholics together. The treatment potential for the two groups is substantively different: the non-alcoholics are at least potentially eligible for a liver transplant, while active alcoholics are not.
The survival rates of people awaiting transplant are also important to evaluate critically. The medical profession uses MELD scores to assess liver function.[2] MELD scores range from six to forty, with forty being essentially a death sentence within a matter of months, if not days. Technically, the MELD score predicts the likelihood of death over the next ninety days of illness. A MELD score of twenty-one, for example, gives a patient a one in five chance of dying within the next three months. However, the three-month mortality rate is commonly reported for MELD score bands of below ten, ten to nineteen, twenty to twenty-nine, thirty to thirty-nine, and forty. Logically speaking, then, an individual with a score of twenty-one, at the low end of the twenty-to-twenty-nine band, has less of a chance of dying within three months than the band estimate would suggest. In addition, people with strong family support who are younger and female are also less likely to die than isolated, older males. In other words, the more detailed your understanding of the population, the more precisely you can assess individual risk.
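The banding logic described above can be sketched in a few lines of Python; this is an illustrative sketch only, not a clinical tool. The band boundaries come from the text, and the function name is our own invention.

```python
# Illustrative sketch: MELD mortality estimates are commonly reported in
# coarse bands, so two patients with quite different scores can receive
# the same three-month mortality estimate. Band boundaries follow the
# text (below 10, 10-19, 20-29, 30-39, and 40); "meld_band" is a
# hypothetical helper name.

def meld_band(score: int) -> str:
    """Return the common reporting band for a MELD score (valid range 6-40)."""
    if not 6 <= score <= 40:
        raise ValueError("MELD scores range from 6 to 40")
    if score < 10:
        return "<10"
    if score < 20:
        return "10-19"
    if score < 30:
        return "20-29"
    if score < 40:
        return "30-39"
    return "40"

# A patient scoring 21 and one scoring 29 fall in the same band, even
# though the risk at the low end of the band is plausibly lower than
# the band-wide estimate.
print(meld_band(21))  # 20-29
print(meld_band(29))  # 20-29
```

The point of the sketch is the coarseness: the lookup throws away the difference between 21 and 29, which is exactly why a score at the bottom of a band likely overstates that individual's risk.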
Essentially the judgment call that policy makers make—always—is on how great a risk is. Would you give up watching your favorite television shows because one study in a laboratory with thirty college students in Holland showed that students who watched the show were less likely to open up a door for a stranger? Probably not—that would be betting the farm on a very small study done in a different culture. Would you continue to argue that smoking is not harmful to your health after fifty years of research, with thousands of studies that have shown smoking increases cancers, heart problems, skin problems, lung problems, and a host of other health-related problems? Only if you were working for cigarette companies, and probably not even then. To ignore the weight of the evidence that we now have about the dangers of smoking is simply willful ignorance. Even if an individual decides that the pleasure (benefit) from the next cigarette is worth the risks, the vast, vast majority of smokers and nonsmokers today accept the evidence that both this cigarette and the next pose a health risk.
At some point, policymakers—and we are all policymakers—need to weigh what policies are appropriate given what we know now about how our current practices—whether watching television or surfing the net during lectures—affect us and the people around us, a process that includes—but is not limited to—judging the soundness of the available research and how much risk an individual or a society is willing to tolerate.
- David C. Schwebel et al., “Distraction and Pedestrian Safety: How Talking on the Phone, Texting, and Listening to Music Impact Crossing the Street,” Accident Analysis & Prevention 45, no. 2 (March 2012): 266-271, https://doi.org/10.1016/j.aap.2011.07.011. ↵
- MELD is an abbreviation of Model for End-Stage Liver Disease. The MELD score applies to patients ages twelve and older, ranges from six to forty, and is based on the results of several laboratory tests measuring creatinine, bilirubin, serum sodium, and clotting factors (INR). Kiran Bambha and Patrick S. Kamath, “Model for End-stage Liver Disease (MELD),” in UpToDate, edited by Bruce A. Runyon, accessed July 2023, https://www.uptodate.com/contents/model-for-end-stage-liver-disease-meld. ↵