6 Causal Thinking in Epidemiology
Sheena Martenies
Overview
As we covered in previous chapters, an important function of the field of epidemiology is documenting the frequency, distribution, and patterns of health and disease within a population. Another important goal of the field of epidemiology is identifying the factors that cause health and disease to occur within a population. Determining whether a health behavior, environmental exposure, or dietary component, for example, is the cause of a health or disease state is necessary for identifying strategies to promote health in a population. When epidemiologists and public health practitioners identify causal factors, they can better spend time and resources in ways that will have an impact on population health.
Sometimes identifying causal factors is as simple as identifying the source of a foodborne illness outbreak or the gene mutation that results in a disease. Other times, identifying causal factors requires epidemiologists to assess several different types of evidence before drawing conclusions about a risk factor under investigation. In this chapter, we’ll discuss how epidemiologists think about causality and the steps they work through when identifying the causes of health and disease states.
Learning Objectives
By the end of this chapter, you will be able to:
- Discuss why identifying causal factors is important for epidemiology and public health practice
- Explain the difference between correlation and causation
- Describe the criteria by which epidemiologists determine if a specific factor is a cause of a disease
What is Causality?
Causality is the idea that outcomes are attributable to events that came before them. That is, if event A occurs, outcome B will occur as a result. In epidemiology, we are interested in understanding the causes of health and disease states. This involves identifying the factors that come before disease and lead to its development, called risk factors, as well as the factors that result in good health.
Causes of health or disease are diverse. For some diseases, there is an important genetic cause. For example, cystic fibrosis is a disease caused by a mutation in the gene for the protein that moves salt in and out of cells. Individuals who inherit two mutated copies of this gene, one from each parent, will develop cystic fibrosis.[1] This mutation is a genetic risk factor for the disease. Other diseases have strong behavioral or environmental risk factors. A classic example is lung cancer, which can be caused by behavioral (smoking) or environmental (radon, diesel exhaust) factors.[2]
In reality, most diseases are caused by complex interactions between genetic factors and behavioral or environmental factors. For example, people with a family history of heart disease who also smoke or drink alcohol may be at the highest risk for coronary heart disease compared to people with other risk factor profiles.[3] This combination of risk factors (genetic, behavioral, and environmental) can make it challenging to identify the most important ones from a public health perspective.
Sufficient-Component Cause Model
A helpful model for thinking about which risk factors are important for the development of disease is the sufficient-component cause model. This model was introduced by Kenneth Rothman in the 1970s as a way to think about how the causes of disease are interrelated.[4] In this model, there are three categories of disease causes:
- Component cause: An individual factor that, when present, contributes to the development of disease.
- Necessary cause: A factor that is always required for the development of the disease. It may or may not be a sufficient cause by itself.
- Sufficient cause: A factor or set of factors that, when present, inevitably results in the development of the disease.
The concepts of component, necessary, and sufficient causes of disease are distinct but related. A necessary cause is one that is required for a disease to occur. For example, a patient cannot develop AIDS without first being infected with HIV. Infection with HIV is therefore a necessary cause of AIDS, but it is not a sufficient one: not all people who are infected with HIV will develop AIDS. A component cause of AIDS is a lack of treatment with antiretroviral drugs. This is a component cause because the lack of, or inadequate, antiretroviral treatment is what allows an HIV infection to progress to the point where AIDS develops. However, it is not a sufficient cause by itself. Most people never take antiretroviral drugs, and in the absence of HIV infection, going without them cannot cause AIDS. The lack of or inadequate antiretroviral treatment must accompany the necessary cause, HIV infection, for a patient to develop AIDS. A sufficient cause of AIDS, therefore, would be the combination of HIV infection and the lack of or inadequate treatment with antiretroviral drugs.
Sufficient causes can be thought of as pies (Figure 1).[5] Each slice of the pie represents a component cause, and each whole pie represents a sufficient cause. There can be multiple pies for each disease state we are interested in. If we identify a component cause that is present in all of the known pies for a disease, we would consider this component cause a necessary cause.
Figure 1. A conceptual model of the sufficient-component cause model. Each pie represents a sufficient cause of the disease. Each slice is a component cause. Slice A is a necessary cause because it appears in all three causal pies. Adapted from Figure 3-1 in Rothman, Kenneth J. Epidemiology: An Introduction. Second Edition. Oxford, New York: Oxford University Press, 2012.
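The pie picture maps naturally onto sets, which can make the relationships between the three kinds of causes easier to see. In the minimal sketch below, offered as an illustration rather than a standard epidemiological tool, each sufficient cause is a set of component causes. A necessary cause is any component shared by every known pie (the intersection of the sets), and disease occurs when all the components of at least one pie are present. The component labels and the functions `necessary_causes` and `disease_occurs` are hypothetical placeholders and do not reproduce the exact slices in Figure 1.

```python
# Sketch of the sufficient-component cause model using Python sets.
# Component labels are hypothetical placeholders, not the slices of Figure 1.
PIES = [
    frozenset({"A", "B", "C"}),  # sufficient cause 1
    frozenset({"A", "D", "E"}),  # sufficient cause 2
    frozenset({"A", "B", "F"}),  # sufficient cause 3
]


def necessary_causes(pies):
    """Components that appear in every known sufficient cause (pie)."""
    return frozenset.intersection(*pies)


def disease_occurs(present, pies):
    """Disease occurs if every component of at least one pie is present."""
    return any(pie <= present for pie in pies)


print(sorted(necessary_causes(PIES)))                    # ['A']
print(disease_occurs({"A", "D", "E"}, PIES))             # True: pie 2 is complete
print(disease_occurs({"B", "C", "D", "E", "F"}, PIES))   # False: A is missing
```

Because component A sits in every pie in this sketch, removing A blocks every known pathway to disease, while removing any other single component leaves at least one pie that can still be completed.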
Importantly, the sufficient-component cause model makes clear that most health and disease states have multiple causes and that there are many potential pathways for health or disease to develop.[6] There isn’t a single cause of heart disease, for example, but many different causes that include diet quality, physical activity levels, genetic and family history, smoking, exposures in the workplace, and others. When we have a better sense of the combination of causes that can lead to the development of disease, it becomes clearer to public health and medical practitioners where interventions may have the most success in reducing disease prevalence, incidence, or severity.
Correlation vs. Causation
An important adage in the field of epidemiology is this: correlation does not equal causation. What this means is that two factors can appear to be related statistically, but that does not mean that one causes the other. A person may get a sunburn almost every time they eat ice cream, but that doesn’t mean that the ice cream causes the sunburn. It more likely means that they tend to eat ice cream outdoors on sunny days, which is also when they are exposed to the sun that causes sunburn.
Correlation is a statistical concept that measures the strength and direction of the relationship between two variables, that is, how closely changes in one variable track changes in another. A correlation coefficient is a numerical summary of this relationship that ranges from -1 to 1. If two variables tend to increase together, the correlation coefficient is positive. For example, weight generally tends to increase with height. If one variable tends to decrease as the other increases, the correlation coefficient is negative. For example, as the amount of time a student spends preparing for an exam increases, the number of wrong answers on the test tends to decrease. If two variables are not correlated, the correlation coefficient will be 0 (or very close to it).
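To make the coefficient concrete, the sketch below computes the Pearson correlation coefficient, the most commonly used one, directly from its definition: the sum of the products of each variable's deviations from its mean, divided by the square root of the product of each variable's summed squared deviations. The function name `pearson_r` and all of the height, weight, study-time, and wrong-answer values are made up for illustration. Pearson's coefficient specifically measures how well the points fit a straight line.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of each pair's deviations from the means
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: spread of each variable around its own mean
    sx = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (sx * sy)


# Hypothetical data, invented for illustration only
height_cm = [150, 160, 165, 170, 180, 185]
weight_kg = [50, 56, 61, 66, 74, 80]
study_hours = [0, 1, 2, 3, 4, 5]
wrong_answers = [20, 17, 13, 10, 6, 3]

print(round(pearson_r(height_cm, weight_kg), 3))        # positive, near 1
print(round(pearson_r(study_hours, wrong_answers), 3))  # negative, near -1
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 3, 4, 2]))      # no linear trend: 0.0
```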
However, just because two variables appear to be related via statistical correlation, it doesn’t mean they are causally related. An example of this type of spurious (i.e., coincidental or otherwise non-causal) correlation comes from Tyler Vigen. Using publicly available data sets, Vigen found that the annual per capita consumption of margarine, a butter substitute made from vegetable oil, in the United States is correlated with the annual divorce rate in Maine (correlation coefficient = 0.99; Figure 2).[7]
Figure 2. An example of a spurious correlation between annual per capita margarine consumption in the United States and the annual divorce rate in Maine.
Thinking critically about these two variables, it is clear that they have nothing to do with each other. Whether or not people in the United States consume margarine has no bearing on whether people living in Maine decide to dissolve their marriages. But if we plot these two variables on the same chart, they are very clearly correlated. By just looking at the data presented in this chart, we might conclude that one causes the other, not just that they are correlated. This type of mental shortcut can occur even when we use language that tries to discourage people from drawing this type of conclusion.[8] Thus, it is important for people who conduct epidemiology studies, and for people who use their results to make public health decisions, to think critically about whether there is enough evidence to determine that a factor causes a health or disease state of interest.
Determining Causality
So, if we know that diseases can have several causes, and we cannot rely on correlation alone to decide about whether or not a risk factor is causal, how do epidemiologists assess and determine whether a factor is causally related to our health or disease state of interest? This is a question that epidemiologists have been thinking about for quite some time.
In epidemiology studies, most of our information on the relationships between risk factors and health or disease states comes from observation. In other scientific disciplines, researchers can perform experiments in which conditions are identical from group to group except for the factor they are interested in studying. For example, we can study the impact of bisphenol A, a known endocrine disruptor, on breast tissue using an experimental design in which the breast cells in each experimental group are genetically identical and the only difference between groups is the amount of bisphenol A that the cells are exposed to.[9] This allows for straightforward interpretation of results: if there is a change in the measured outcome, we can conclude it was due to the change in the experimental factor. However, the use of experimental designs in epidemiology is generally limited to interventions we think might improve health, such as trials of new drugs to treat disease or interventions to help people stop smoking. Both ethical and practical considerations limit the use of experimental designs. We cannot ethically ask people to engage in behaviors or expose them to environmental factors that we suspect might harm them.
Instead, to study relationships between risk factors and health outcomes, we use observational designs. In observational designs, we gather information from study populations about their past or current circumstances, without assigning any exposures, and use statistical techniques to determine whether there is an association between the risk factor and the health or disease state we are studying. For example, if we want to study the relationship between occupational exposure to pesticides and the development of Parkinson’s disease, we cannot use an experimental design; intentionally exposing agricultural workers to pesticides would be unethical and dangerous. Instead, we can conduct an observational study in which we recruit people with and without Parkinson’s disease, ask them about their history of pesticide exposure, and use statistical analyses to determine whether there is a relationship between pesticide exposure and the development of Parkinson’s disease.[10]
These types of analytical study designs are covered in more detail in CHAPTER TK. Analytical epidemiology designs are incredibly useful for studying the potentially harmful effects of risk factors in human populations. However, they can be challenging to interpret because the researchers do not control the exposure, and they cannot always ensure that study groups are similar. When study groups differ, or when other potential causes of the health outcome are not accounted for in an observational study, the results of that study may be biased. For these reasons, it is usually not possible to determine causality with a single study or even a few studies. Instead, epidemiologists usually need to gather much more evidence before determining whether a factor is a cause of a health or disease state. This evidence comes from a number of sources, including other epidemiology studies in different populations and locations and studies from biology, toxicology, and other related fields.
In 1965, Sir Austin Bradford Hill, an English epidemiologist, published a paper in the Proceedings of the Royal Society of Medicine outlining nine aspects of the evidence that should be considered when assessing whether a factor under investigation causes a health or disease state of interest.[11] These considerations (or “viewpoints,” as Hill called them) are: (1) strength of association, (2) consistency, (3) specificity, (4) temporality, (5) dose-response, (6) plausibility, (7) coherence, (8) experiment, and (9) analogy (Table 1). These nine considerations have come to be known as the Bradford Hill criteria. They are not a strict checklist of criteria that must be met to establish a causal relationship, but rather a list of things to consider when deciding about causality.
Table 1. Summary of nine viewpoints identified by Sir Austin Bradford Hill when considering whether a risk factor causes a disease state.

| Bradford Hill Viewpoint | Definition |
| --- | --- |
| Strength of Association (Effect size) | The relationship between the risk factor and health or disease state should be strong. Weak relationships may be due to other factors that have not been accounted for. |
| Consistency | Relationships between the risk factor and the health or disease state should be similar across different studies in different populations. |
| Specificity | The risk factor should be associated with a specific health or disease state (ideally, one cause leading to one effect) rather than with many unrelated outcomes. |
| Temporality | The outcome must occur after the cause. |
| Dose-Response | Greater exposure to the risk factor should increase the likelihood or severity of the disease. |
| Biological Plausibility | There should be a plausible biological explanation for how the risk factor causes disease. |
| Coherence | There should be consistency between epidemiological and laboratory studies. |
| Experiment | When feasible, experimental evidence should support the causal relationship. |
| Analogy | The relationship between the risk factor and health or disease state should be similar to other observed causal relationships. |
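Hill's viewpoints call for judgment rather than calculation, but two of them, strength of association and dose-response, are often examined with numbers. The sketch below assumes a hypothetical cohort with made-up case counts at four exposure levels. It computes each level's risk ratio (the risk in that group divided by the risk in the unexposed reference group) and checks whether risk rises with each step up in exposure. The data and the function names `risk`, `risk_ratios`, and `shows_dose_response` are illustrative assumptions; passing such a check would not by itself establish causation, only add one piece of evidence to weigh alongside the other viewpoints.

```python
# Hypothetical cohort: (new cases, people followed) at each exposure level.
# All numbers are made up for illustration.
COHORT = {
    "none": (10, 1000),
    "low": (15, 1000),
    "medium": (24, 1000),
    "high": (40, 1000),
}
LEVELS = ["none", "low", "medium", "high"]  # ordered from least to most exposed


def risk(cases, n):
    """Proportion of people in a group who developed the disease."""
    return cases / n


def risk_ratios(cohort, levels, reference="none"):
    """Risk at each exposure level divided by risk in the reference group."""
    ref = risk(*cohort[reference])
    return {level: risk(*cohort[level]) / ref for level in levels}


def shows_dose_response(ratios, levels):
    """True if risk rises strictly with each step up in exposure."""
    values = [ratios[level] for level in levels]
    return all(lower < higher for lower, higher in zip(values, values[1:]))


ratios = risk_ratios(COHORT, LEVELS)
for level in LEVELS:
    print(f"{level:>6}: risk ratio = {ratios[level]:.1f}")
print("Risk rises with exposure:", shows_dose_response(ratios, LEVELS))
```

Here the largest risk ratio speaks to strength of association, and the steady rise across levels speaks to dose-response.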
As mentioned earlier, these criteria are not a checklist of requirements. Instead, they provide helpful concepts to consider when examining evidence. For example, evidence of biological plausibility may not be available if the risk factor is newly identified or if our understanding of the biological system being affected is limited. Similarly, some associations between risk factors and health outcomes are weak. For example, the association between particulate matter exposure and premature mortality is very small relative to the associations for other identified risk factors for death, but the broader body of evidence supports the conclusion that exposure to air pollution causes premature death.[12]
Of these nine criteria (or viewpoints), only one must actually be met for a factor to be identified as a cause of a health or disease state: temporality. For a risk factor to cause a disease, it must occur before the disease starts. This can sometimes be challenging to determine in observational studies that assess risk factors and health outcomes at the same time. In other cases, it can be difficult to pinpoint when a health outcome starts. Some health outcomes, like cardiovascular disease or cancer, start developing well before clinical symptoms are present.
Key Takeaways
In this chapter, you learned how epidemiologists identify the causes of a health or disease state. Sometimes the cause of a disease is clear, as is the case for genetic diseases or a foodborne illness outbreak. Most of the time, however, the causes of a disease are multifactorial. Most diseases have both genetic and environmental causes, and there are multiple pathways by which these risk factors can cause disease. Thus, deciding whether a suspected risk factor is a causal factor for a disease requires epidemiologists to consider a large body of evidence. The Bradford Hill criteria provide a helpful framework for considering evidence from epidemiology and other studies when making these decisions.
- Cutting, Garry R. “Cystic Fibrosis Genetics: From Molecular Understanding to Clinical Application.” Nature Reviews Genetics 16, no. 1 (January 2015): 45–56. https://doi.org/10.1038/nrg3849.
- Cruz, Charles S. Dela, Lynn T. Tanoue, and Richard A. Matthay. “Lung Cancer: Epidemiology, Etiology, and Prevention.” Clinics in Chest Medicine 32, no. 4 (December 1, 2011): 605–44. https://doi.org/10.1016/j.ccm.2011.09.001.
- Talmud, Philippa J. “Gene–Environment Interaction and Its Impact on Coronary Heart Disease Risk.” Nutrition, Metabolism and Cardiovascular Diseases, Gene-environment Interaction in Relation to Obesity, Diabetes and Cardiovascular Disease, 17, no. 2 (February 1, 2007): 148–52. https://doi.org/10.1016/j.numecd.2006.01.008.
- Rothman, K. J. “Causes.” American Journal of Epidemiology 104, no. 6 (1976): 587–92.
- Rothman, Kenneth J. Epidemiology: An Introduction. Second Edition. Oxford, New York: Oxford University Press, 2012.
- VanderWeele, Tyler J. “Invited Commentary: The Continuing Need for the Sufficient Cause Model Today.” American Journal of Epidemiology 185, no. 11 (June 1, 2017): 1041–43. https://doi.org/10.1093/aje/kwx083.
- Vigen, Tyler. “Per Capita Consumption of Margarine Correlates with The Divorce Rate in Maine (R=0.993).” Accessed August 22, 2024. https://www.tylervigen.com/spurious/correlation/5920.
- Gershman, Samuel J., and Tomer D. Ullman. “Causal Implicatures from Correlational Statements.” PLOS ONE 18, no. 5 (May 18, 2023): e0286067. https://doi.org/10.1371/journal.pone.0286067.
- Qin, Xian-Yang, Tomokazu Fukuda, Linqing Yang, Hiroko Zaha, Hiromi Akanuma, Qin Zeng, Jun Yoshinaga, and Hideko Sone. “Effects of Bisphenol A Exposure on the Proliferation and Senescence of Normal Human Mammary Epithelial Cells.” Cancer Biology & Therapy 13, no. 5 (March 1, 2012): 296–306. https://doi.org/10.4161/cbt.18942.
- Narayan, Shilpa, Zeyan Liew, Jeff M Bronstein, and Beate Ritz. “Occupational Pesticide Use and Parkinson’s Disease in the Parkinson Environment Gene (PEG) Study.” Environment International 107 (October 2017): 266–73. https://doi.org/10.1016/j.envint.2017.04.010.
- Hill, Austin Bradford. “The Environment and Disease: Association or Causation?” Proceedings of the Royal Society of Medicine 58, no. 5 (May 1965): 295–300.
- Dominici, Francesca, Antonella Zanobetti, Joel Schwartz, Danielle Braun, Ben Sabath, and Xiao Wu. “Assessing Adverse Health Effects of Long-Term Exposure to Low Levels of Ambient Air Pollution: Implementation of Causal Inference Methods.” Research Reports: Health Effects Institute 2022 (January 1, 2022): 211.