Here's where a logical error can easily occur: The data seems clear and consistent about this association, so we conclude that the best way to improve math performance is to give students bigger shoes. We have just made the leap from correlation to causation. This conclusion doesn't make any logical sense; but the data does show an association. How can this be? The conclusion illustrated here falls under the logical fallacy post hoc ergo propter hoc, an assertion that because y follows x, y must be caused by x. The danger of this fallacy is its plausibility. Our brains are wired to find connections between events and, unless these connections are as absurd as the shoe-math connection, we tend to accept them as legitimate. We are lulled into seeing causal associations where they may only be correlational associations.
Before we give up on the hope that research will tell us anything trustworthy, it's important to understand that in most cases correlation is not intended to be the end of the research process; it's largely a way to see if further study into the causal relationship between variables is warranted. If no correlation is found, it's unlikely that any causal relationship will be found later; but if correlation is found we need to determine if that association is causal. We do this by trying to isolate and/or control as many other variables as possible, until we are just looking at our variables of interest (in our example, variables x and y). We tend to do this through one of two techniques: (1) experimental design, or (2) statistical controls.
Experimental design refers to structuring a study so that those extra variables are controlled. A preferred way of doing this is through a randomized controlled trial (RCT), sometimes referred to as a clinical trial. The RCT is an experimental design in which participants are randomly assigned to different conditions that are thought to affect the outcome variable. To go back to our shoe-math example, we could randomly assign students shoes of different sizes (x) and have them take the math test (y). If our notion that bigger shoes improve math performance holds, then we should still see a strong xy correlation after the random assignment to shoe size. Generally speaking, the mechanism underlying the RCT is an assumption that the other possible variables that may link shoe size to math performance will be randomly distributed across the assigned conditions; and by randomly distributing them we control their effects.
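To make the logic of random assignment concrete, here is a minimal simulation sketch in Python. It rests on assumptions that are ours for illustration: that age is the hidden variable driving both shoe size and math scores, that shoe size has no causal effect on math, and that the made-up coefficients below are reasonable. None of these numbers come from real data, and the function names are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=2000, seed=0):
    """Compare an observational study with a randomized one (illustrative only)."""
    rng = random.Random(seed)

    # Assumption: age is the hidden variable behind both measurements.
    ages = [rng.uniform(6, 14) for _ in range(n)]

    # Assumption: math scores depend on age only, never on shoe size.
    math_scores = [10 * age + rng.gauss(0, 8) for age in ages]

    # Observational data: shoe size grows with age.
    observed_shoes = [0.8 * age + rng.gauss(0, 0.7) for age in ages]

    # Randomized assignment: shoe size is drawn independently of age.
    assigned_shoes = [rng.uniform(4, 12) for _ in range(n)]

    return pearson_r(observed_shoes, math_scores), pearson_r(assigned_shoes, math_scores)


observed_r, randomized_r = simulate()
print(f"observational r(xy) = {observed_r:.2f}")
print(f"randomized    r(xy) = {randomized_r:.2f}")
```

Under these assumptions the observational correlation comes out strong while the randomized one sits near zero, because random assignment breaks the link between shoe size and age. If bigger shoes really did improve scores, the correlation would survive the randomization.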
Statistical control is a way of mathematically isolating the measured effects of the other variables so we can look at just the relationship between our variables of interest. This technique is often used when an experimental design isn't possible, and involves developing a statistical model that includes many of the other possible variables that may affect the outcome variable. A regression analysis is commonly used for this purpose. To go back to our shoe-math example, we would include in the regression analysis measurements of all the variables we believe may link shoe size to math performance. Once these variables are in our statistical model, we're able to separate out their influence and see how much association remains between our primary variables (variables x and y). Recall that we began with a correlation of r(xy)=.74, a number that shows a strong association between variables x and y.
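One simple way to see statistical control at work is a partial correlation: regress both x and y on the third variable, then correlate what is left over. The Python sketch below does this with a single control variable. As before, treating age as that variable is our assumption, the data are simulated with made-up coefficients, and the helper names are hypothetical. A real analysis would typically fit a multiple regression with several control variables at once.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """Residuals of ys after a least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(1)

# Simulated data; age as the linking variable is an illustrative assumption.
ages = [rng.uniform(6, 14) for _ in range(2000)]
shoe_sizes = [0.8 * age + rng.gauss(0, 0.7) for age in ages]    # x
math_scores = [10 * age + rng.gauss(0, 8) for age in ages]      # y

# Raw association between x and y, before any control.
raw_r = pearson_r(shoe_sizes, math_scores)

# Association that remains once age is removed from both x and y.
partial_r = pearson_r(residuals(shoe_sizes, ages), residuals(math_scores, ages))

print(f"raw r(xy)                 = {raw_r:.2f}")
print(f"r(xy) controlling for age = {partial_r:.2f}")
```

In this simulated setup the raw correlation is strong, but almost nothing remains after age is accounted for, which is the pattern we would expect if the original association were not causal.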