Correlation can damn well mean causation

Correlation doesn't mean causation. But it can. Depth of data and parallel patterns across multiple dimensions separate empty correlations from meaningful ones.

Strangely, strong statistical analysis across multiple dimensions keeps getting dismissed with the clichéd scientific counter: ‘correlation does not mean causation.’

It’s true. There are correlations that are random and unrelated. An example of this might be that dog grooming appointments and subscriptions to the Wall Street Journal both increased at the same rate every month for a year. They’re correlated, but one does not beget the other.
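To put a number on that kind of empty correlation, here is a minimal sketch in Python. The starting figures and the 5% monthly growth rate are made up for illustration; the only thing taken from the example above is that both series rise at the same rate every month for a year.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical figures: both series grow 5% per month for 12 months.
MONTHS = 12
grooming_appointments = [200 * 1.05 ** m for m in range(MONTHS)]
wsj_subscriptions = [1000 * 1.05 ** m for m in range(MONTHS)]

r = pearson(grooming_appointments, wsj_subscriptions)
print(f"Pearson r = {r:.3f}")
```

The coefficient comes out very close to 1, about as strong as a correlation gets, yet nothing in the numbers links dog grooming to newspaper subscriptions. A single shared trend is enough to produce it.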

But there are also degrees of correlation. When multiple strong correlations rest on vast amounts of data, they usually point to something.

I say this because I have had quality research dismissed with this cliché despite analysis based on billions of pieces of data collected by the most trusted institutions in the world.

Whatever the true cause is, it will show up as many strong correlations with related factors. You start with a core premise and make some assumptions. Then you look at how you can quantify those assumptions with data to see whether they support the hypothesis.

Let’s say the hypothesis is that the butterflies from a town with a lot of smokestacks turned grey because of the smoke and smog in the air.

If the butterflies in surrounding areas were 100% white and those in the industrial area were 50% white and 50% grey, you have correlation 1.

If you had data about the color percentages of the butterflies from before the first factory opened and it was 100% white butterflies, you have correlation 2.

If there was a 2-year period since the factory opened in which it was nonoperational and you saw a resurgence of white butterflies during that time, you have correlation 3.

If, when the factories started up again, you saw a resurgence of grey butterflies in the area, you have correlation 4.

At that point, it’s not an unsupported conclusion to say that factory activity affects the butterflies’ color, turning more of them grey.

If, on the other hand, you found that the grey pattern continued whether or not the factories were on, it would suggest that the factories are not the cause of the butterflies’ color change.
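The time-based checks (correlations 2 through 4) and the counter-case can be sketched as a simple comparison: how much greyer are the butterflies while the factories run than while they don't? This is only an illustration. The function name `grey_share_gap` is hypothetical, and the grey shares for the shutdown and restart periods are invented, since the example above gives figures only for the pre-factory period (0% grey) and the industrial area (50% grey).

```python
from statistics import mean

def grey_share_gap(observations):
    """Mean grey share while factories run minus mean grey share while they don't."""
    running = [grey for is_running, grey in observations if is_running]
    idle = [grey for is_running, grey in observations if not is_running]
    return mean(running) - mean(idle)

# Hypothetical observations: (factories_running, share of grey butterflies)
factory_town = [
    (False, 0.00),  # before the first factory opened
    (True, 0.50),   # factories operating
    (False, 0.10),  # two-year shutdown: white butterflies resurge (invented value)
    (True, 0.40),   # factories restart: grey butterflies return (invented value)
]

# Counter-case: the grey share stays the same whether or not the factories run.
no_effect_town = [
    (True, 0.50),
    (False, 0.50),
    (True, 0.50),
]

supporting_gap = grey_share_gap(factory_town)
null_gap = grey_share_gap(no_effect_town)
print(f"gap with factory pattern: {supporting_gap:.2f}")
print(f"gap in counter-case: {null_gap:.2f}")
```

A large gap that holds up across several on/off periods supports the factory hypothesis. A gap near zero is the counter-case: the color change is happening regardless of the factories. A real analysis would also need the comparison against the surrounding areas (correlation 1) and far more than a handful of observations.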

All this to say, stop dismissing good research with the cliché ‘correlation does not mean causation.’ Look at the data; see if the way it was interpreted makes sense.


And once we’ve processed that, we also need to talk about the importance of control data. It used to be a linchpin of the scientific method, but by the looks of it, those days are gone.



  • Tanja Fijalkowski

    Tanja Fijalkowski is an award-winning writer, editor, and designer. A North Bay Area native, she has written for various financial, business, history, and science publications. She's a deep-dive researcher with a strong command of data analysis and simplifying complex concepts.
