Understanding Causation and Correlation in Paid Search
July 16, 2014
As account managers, we spend much of our time surrounded by data: percentages, rate stats, averages, raw numbers spilling around us like the cascading binary from The Matrix. And through sheer exposure to this sea of numerical information, we develop a foundational understanding of how PPC works.
However, the data that surrounds us doesn’t consist only of results. It holds patterns and relationships, hidden consistencies that have the potential to yield actionable and dramatic insights. Rooting out these insights is the daunting task of the ambitious account manager.
But the path to data-driven optimization nirvana is riddled with mirages, most notably the common error of mistaking a correlating relationship for a causal relationship.
The first step in avoiding this mistake is understanding the difference between correlation and causation. There’s an inexhaustible amount of information publicly available on the topic, but Tyler Vigen’s treatment is my favorite for the hilarity with which he addresses the common misconception.
Below is just one of many graphs pulled from Tyler’s website, Spurious Correlations.
Holy cow, would you look at how closely it correlates? Looking at this information on a stacked line graph (exactly the way performance data is presented in the AdWords interface), it’d be awfully difficult not to assume a dependent relationship between the two bits of information.
And then you identify what the data represents: divorce rate in Maine versus per capita consumption of margarine in the United States. Granted, it doesn’t take too much of an imagination to envision a relationship between eating margarine by the pound and divorce, but it’s still evident that the 99% similar data contour is a coincidence. This is not to say that all correlations are coincidence, only that correlations can be coincidence.
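To put a number on "how closely it correlates," here is a minimal sketch of the Pearson correlation coefficient in plain Python. The two series are made-up values that I invented for illustration (they are not Tyler's data, and the names `series_a` and `series_b` are hypothetical). They have nothing to do with each other beyond both drifting downward over time, yet the coefficient comes out close to 1.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical, invented values: two unrelated metrics that both trend down.
series_a = [10, 9, 8.5, 8, 7, 6.5, 6, 5, 4.5, 4]
series_b = [100, 95, 88, 80, 77, 70, 61, 58, 50, 45]

r = pearson(series_a, series_b)
print(f"correlation coefficient: {r:.3f}")
```

A high coefficient only says the two lines move together. It says nothing about whether one drives the other, which is exactly why a shared trend (or a shared season, or a shared budget change) can make unrelated metrics look dependent.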
Causation, by definition, cannot be coincidence. It implies that a relationship between two sets of data is causal, or dependent. For example, imagine you have a campaign limited by budget. You double the budget, and the clicks increase. The budget increase has a causal relationship with the increase in clicks: lifting the cap lets the campaign earn more clicks, and those clicks are precisely what boost spend.
Identifying a causal relationship between metrics in your accounts is absolutely a good way to approach optimizations. However, causal relationships are few and far between, and when they do exist, they’re generally very obvious. As search engine marketers, the important (and challenging) part is telling the valuable correlating relationships apart from the coincidental ones.
For example, I like analyzing non-converting cost (NCC%) versus CPL in my accounts (if you’re interested in learning how, check this out). I’ll often see something like this:
The relationship evinced in the graph (costs per lead are falling while the percentage of keywords that are spending without converting is increasing) defies what conventional wisdom would lead us to believe. It stands to reason that there might be a relationship between non-converting spend and CPAs. However, if this relationship were mistaken for a valuable correlating relationship, the optimization it would hypothetically inform is to further increase non-converting spend in order to decrease CPAs, which, when you think about it, makes absolutely no sense.
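One way to see why that optimization makes no sense is to look at the formulas behind the two metrics. The sketch below is only an illustration under my own assumptions: I define NCC% as cost on keywords with zero leads divided by total cost, and CPL as total cost divided by total leads. The `Keyword` class, the function names, and every number in the two periods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    cost: float
    leads: int


def ncc_pct(keywords):
    """Share of total cost spent on keywords with zero leads (assumed NCC% definition)."""
    total_cost = sum(k.cost for k in keywords)
    if total_cost == 0:
        return 0.0
    non_converting_cost = sum(k.cost for k in keywords if k.leads == 0)
    return non_converting_cost / total_cost


def cpl(keywords):
    """Total cost divided by total leads (assumed CPL definition)."""
    total_leads = sum(k.leads for k in keywords)
    if total_leads == 0:
        raise ValueError("CPL is undefined when there are no leads")
    return sum(k.cost for k in keywords) / total_leads


# Hypothetical account: one converting keyword and one non-converting keyword.
period_1 = [Keyword(cost=800.0, leads=10), Keyword(cost=200.0, leads=0)]
# Later period: the converting keyword gets cheaper and more productive,
# while non-converting spend creeps up.
period_2 = [Keyword(cost=600.0, leads=12), Keyword(cost=300.0, leads=0)]

for name, period in (("period 1", period_1), ("period 2", period_2)):
    print(f"{name}: NCC% = {ncc_pct(period):.1%}, CPL = {cpl(period):.2f}")
```

In this made-up account, NCC% rises and CPL falls at the same time, but the CPL improvement comes entirely from the converting keyword getting more efficient. The extra non-converting spend did nothing to help; CPL would have been lower still without it.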
Which brings me to my point:
Don’t take the numbers at face value. Think about them.
Immersed in the avalanche of data that is AdWords, and looking for data-driven insights, it can be easy to forget about external factors. Competitor behavior. Seasonality. Google testing on the SERP. These are all outside elements that can impact your data. Being too quick to attribute fluctuations in performance to your optimizations is a common manifestation of mistaking correlation for causation.
This fallacy is called post hoc ergo propter hoc, or “after this, therefore because of this.”
I could expound on this, but why, when I could just let the winsome President Josiah Bartlet do it for me?
What it boils down to is isolating the signal from the noise: finding relevant, actionable information (the signal) in a venue where that type of data is camouflaged by the sheer volume of numbers (the noise). To do this, once you identify a correlating relationship, think about all relevant factors, both internal to the data (such as the formulas that yield the metric) and external, and make sure you aren’t being duped by a margarine consumption versus divorce rate style correlation!
(featured image by Beat Kühn/Flickr)