Statistical Validity Is Leading You Astray
January 24, 2011
Sometimes, the idea of testing in PPC accounts can get a little dicey. What can you test? What makes for a valid test? Is a significant test always valid? What “untestable” factors might influence performance? To get the most out of your tests, it’s important both to plan them carefully and to understand how factors beyond your control interact with your test data.
There are many ways to increase the validity of any test you run in PPC, and making changes based on statistical significance is a primary step in ensuring real-life predictive accuracy. Still, I think anyone who has managed PPC accounts for a while has probably had more than one “WHAT?!” moment, where something worked (or didn’t work) as well as anticipated even though the decision was based on statistically significant data. I have been trying to understand: why does that happen? Can we control it? Maybe, maybe not, but considering the factors outside of an account that influence performance can help us respond more usefully when performance doesn’t follow our plan, and can help prevent those unexpected surprises in the first place. The quest to better control the outcomes of PPC tests has dragged me into the terrifying world where statisticians dwell, so I’m going to do my best to explain what I think I understand and distill the complexity into something useful for the everyday PPC advertiser.
So what is statistical validity? The real answer is more complex than we usually treat it in PPC, and that gets at the reasons why “valid” tests don’t always have the impact we expect. To simplify for PPC purposes: when we talk about something being statistically valid, we’re generally using that data to try to increase our statistical conclusion validity. However, the literature on statistical conclusion validity describes several threats to it (factors that weaken it), and that’s why we can’t assume it’s the only thing we need to consider in crafting predictions and making plans based on those predictions.
You can find general suggestions to improve the predictive ability of your tests floating around, like “an ad should have at least 1000 impressions before you make a decision about it”, but to determine whether an observed difference is actually statistically significant, you should use an analysis tool. It can be surprisingly tricky to just look at something and guesstimate whether there is a significant performance difference, especially as tests get more complicated. Luckily the internet is here to help us with that too, and there are a variety of tools at your disposal which can assist in making that determination.
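To see why eyeballing the numbers is risky, here’s a minimal sketch of the kind of check those tools perform under the hood: a two-proportion z-test on click-through rates. The function name and the impression/click counts are invented for illustration.

```python
from math import sqrt, erf

def ctr_z_test(clicks_a, impr_a, clicks_b, impr_b):
    """Two-proportion z-test comparing two ads' CTRs.
    Returns (z statistic, two-sided p-value)."""
    p_a = clicks_a / impr_a
    p_b = clicks_b / impr_b
    # Pooled CTR under the null hypothesis that both ads perform alike
    pooled = (clicks_a + clicks_b) / (impr_a + impr_b)
    se = sqrt(pooled * (1 - pooled) * (1 / impr_a + 1 / impr_b))
    z = (p_a - p_b) / se
    # Two-sided p-value via the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical ad test: 52 clicks / 1000 impressions vs 30 / 1000
z, p = ctr_z_test(52, 1000, 30, 1000)
print(round(z, 2), round(p, 4))
```

With these made-up numbers the p-value comes in under 0.05, so the difference would count as significant at the usual 95% confidence level; shave a few clicks off the winner and it wouldn’t, which is exactly why guesstimating is unreliable.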
For landing page testing, there is of course Google Website Optimizer, which can assist you in performing simple or complicated tests on changes to your landing pages. Chad Summerhill has released a PPC ad text testing validity tool (which can also determine significance when you run two identical ad texts against different landing pages, a different sort of landing page test), and the folks at MarketingExperiments explain how to determine the statistical validity of your data samples as well. With the introduction of AdWords Campaign Experiments, Google will split test many elements of your PPC account for you and report on the significance of the tests without making you do the complicated math.
There’s a reason we’re not all statisticians, and this stuff is complicated. That’s why you hear a lot of generalizations about PPC testing. In any case, using these types of tools to verify validity (or at least significance) rather than relying on assumption can greatly increase the likelihood that your testing will positively influence ROI.
As referenced above, there are some requirements a test must fulfill to reach significance. In PPC, we generally consider these to be adequate traffic numbers and proper setup for A/B or multivariate testing, and when considering these factors, if you have a sufficiently high-volume account you can reach statistically significant conclusions fairly quickly. But what about validity? This is where it gets more complicated, and I think there’s sometimes a tendency to oversimplify and assume significance is enough.
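“Adequate traffic numbers” turns out to be a bigger hurdle than rules of thumb like “1000 impressions per ad” suggest. Here’s a rough sketch of a standard two-proportion sample-size calculation; the function name and the CTRs are my own invented example, not a figure from any tool mentioned above.

```python
from math import sqrt, ceil

def impressions_per_ad(p1, p2, alpha_z=1.96, power_z=0.8416):
    """Approximate impressions needed per ad variation to distinguish
    CTR p1 from CTR p2 (defaults: two-sided alpha 0.05, power 0.80)."""
    p_bar = (p1 + p2) / 2
    numerator = (alpha_z * sqrt(2 * p_bar * (1 - p_bar))
                 + power_z * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Reliably detecting a lift from a 2.0% CTR to a 2.5% CTR:
print(impressions_per_ad(0.02, 0.025))
```

For this modest lift the answer lands in the low tens of thousands of impressions per variation, an order of magnitude beyond the 1000-impression rule of thumb. High-volume accounts clear that bar quickly; smaller ones need patience.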
For example, say you have an account that can give significant data after one day of testing. Great! You can test ad text messages, bid modifications, whatever you want, pretty quickly. You could perform seven ad text tests a week! The problem, as anyone whose account performs differently by day of the week may realize, is that the results of Monday’s significant test aren’t necessarily valid for Saturday traffic. If you make decisions based on too short a time range, even with statistically significant data, you’ll still increase your chances for error. Consider another factor beyond your control: say your main competitor runs out of budget and their ads are off for a week. Any conclusions you draw from testing at that time might be affected by the lack of competition, and may not be valid for the same audience when the competition re-enters the scene. The same goes for seasonal trends, and for any other influences on your account that vary over time.
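To make the day-of-week trap concrete, here’s a toy example with entirely invented numbers, where the ad that wins a Monday-only comparison loses over the full week:

```python
def rate(clicks, impressions):
    return clicks / impressions

# Invented daily (clicks, impressions) for two ad variations.
# Ad A skews toward weekday-morning browsers; Ad B toward weekend buyers.
daily = {
    "Mon": {"A": (60, 1000), "B": (40, 1000)},
    "Tue": {"A": (45, 1000), "B": (50, 1000)},
    "Wed": {"A": (40, 1000), "B": (52, 1000)},
    "Thu": {"A": (42, 1000), "B": (55, 1000)},
    "Fri": {"A": (44, 1000), "B": (58, 1000)},
    "Sat": {"A": (30, 1000), "B": (70, 1000)},
    "Sun": {"A": (28, 1000), "B": (72, 1000)},
}

# Winner judged on Monday's data alone
monday_winner = max(daily["Mon"], key=lambda ad: rate(*daily["Mon"][ad]))

# Winner judged on the full week's totals
totals = {ad: [sum(day[ad][i] for day in daily.values()) for i in (0, 1)]
          for ad in ("A", "B")}
week_winner = max(totals, key=lambda ad: rate(*totals[ad]))

print(monday_winner, week_winner)  # the one-day and full-week winners differ
```

A one-day test here would confidently pick the wrong ad for six days out of seven, which is the whole argument for letting tests span the account’s natural cycles.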
These aren’t necessarily things that we can prevent, but we can definitely think about factors outside our immediate control and either ensure that our tests span an adequate range, or plan to re-test to verify the result in a different time frame. Some accounts are more prone to variation than others: ever have a keyword that works beautifully one month, and the next, with the same bid, position, ad texts, and landing page, spends at the same level for a quarter of the leads? There’s going to be variation; the best we can hope for is to understand the level of variation our own accounts experience, and to set up and apply the results of tests wisely enough to account for as much of it as needed to get valid results.
Next time you’re getting ready to fully apply changes in AdWords Campaign Experiments, or to change your ad messaging based on statistically significant data, think twice and make sure you’ve got enough regularity in the data set to increase your chances of validity as well.
If you’re so inclined, there’s plenty more reading out there on validity and testing in marketing.