Why You Can’t Just Trust Your Split Test Stats

By Adam Lundquist | @adamlundquist | CEO of Nerds Do It Better



The split test experiment you ran on your ads is complete.


You ran two different ads on AdWords and made sure to test an important variable, like the value proposition. You then marked the experiment as complete when whichever program you used told you it was. You took the program at its word (because computers are good at math) and began the next split test in the cycle. As you continued to run these tests, something seemed off with the results – but there were the split test stats in front of you, in black and white.


The tests continued to complete and hand you stats on the winning ads, and yet in the months that followed your Quality Score remained low and your click-through rate showed a distinct lack of clicking through. You are frustrated with the process, annoyed at how much time you wasted, and angry about the money you lose on every click of your low-Quality-Score keywords.


Something is wrong, and it is a common mistake in the PPC industry.


You Believe The Stats Without Using Your Brain


Split test statistics are just that – statistics.


The business results you get from the stats are only as good as the person analyzing them and how well they apply their knowledge, experience, and (most importantly) common sense.


If the statistics seem to you like they are not making any sense, then as Toucan Sam says, “Follow your nose” and investigate them! Blindly following the stats is a mistake and can cost you and your clients the two things businesses never have enough of – time and money. You can’t just rely on stats – you have to use your brain!


The programs that analyze your stats and determine that your experiments have reached statistical significance are written to tell you if your tests have mathematically concluded to the standards set by the programmer. They do not tell you what your stats mean, or even if you ran a valid split test. This can lead your stats to “trick” you into thinking you have valid and completed split tests when the reality is anything but.


If you plan on running a successful split testing program, you need to look out for common problems that can trip up your statistics and derail your split testing entirely. Here are three problems I often see, all of which you can fix immediately to get your split tests and split testing stats in order:


Problem One: Stats Come From An Unrealistically Small Amount Of Data


There is no way around it: you need a solid amount of data for your split tests to mean anything. Many Excel sheets, and even certain tools, will tell you that your split test is complete when, statistically, yes, it is – but in reality more data is needed for a conclusive test. This is where you need to use your common sense and check whether the test meets your definition of a reliable experiment. If you see that you have something like 10–30 clicks and your split test is marked complete, it is worth digging into the results to see whether more data is needed before they can truly be relied on for your business objectives. It usually means the test only ran to a confidence level under 95%.


Think ahead of time about the amount of data you will require before accepting that a split test is complete, as well as what confidence level you want to test to. This is important – decide it before the test starts. There is a huge difference between being surprised by the results of a split test (which happens often and is acceptable) and being surprised that a split test concluded when there really was not enough data.
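If you want to put a number on "how much data is enough" before the test starts, the standard two-proportion sample-size formula gives a rough answer. Here is a minimal sketch in Python; the function name and the example numbers (a 2% baseline CTR and a hoped-for 25% relative lift) are illustrative assumptions, not figures from this article or from AdWords itself:

```python
from math import sqrt, ceil
from statistics import NormalDist

def impressions_needed(base_ctr, relative_lift, alpha=0.05, power=0.80):
    """Approximate impressions needed PER AD to detect a relative lift
    in CTR with a two-sided two-proportion z-test.
    Normal-approximation formula; an illustrative sketch only."""
    p1 = base_ctr
    p2 = base_ctr * (1 + relative_lift)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for 95%
    z_b = NormalDist().inv_cdf(power)           # e.g. 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    n = ((z_a * sqrt(2 * p_bar * (1 - p_bar))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p1 - p2) ** 2)
    return ceil(n)

# 2% CTR, hoping to detect a 25% relative lift at 95% confidence:
print(impressions_needed(0.02, 0.25))  # roughly 14,000 impressions per ad
```

Notice how quickly the requirement grows for small lifts – which is exactly why a test that "completed" at 10–30 clicks should make you suspicious.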


You can use common sense or experience, or even do some research and look to the experts. I like Brad Geddes' advice of at least 500 clicks, but other experts have differing opinions on when to accept the results.


Problem Two: Stats Come From Data That Run At Different Times


This is an issue that usually occurs when marketers begin PPC advertising and split testing and do not understand what a true split test is. They take past data, insert it into an Excel sheet, and declare a winner based on statistical significance. While yes, the Excel sheet is telling you the test has concluded, it was not run under the right conditions for the data to be valid. Disregard those stats!


A true split test means that the variants run at the same time and under the same circumstances. If you have ad "A" running for one week, followed by ad "B" running the week after, then your split test has been compromised. The world changes quickly, and the events, situations, and competitor ads that influence your experiment in week one will not necessarily still be in effect the next week. Unless the ads run at the same time and under the same circumstances, those stats are invalid and you should not rely on them.
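A quick simulation makes the danger concrete. In the sketch below (illustrative numbers, not real campaign data), the two ads are literally identical – same true CTR – but ad "A" runs during a week with a hypothetical seasonal bump, so it "wins" the sequential comparison anyway:

```python
import random
random.seed(42)

def clicks(impressions, ctr):
    """Simulate a run of impressions at a given true CTR; return clicks."""
    return sum(random.random() < ctr for _ in range(impressions))

TRUE_CTR = 0.05      # both ads are actually identical
WEEK1_BOOST = 0.02   # hypothetical market bump during week one only

a = clicks(5000, TRUE_CTR + WEEK1_BOOST)  # ad "A" runs in week one
b = clicks(5000, TRUE_CTR)                # ad "B" runs in week two
print(a, b)  # with these numbers, A almost always "wins" --
             # purely because of WHEN it ran, not WHAT it said
```

Feed those two click counts into a significance calculator and it will happily declare a winner – which is exactly the trap: the math is fine, but the test conditions were not.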



Problem Three: Stats Come From Different Sources


This is more of an issue that I see with split tests involving landing pages, but it can affect your ads as well if you are not careful. If landing page "A" is getting different traffic sources than landing page "B", then you need to disregard those stats. Different traffic sources vary enormously in both the quality of the traffic and the intent of the user. The results from these tests can reflect that intent and quality more than the landing pages' design and copywriting. Think about the difference in intent and quality across sources such as:


  • Mobile vs. desktop traffic
  • Ad vs. organic traffic
  • Facebook (everyday users) vs. Facebook ads
  • Warm vs. cold traffic
  • Weekend vs. weekday traffic


Sometimes you are able to drill down and compare apples to apples (especially with Google Analytics), but some programs do not let you. This is why we recommend creating a separate landing page for each of your marketing efforts: one for email, one for ads, one for Facebook, one for Twitter, and so on. This keeps your data clean and separate and gives you statistics you can rely on.
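To see how badly mixed traffic sources can mislead you, here is a Simpson's-paradox-style sketch in Python. The visit and conversion numbers are entirely made up for illustration: page "B" converts better than page "A" within *every* traffic source, yet because "A" happened to receive mostly warm traffic, the aggregate numbers crown "A" the winner:

```python
# Made-up (landing page, traffic source) -> (visits, conversions)
data = {
    ("A", "warm"): (200, 50),   # 25% conversion
    ("A", "cold"): (50, 2),     # 4%
    ("B", "warm"): (50, 15),    # 30%
    ("B", "cold"): (200, 10),   # 5%
}

def rate(visits, conversions):
    return conversions / visits

# Aggregate rates -- the naive comparison many tools show you
for page in ("A", "B"):
    v = sum(data[page, s][0] for s in ("warm", "cold"))
    c = sum(data[page, s][1] for s in ("warm", "cold"))
    print(page, f"overall: {rate(v, c):.1%}")   # A looks like the winner

# Per-source rates -- apples to apples, and the verdict flips
for page, source in data:
    v, c = data[page, source]
    print(page, source, f"{rate(v, c):.1%}")    # B wins in BOTH segments
```

Separate landing pages per traffic source make this segmentation automatic, so you never have to untangle it after the fact.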


When have stats tripped you up rather than helped you?