When And How To End Your Ad Tests

By Brad Geddes | @bgtheory | Founder, AdAlysis

It is easy to get started testing ads. Just add two ads to an ad group (hopefully by device type) and suddenly you’re testing ads. However, you won’t see long-term improvements in your account until you pause your losing ads. Determining which ads are losers, and when to end an ad test, is often the tricky part of testing. In this article, we’ll examine how to end your ad tests.


Note: We’re going to talk about data and metrics throughout this article. In order to make sure that words like data and metrics are not confusing, the word ‘data’ refers to a single piece of information, such as impressions, conversions, etc. The word ‘metric’ is used when two pieces of data come together to create another piece of data, such as CTR, conversion rate, etc. Data is your base information and metrics are what we’re going to use to determine winning and losing ads.


Determine Your Testing Metrics


The first step is to determine which metric is most important to you in testing. You can’t decide how much minimum data you need until you know which metric you’re going to use to determine winners and losers. For instance, the metric “Conversion Rate” is the ratio of conversions to clicks. Since impressions aren’t used in that calculation, you don’t need to set minimum impressions before you end a test; you only need to set minimum conversions and, optionally, minimum clicks.


The most common ad testing metrics are:


  • CTR
  • Conversion rate
  • CPA
  • CPI (conversion per impression)
  • ROAS
  • RPI (revenue per impression)
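
To make the data/metric distinction concrete, here is a minimal sketch of how each metric in the list could be derived from raw data. The `AdData` class and its field names are made up for illustration and do not come from any ad platform. The `cost` and `revenue` fields are assumptions needed for CPA, ROAS, and RPI, which are computed with their usual definitions (cost per conversion, revenue divided by ad spend, and revenue per impression).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AdData:
    """Raw data for one ad. Field names are illustrative only."""
    impressions: int
    clicks: int
    conversions: int
    cost: float = 0.0      # assumed: total ad spend
    revenue: float = 0.0   # assumed: total revenue attributed to the ad


def ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def metrics(ad: AdData) -> Dict[str, Optional[float]]:
    """Combine two pieces of data into each testing metric."""
    return {
        "CTR": ratio(ad.clicks, ad.impressions),
        "Conversion rate": ratio(ad.conversions, ad.clicks),
        "CPA": ratio(ad.cost, ad.conversions),
        "CPI": ratio(ad.conversions, ad.impressions),
        "ROAS": ratio(ad.revenue, ad.cost),
        "RPI": ratio(ad.revenue, ad.impressions),
    }


# Hypothetical numbers, purely for illustration.
example = AdData(impressions=1000, clicks=50, conversions=5, cost=100.0, revenue=400.0)
for name, value in metrics(example).items():
    print(name, value)
```

Returning `None` for a zero denominator is a design choice in this sketch: an ad with no clicks has no conversion rate yet, which is different from a conversion rate of zero.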


Once you have determined which metric you are going to use to pick winners, then you need to define your minimum data.


Define Minimum Data


Minimum data is the data that each ad in a test must achieve before you even examine an ad test to decide if you have a winning ad.


As each testing metric is made up of different data types, first examine the chart below and determine which minimum data you need to define based upon your testing metric:


Metric            Impressions   Clicks     Conversions   Timeframe
CTR               Yes           Optional   -             Yes
CPA               -             -          Yes           Yes
Conversion Rate   -             Optional   Yes           Yes
CPI               Yes           -          Optional      Yes
ROAS              -             -          Yes           Yes
RPI               Yes           -          Optional      Yes




The only data criterion that every metric should use is a timeframe. You should never start a test on day 1 and end it on day 2, regardless of how much data you collect. Weekday searches, weekend searches, weekday searches at 9am, weekday searches over the lunch hour, and so on all behave differently; if your timeframe is too short, your results won’t cover enough of these periods and you’ll make poor decisions. We suggest always using at least a week of data for a test. You can let a test run longer if you want, and in many cases you will have to in order to hit all of your minimum data.


Minimum Data Suggestions


Suggesting minimum data is very difficult, as account sizes and the amount of data they aggregate each day vary widely. Ideally, you’d set minimums based on how much traffic each ad group receives.


For instance, here are some basic minimum guidelines you can use:


Traffic level    Impressions   Clicks   Conversions
Low traffic      300           300      7
Middle traffic   750           500      13
High traffic     1000          1000     20


However, please adjust these numbers based upon your traffic. If your ad group receives 1,000 conversions/day, then you probably want at least 7,000 – 10,000 conversions, and possibly more, before you ever pick a winner.


The main rule to follow is that if one person can make a large difference in your data, then you don’t have enough. For instance, if you have 1 click on 100 impressions then you have a 1% CTR. If the 101st impression also results in a click, then your CTR jumps to 1.98%, almost doubling. In that case, you should not be picking winners based upon CTR, as each person who clicks on your ads is causing your metrics to fluctuate wildly.
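
The "one person" rule can be checked with a few lines of arithmetic. The sketch below reproduces the 1% to 1.98% example and then applies the same extra click to a larger sample; the function names and the larger sample (100 clicks on 10,000 impressions) are made up for illustration.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate as a fraction."""
    return clicks / impressions


def one_more_click_change(clicks: int, impressions: int) -> float:
    """Relative CTR change if the very next impression also produces a click."""
    before = ctr(clicks, impressions)
    after = ctr(clicks + 1, impressions + 1)
    if before == 0:
        return float("inf")  # going from zero clicks to one is an unbounded relative jump
    return (after - before) / before


# 1 click on 100 impressions, then the 101st impression also clicks.
print(f"{ctr(1, 100):.2%} -> {ctr(2, 101):.2%}")                      # 1.00% -> 1.98%
print(f"relative change: {one_more_click_change(1, 100):.0%}")        # 98%

# Hypothetical larger sample: the same single click barely moves the metric.
print(f"relative change: {one_more_click_change(100, 10_000):.1%}")   # 1.0%
```

If a single additional click moves the metric by anything like the first figure, the test does not have enough data yet.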


In addition, you want each ad in a test to hit the minimum data before you progress in your testing. If you are testing by CPI and one ad has 300 impressions and the other has 1,000 and you’ve decided that 1,000 is your minimum number, then your test is not over as only one ad has reached your minimums.
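
As a rough sketch of that rule, the check below only treats a test as ready when every ad has reached every minimum and the test has run for at least the minimum timeframe. The class and function names are hypothetical, the click and conversion counts in the example are invented, and a minimum of 0 simply means the chosen metric does not need that piece of data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AdData:
    impressions: int
    clicks: int
    conversions: int


@dataclass
class Minimums:
    """Per-ad thresholds; leave a field at 0 when the metric does not use it."""
    impressions: int = 0
    clicks: int = 0
    conversions: int = 0
    days: int = 7  # at least a week, as suggested earlier in the article


def ad_meets_minimums(ad: AdData, minimums: Minimums) -> bool:
    return (
        ad.impressions >= minimums.impressions
        and ad.clicks >= minimums.clicks
        and ad.conversions >= minimums.conversions
    )


def is_test_ready(ads: Sequence[AdData], days_running: int, minimums: Minimums) -> bool:
    """Ready only when the timeframe is met and EVERY ad has reached the minimums."""
    return days_running >= minimums.days and all(
        ad_meets_minimums(ad, minimums) for ad in ads
    )


# The CPI example: one ad at 300 impressions, the other at 1,000, minimum of 1,000.
cpi_minimums = Minimums(impressions=1000)
ads = [AdData(300, 12, 1), AdData(1000, 40, 3)]
print(is_test_ready(ads, days_running=10, minimums=cpi_minimums))  # False
```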


Once your ad tests have reached your minimum data amounts, then you can progress to the next step: determining statistical confidence.


Confidence Factors For Ad Tests


Without getting into too much math, your confidence factors determine how confident you are, based upon your data, that a result is statistically better or worse than another result.


For instance, if you have a 15% confidence that one test is better than another one, that’s a very low confidence and you would not want to make decisions based upon such low confidence. For ad testing, you generally want at least a 90% confidence that a test is a winner or loser before making changes to your ads.


However, not all your tests need to adhere to the same criteria. For instance, if you have a strong brand, then you should strive for 99% confidence in your ad tests before making changes. 99% is a very high confidence and often requires a lot of data, something that long tail ad groups can’t achieve. So you might only strive for a 90% confidence in your long tail ad groups.


Here are some basic suggestions for your confidence factors based upon word type:

  • Brand terms: 99%
  • Top keywords or 3rd party brands: 95% or 99%
  • Mid-data terms: 90% or 95%
  • Long tail keywords: 90%


With confidence factors, there’s an assumption that the data that comes after the test will be similar to the data that came during the test. In search, this isn’t always true, as weekend searches, weekday searches, etc. are different. Therefore, be very careful with online confidence calculators, as they often don’t have the notion of minimum data.


For instance, in this example, the test has a total of 97 impressions among 3 ads. However, from a purely mathematical standpoint, there is a 97% confidence that ad 2 is the winner:


Ad        Impressions   Clicks   CTR      Confidence
Control   40            1        2.5%
Ad 2      33            5        15.15%   97.03%
Ad 3      24            0        0%       15.57%


As you’re probably thinking, 97 impressions is too low to make a decision – and you’re right. However, many confidence calculations don’t take the notion of minimum data into account, which is why you should define your minimums before even running confidence calculations and only run the calculations if your tests are at or above your minimums.
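
To illustrate the point, here is a minimal sketch of one common confidence calculation: a one-sided two-proportion z-test with an unpooled standard error. The article does not say which formula produced the numbers in the table, so treat the choice of test as an assumption; it happens to land close to the table (roughly 97.03% for Ad 2 and about 15.56% for Ad 3). The `gated_confidence` wrapper and its `min_impressions` parameter are hypothetical names showing how a minimum-data check can sit in front of the math.

```python
import math
from typing import Optional, Tuple


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ctr_confidence(control_clicks: int, control_impr: int,
                   variant_clicks: int, variant_impr: int) -> float:
    """One-sided confidence that the variant's CTR beats the control's."""
    p_c = control_clicks / control_impr
    p_v = variant_clicks / variant_impr
    se = math.sqrt(p_c * (1 - p_c) / control_impr + p_v * (1 - p_v) / variant_impr)
    if se == 0:
        return 0.5  # no variance estimate at all; treat as inconclusive
    return normal_cdf((p_v - p_c) / se)


def gated_confidence(control: Tuple[int, int], variant: Tuple[int, int],
                     min_impressions: int) -> Optional[float]:
    """Return None (no verdict) unless both ads reached the minimum impressions."""
    if min(control[1], variant[1]) < min_impressions:
        return None
    return ctr_confidence(control[0], control[1], variant[0], variant[1])


# (clicks, impressions) taken from the table above.
control, ad2, ad3 = (1, 40), (5, 33), (0, 24)
print(f"{ctr_confidence(*control, *ad2):.2%}")               # roughly 97.03%
print(gated_confidence(control, ad2, min_impressions=300))   # None: below minimum data
```

The math alone is happy to report a high confidence on 73 impressions; the gate is what keeps that number from being acted on.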


As running confidence factors for all your tests is quite time consuming, you can use scripts or software like AdAlysis that will do this automatically for you. If you want to do it by hand, you can use Excel or an online calculator.


If you are using scripts or software, then your data will automatically be calculated every day so you can focus only on tests with results. If you’re doing it by hand, then I’d suggest only running these calculations weekly or every other week, as most of your tests won’t have results and it can be frustrating to do a lot of math only to determine you need to wait longer to get results.


Maximum Data


Not all of your ad tests will have statistically significant results. You will have some ad tests where the metrics are so close that you don’t have a winner or loser. These are often the easiest ad groups to ignore: you know you’re testing, you just don’t have results yet, so you keep the test running until you get results. This is a mistake, as it’s possible for a test to run for years without any actual results.


The ‘no winner’ scenario is what maximum data addresses. With maximum data, you’re saying that if the ads hit these data criteria and there’s no winner, then you’re still going to stop the test. In this case, you can just pick your favorite ad and then write a new one that’s very different from what’s been tried before.


With maximum data, there are really two options for defining the data criteria:


  1. Define a max number for all of your minimum data metrics
    • For example, if you are testing by CTR, then you’ll define max clicks, impressions, and a timeframe
  2. Define a max timeframe
    • Tracking and defining max data is a lot of additional work, so many people will just use a 3 month timeframe and if a test doesn’t have a result in 90 days, then they’ll end the test and start over
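
Both options can be expressed with one small check. This is only a sketch: the function and parameter names are hypothetical, the cap values in the example are invented, and it assumes a test has hit maximum data only when every cap you defined has been reached (mirroring how minimum data works), which the article does not spell out. Pass in the figures for the ad with the least data.

```python
from typing import Optional


def past_maximum(days_running: int, impressions: int, clicks: int,
                 max_days: int = 90,
                 max_impressions: Optional[int] = None,
                 max_clicks: Optional[int] = None) -> bool:
    """True when every cap that has been defined is reached.

    Option 2 (timeframe only) is the default: leave the other caps as None.
    """
    checks = [days_running >= max_days]
    if max_impressions is not None:
        checks.append(impressions >= max_impressions)
    if max_clicks is not None:
        checks.append(clicks >= max_clicks)
    return all(checks)


# Option 2: a 90-day timeframe only.
print(past_maximum(days_running=95, impressions=4000, clicks=120))             # True

# Option 1 for a CTR test: clicks, impressions, and timeframe are all capped.
print(past_maximum(95, 4000, 120, max_impressions=50_000, max_clicks=2_000))   # False
```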


When To End Your Ad Tests?


Ending ad tests is actually quite simple once you have a framework in place. The first steps to take are defining your testing criteria:

  • Determine how you’ll pick winners
  • Based upon the winning metrics, determine your minimum data
  • Choose your maximum data
  • Choose your confidence level


Then, when ads are above your minimum data, run statistical significance calculations:

  • If your confidence is higher than your chosen confidence level, then pause the losing ad
  • If your test exceeds your maximum data and does not have results, then restart the ad test
  • If you are below your maximum data and don’t have a high enough confidence in the test results, then let the test continue to run
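
The whole framework fits in one small decision function. The sketch below is one possible way to wire it together, not a description of how any particular tool works. The names are hypothetical, and `confidence` is assumed to be the confidence that the currently leading ad really is better than the other one.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    WAIT_FOR_MINIMUM = "below minimum data: keep running, do not evaluate yet"
    PAUSE_LOSER = "pause the losing ad"
    RESTART = "no winner at maximum data: restart the ad test"
    CONTINUE = "no result yet: let the test continue to run"


def decide(meets_minimum: bool, past_maximum: bool,
           confidence: Optional[float], required_confidence: float) -> Action:
    """Apply the three rules above, after first checking minimum data."""
    if not meets_minimum:
        return Action.WAIT_FOR_MINIMUM
    if confidence is not None and confidence >= required_confidence:
        return Action.PAUSE_LOSER
    if past_maximum:
        return Action.RESTART
    return Action.CONTINUE


# A long tail ad group striving for 90% confidence.
print(decide(True, False, confidence=0.93, required_confidence=0.90).value)
# A brand ad group striving for 99%: the same 93% is not enough yet.
print(decide(True, False, confidence=0.93, required_confidence=0.99).value)
```

Checking confidence before maximum data is deliberate here: a test that reaches its maximum on the same day it produces a winner should still pause the loser rather than restart.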


Automating Ad Testing


I’m a firm believer that if you do something twice by hand, then the third time you do the same task – it should be automated.


Writing ads takes creativity and human brain power.


Determining where you are testing, examining only results above minimum data, finding tests above max data, running statistical significance calculations, and so forth are things a computer can do automatically. It’s a waste of time for a person to do them on an ongoing basis; there are better things to do with your time.


Therefore, we created a system that will automatically do everything for you to test ads properly and spend your time wisely. If you’d like to learn more, please take a look at what AdAlysis has to offer.