Make It Count! How To Avoid Common PPC Ad Testing Mistakes

By Sam Owen, Associate Director of Paid Search at Hanapin Marketing


One of our favourite talking points in the PPC industry when it comes to testing is deciding when we have enough data to judge our results accurately. On the face of it, this should be a fairly straightforward process, albeit with a little bit of maths involved; however, it rarely turns out to be as simple as it looks.

 

For the past couple of weeks I’ve been working heavily on both ad and landing page conversion rate testing and wanted to dive into some of the things I’ve seen and a few of the pitfalls to avoid with your testing.

 

Let’s get started by looking at an overly simplistic example:

 

Ad 1: 1 click, 1 impression, 100% CTR

Ad 2: 0 clicks, 1 impression, 0% CTR

 

By the numbers, Ad 1 has this test in the bag – 100% vs 0% CTR. However, neither of us is foolish enough to assume that this proves Ad 1 is doing better. That is because we know we don’t have a statistically significant amount of data yet. Most of us in PPC are well aware of this fact, but what we aren’t always so great at is knowing at what point we can conclude an ad test.
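If you want to sanity-check this sort of thing yourself rather than take a calculator’s word for it, a standard two-proportion z-test is a rough approximation of what many of those calculators do under the hood. Here’s a quick Python sketch with purely illustrative numbers:

```python
from math import sqrt
from statistics import NormalDist

def ctr_significance(clicks_a, impr_a, clicks_b, impr_b):
    """Two-proportion z-test on CTR; returns the two-sided p-value."""
    p_a = clicks_a / impr_a
    p_b = clicks_b / impr_b
    # Pooled CTR under the null hypothesis that both ads perform the same
    pooled = (clicks_a + clicks_b) / (impr_a + impr_b)
    se = sqrt(pooled * (1 - pooled) * (1 / impr_a + 1 / impr_b))
    if se == 0:
        return 1.0  # no variation at all yet, nowhere near enough data
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The toy example above: one impression each tells us almost nothing
print(ctr_significance(1, 1, 0, 1))           # p ~ 0.16, not significant
# A more realistic sample size
print(ctr_significance(120, 2000, 90, 2000))  # p ~ 0.03, significant at 95%
```

Even with a "perfect" 100% vs 0% CTR, one impression per ad doesn’t come close to a significant result, which is exactly the point of the example.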

 

Before we go any further, I also want to point to this article, which discusses how statistical validity can lead you astray (probably a good precursor read before accepting everything I say as gospel).

 

Too soon?

The crux of this blog post is to convince you not to end your testing too soon. I want to highlight a graph from some recent CRO (landing page) testing I’ve been conducting.

 

CRO Test

In this experiment I have 4 variations of landing pages:

  • The original
  • One with extra ‘Order Now’ buttons
  • One with the same ‘Order Now’ messaging but as text links rather than image buttons
  • One with ‘Free Trial’ text links

 

If you can make out the graph, you can see that for the first few days the ‘Free Trial’ variation blew the competition away – it looked like a conclusive win for one of my new pages, with untold future benefits to our PPC campaigns to be reaped. Unfortunately, after a couple more days I came crashing down from my exuberant position in the clouds: it was a statistical anomaly, and in actual fact none of my new variations were better than the once-languishing original.

 

After seeing this I thought I’d better re-investigate some of my recent ad copy testing to make sure I wasn’t guilty of making changes with the same kind of false confidence I’d experienced with my CRO testing.

 

If you’d like to learn more about the maths behind A/B testing, Visual Website Optimizer wrote a good piece about it on their blog. Essentially, what you want to be able to say is that you are more than 90% confident that your test ad either won or lost an experiment. There are a ton of great resources online – I normally like to use one of these:

 

 

abtester.com’s statistical significance calculator

 

What I like about the one pictured above is that I can easily see, across multiple sets of ad copy, the level of confidence that each will beat my control copy. In this case I can almost certainly pause Treatment 2, and could declare Treatment 1 a winner if I’m happy to accept 90% statistical significance. However, despite these handy tools, there are a couple of things to watch out for…
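If you’d rather script this than paste numbers into a calculator, here’s a rough Python sketch of the "confidence that a treatment beats the control" figure that tools like this typically report. The conversion numbers below are made up for illustration – they aren’t the ones from my screenshot:

```python
from math import sqrt
from statistics import NormalDist

def confidence_beats_control(conv_c, n_c, conv_t, n_t):
    """Approximate one-sided confidence that the treatment's conversion
    rate is genuinely higher than the control's (normal approximation)."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    if se == 0:
        return 0.5  # no information either way yet
    z = (p_t - p_c) / se
    return NormalDist().cdf(z)

# Hypothetical conversion counts and clicks, not the figures from the screenshot
control     = (50, 1000)   # 50 conversions from 1,000 clicks (5.0%)
treatment_1 = (70, 1000)   # 7.0%
treatment_2 = (35, 1000)   # 3.5%

for name, (conv, n) in [("Treatment 1", treatment_1), ("Treatment 2", treatment_2)]:
    conf = confidence_beats_control(*control, conv, n)
    print(f"{name}: {conf:.0%} confident it beats the control")
# Treatment 1 clears a 90% bar (~97%); Treatment 2 (~5%) can safely be paused
```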

 

Which metric are you looking at?

One major problem with testing is deciding what your key metric is. In most cases you are going to go with conversion rate or click-through rate. These are also the two options that Google allows you to automatically optimize for in your ad rotation settings. My preference is looking at Impressions Until Conversion, as it takes both of these into account.
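Since Impressions Until Conversion is just impressions divided by conversions (equivalently, 1 / (CTR × conversion rate)), it’s easy to calculate from a standard report. A quick sketch with made-up numbers to show why it balances the two metrics:

```python
def impressions_until_conversion(impressions, conversions):
    """Impressions needed per conversion; equals 1 / (CTR * conversion rate)."""
    return float("inf") if conversions == 0 else impressions / conversions

# Hypothetical ads: Ad A has the better CTR, Ad B the better conversion rate
ads = {
    "Ad A": dict(impressions=10_000, clicks=500, conversions=10),  # 5% CTR, 2% CVR
    "Ad B": dict(impressions=10_000, clicks=300, conversions=12),  # 3% CTR, 4% CVR
}

for name, ad in ads.items():
    ctr = ad["clicks"] / ad["impressions"]
    cvr = ad["conversions"] / ad["clicks"]
    iuc = impressions_until_conversion(ad["impressions"], ad["conversions"])
    print(f"{name}: CTR {ctr:.1%}, CVR {cvr:.1%}, {iuc:.0f} impressions per conversion")
# Ad A needs 1,000 impressions per conversion, Ad B only ~833 -- B wins on this metric
```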

 

A/B testing or letting Google decide?

When Google tried to force us to abandon the ‘Rotate Evenly’ setting for our ads, it elicited outrage in the community. The reasons being that 1) a lot of people don’t really trust that Google gives enough time to each ad when optimizing, and 2) everyone has different standards for judging when their tests are complete. If you aren’t really a maths whiz, you might be best off letting Google do the heavy lifting (find ad rotation options in your campaign settings tab), but I’d recommend you have a go at doing it yourself first.

 

Looking over the right amount of data?

One of the big problems I always have with Google’s auto-optimizing is that I don’t trust it has enough data (it feels like it starts to optimize towards an ad with 1 conversion vs 0 after only 10 clicks). The calculators I linked to above will tell you how large a sample size you need to conclude your experiment. Make sure you don’t conclude even a statistically significant test with too small a sample – much like the CRO test above, this can be dangerous.
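If you want to work out the required sample size yourself, the standard two-proportion formula is easy enough to script. A rough sketch below – the 90% confidence and 80% power defaults are just my illustrative choices, not a rule:

```python
from statistics import NormalDist

def sample_size_per_variation(baseline_rate, minimum_effect,
                              confidence=0.90, power=0.80):
    """Rough clicks needed per ad before an absolute lift of `minimum_effect`
    in conversion rate is detectable (standard two-proportion formula,
    two-sided test)."""
    p1 = baseline_rate
    p2 = baseline_rate + minimum_effect
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / minimum_effect ** 2

# e.g. a 5% baseline conversion rate where we care about a one-point lift
print(round(sample_size_per_variation(0.05, 0.01)))  # roughly 6,400 clicks per ad
```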

 

Going by ad group or bringing data together? Which is better anyway?

If you like pivot tables in Excel, you have the option to conclude your ad tests by pooling a lot of data over multiple ad groups and campaigns. Instead of judging each test on an ad group level you can pivot all that data (those data?) and say that across your entire account ad copy B was better than ad copy A or C. See this post from Eric for more information on how to do this.
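If you’d rather do the pooling in code than in an Excel pivot table, pandas will do the same job. A quick sketch with made-up ad group data:

```python
import pandas as pd

# Hypothetical export, one row per ad copy per ad group (not real account data)
df = pd.DataFrame({
    "ad_group":    ["Shoes", "Shoes", "Boots", "Boots", "Sandals", "Sandals"],
    "ad_copy":     ["A", "B", "A", "B", "A", "B"],
    "impressions": [4000, 4100, 2500, 2400, 1800, 1750],
    "clicks":      [200, 240, 110, 130, 70, 90],
    "conversions": [8, 12, 4, 7, 2, 4],
})

# Pool the raw counts across every ad group, then derive rates from the totals
pooled = df.pivot_table(index="ad_copy",
                        values=["impressions", "clicks", "conversions"],
                        aggfunc="sum")
pooled["ctr"] = pooled["clicks"] / pooled["impressions"]
pooled["conv_rate"] = pooled["conversions"] / pooled["clicks"]
pooled["impr_per_conv"] = pooled["impressions"] / pooled["conversions"]
print(pooled)  # ad copy B wins on every pooled metric in this toy example
```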

 

Pros of aggregating your testing: way more data (easier to get statistical significance), reduces the chance of outlier keywords affecting your overall plan for account ad copy, and makes it easier to keep your ads nicely ordered within your account (just 1 or 2 winning sets running).

 

Cons: your ad groups are carefully segmented – so it stands to reason that your ad messaging will perform differently in each ad group. Pooling the data only serves to water down the granular testing you were running.

 

The best approach to this kind of testing is to:

  • Get enough data
  • Don’t rush to judge your experiments a success
  • Record/aggregate your results in a sensible way

 

As always, if you have any questions related to testing within PPC, or any tips to share with the community, feel free to share them in the comments below!

  • Stephen

    Sam, really appreciate the post. Got a question for you:
    I recently found out that Bing does not guarantee a 50-50 split in an ad test. Thus, when I ran my most recent one, my treatment had almost 30% fewer impressions than my control. Now, I’m using Chad Summerhill’s ad testing Excel sheet, and it says that despite the vast difference in impressions, my results are still statistically significant. How valid do you think those results are?
    Thanks for all the help you guys provide. Seriously, we appreciate it.

    • Sam Owen

      Sending a smaller proportion of traffic is a valid test – as long as you have significant data levels, which your Excel sheet says you do.

      It’s pretty tough to ever get an exact 50-50 split – the only thing to look out for is that your two ads aren’t showing in wildly different positions as that could affect your test.

  • Tarek

    Hi Sam, I’m currently working on a few campaigns where I’m trying to optimize for conversions.

    Some campaigns are generating few conversions but at a very high CPA.

    These campaigns have been running for a couple of weeks – shall I give them more time and optimization?
    If yes, what variables do you suggest I test?

    • Sam Owen

      Knowing when to give up on poorly converting campaigns can be a tricky thing to work out. I’d say it depends on a couple of factors: 1. How many keywords is your campaign spread over? If lots, you’ll need more time to rule out all of those keywords as poor – if few, you can do this sooner (after a few hundred clicks per KW). Try using the Google Analytics metrics to weed out keywords with poor Bounce Rates/Time on Site as a secondary option.