
Why You Rarely See That Amazing Conversion Lift Your Testing Tool Promised

[Image: a look of disappointment]

That is the look of disappointment. Disappointment at what your testing tool is reporting — versus what you’re actually seeing in your business metrics.

Neil Patel wrote a useful post yesterday on lessons he’d learned after dropping $252K on Conversion Rate Optimization (CRO). It’s a very worthwhile read (whether you have that kind of coin to spend on CRO or not).

I particularly like this excerpt from Neil’s Lesson #10:

Over the last 1.5 years I’ve noticed a trend that just because a test says it increases your revenue by 30%, doesn’t mean it will maintain that increase in the long run. I am not 100% sure why, and nor are the consultants I work with, as they have seen this happen too.

Even with statistically significant tests, those 30% revenue lifts tend to be 15% lifts in the long run. My best guess is that there are other variables that come into play, such as the quality or volume of your traffic changing over time.

As a conversion expert, I’d be happy to share what I’ve observed around this mysterious phenomenon.

Your testing tool – and your interpretation of its data – is setting false expectations for that winning test. The tool isn’t lying to you – you’re just lying to yourself. So stop it. 🙂

Just because you have a statistically meaningful result (let’s say, a 95% level of confidence) on a juicy 25% conversion rate lift does NOT mean that lift will hold true forever, and certainly not for an entire year – and perhaps not even for the weeks immediately following your big test.

It only suggests that, all things remaining constant, you would reach the same conclusion in roughly 19 out of 20 repeated tests. All things remaining constant (except for the variable you're testing, of course) is the key.
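To make that concrete, here's a minimal sketch (in Python, with invented visitor and conversion counts) of the kind of two-proportion test that sits behind a tool's "95% confidence" badge. Different tools compute this differently, so treat it as a generic illustration. The point isn't the formula; it's that the verdict only covers the traffic that actually flowed through the test:

```python
from statistics import NormalDist

# Hypothetical numbers: visitors and conversions for control (A) and variant (B).
visitors_a, conversions_a = 10_000, 500   # 5.00% baseline conversion rate
visitors_b, conversions_b = 10_000, 625   # 6.25%: a juicy 25% relative lift

p_a = conversions_a / visitors_a
p_b = conversions_b / visitors_b

# Pooled two-proportion z-test: is B's rate genuinely different from A's,
# or is the gap plausibly just noise?
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
se = (p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b)) ** 0.5
z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

lift = (p_b - p_a) / p_a
print(f"Relative lift: {lift:.0%}, z = {z:.2f}, p-value = {p_value:.4f}")
# A p-value below 0.05 is what gets reported as "95% confidence": confidence
# that the difference isn't noise for THIS sample of traffic. It makes no
# promise about next month's (differently motivated) visitors.
```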

Back to grade school

Did you ever conduct a plant growth experiment in grade school? The kind where you control the plant’s environment and change only a single variable – such as the amount of water you provide, or the type of soil used? This is essentially what you’re doing with a split or A/B test on the Web.

At the end of your 2- or 3-week plant experiment, you may have come to the conclusion that more water results in more growth of your plant. But imagine that you ran the experiment in December, when sunlight was at its weakest and the days were at their shortest (in the northern hemisphere, anyway). Would you be able to say with confidence that you would achieve exactly the same outcome in June?

In fact, you may not have considered controlling for the season, because the duration of your experiment was so short.

Could something similar be happening with your Web experiments?

If you thought that an A/B test could control for all variables across all time periods, you would be mistaken. Sure, testing tools will control for all variables that could influence your test results during the time your test is live, but once you stop a test, all bets are off. If you run the very same test in a month’s time, you actually have a brand new test — because you haven’t accounted for changes (seasonal or otherwise) in your traffic over time.

And here’s when the problem rears its ugly head…

If your site’s conversion rate is relatively constant throughout the year, you may not share Neil’s problem. But if your primary conversion rate does fluctuate, it’s likely that you won’t be able to count on the lift that is being reported in your tool! Neil has gone so far as to say that he just accepts it as part of his testing program (as sometimes it’s easier to accept an observed behavior than try to explain it).

What could be causing this frustrating effect?

From my own testing experiences, the mysterious variable Neil describes is motivation. More specifically, changes in the motivation of your site’s traffic. Think of it as shifting sunlight across the seasons…

Let me share a rather surprising example:

[Chart: weekly conversion rate for TurboTax, December through mid-April]

You’re looking at a representation of the weekly conversion rate for Intuit’s TurboTax site, between December and the middle of April – minus the actual numbers, of course. Can you believe how much the conversion rate varies? That tax deadline for US residents is very motivating!

In 2010, we ran an A/B test in the first few weeks of tax season on the TurboTax sign-in page – giving people 5 persuasive reasons why they should choose TurboTax over its competitors – and we achieved a respectable conversion lift of 23%. We ran the same test in late March… exactly the same test… and (ack!!!) we saw a measly 6% lift.

The following year we ran a similar test to convince people to choose TurboTax, but this time we let the test continue to run. Instead of stopping the test when we reached statistical significance, we let it run for 4 months straight.

Guess what happened. The % lift in conversion not only fluctuated, but it tended to move in the opposite direction of the site’s conversion rate. The lower the conversion rate, the higher the lift. As conversion rate on the site increased, the % lift dropped. Huh?

We postulated that as people’s motivation to complete their taxes increased (thus driving up conversion rates every April), our persuasive messaging had less of an impact (and therefore generated less conversion lift).

Think about it… if you had waited until April 14th to finish your taxes, you’d probably walk over hot coals and put up with the worst user experience just to complete your goal of submitting your taxes (Joanna Wiebe of Copy Hackers fame and I call this the Hot Coal Effect). You don’t need to be persuaded at that point.

However, earlier in the tax season when people are looking around for a tax preparation solution, they can absolutely be persuaded with the right messaging!

So depending on when we ran a test – and how that timing related to visitors’ motivations – we would see very different test results. And if we had told our executive leaders to expect a 23% lift in conversion through the year, we’d have been dead wrong.
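If you want to see how the arithmetic of that plays out, here's a toy model; every number below is invented, not Intuit data. Assume the persuasive messaging mostly moves the "still shopping around" visitors and barely touches the deadline-driven ones. Then as the deadline-driven share of traffic grows, the site's conversion rate climbs while the relative lift from the very same test shrinks:

```python
# Toy model of how visitor motivation dilutes relative lift.
# Two segments: "shopping around" (persuadable) and "deadline-driven"
# (converting almost regardless of messaging). All rates are made up.

def blended_lift(deadline_share):
    """Return (control conversion rate, relative lift) for a given traffic mix."""
    shopping_share = 1 - deadline_share

    control = shopping_share * 0.04 + deadline_share * 0.60
    variant = shopping_share * 0.06 + deadline_share * 0.61  # big help for shoppers, tiny for the committed

    return control, (variant - control) / control

for label, deadline_share in [("early season", 0.10), ("mid season", 0.40), ("April 14th", 0.80)]:
    rate, lift = blended_lift(deadline_share)
    print(f"{label:12s} site conversion ~ {rate:.1%}, lift from the same test ~ {lift:+.1%}")
```

Run it and the same test drops from roughly a 20% lift early in the season to a few percent near the deadline, purely because the traffic mix changed.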

For many businesses, motivation and conversion rates vary as widely as they do for TurboTax.

Amazon's conversion rate fluctuates dramatically between Black Friday and the end of December. It's simply a reflection of changes in visitor motivation during the Christmas shopping season. I mean, how motivated are you by December 20th if you still haven't purchased your loved one's gift? You'll walk over hot coals…

And we observed the same effect when we ran tests during product promotions. The impact of the product promotion was higher motivation and conversion — resulting in lower conversion lift for our tests.

I would also submit that Neil and many others are noticing this effect because of selection bias. When a test reports a huge lift, you notice when the money doesn't come rolling in. But when the reported lift is minor to begin with, you probably won't notice a dip – and any results that come in above projections will likely be attributed to other marketing efforts (unless you're already aware of this phenomenon)!

So let me ask you… When are you running tests and how does your traffic’s motivation change throughout the year?

One big question comes up a lot when I talk about this stuff with clients: Why would Intuit let a test run for 4 months when we could've been profiting from the incremental lift generated by the new recipe? In other words, why wouldn't we send 100% of our traffic to the higher-converting experience?

Because we wanted to learn. We needed an answer to the question that Neil poses, the answer to which would shape our entire testing program for years to come. So it was well worth holding off on ending the test when the tool told us to.

If you’re wondering where your big conversion lift disappeared to, check your Web analytics for fluctuations in your conversion rate and then overlay your experiments to determine if you tested during a high or low (relatively speaking) conversion period. From there you’ll be able to better predict how your reported lift will change throughout the year – and not feel like you’re being lied to by your testing tool. 🙂

Happy testing!

About the author

Lance Jones

Lance Jones is a CRO and SaaS growth expert.

  • Bryant Jaquez

    Excellent insight. Thanks for showing us a “behind the curtain” look at Turbo Tax. 

  • phranz

    Thanks a lot for your thoughtful post, Lance! I would also see seasonal user motivation as one of the major reasons why things after A/B-test might not turn out as expected. However, I would also always recommend segmenting results across all marketing channels (paid search, organic search, e-mail marketing, etc.). This makes the reasons for unexpected developments easier to spot and could help you to understand your users better.

    • Lance Jones

      Absolutely! I am a staunch advocate for segmenting data… in research, analytics, A/B tests, etc. I’ve been surprised more times than I can recall when I’ve dug deeper into a set of results in the way you describe. Thanks for the awesome reminder!

  • Daniel Gonzalez

    This is such a great post! 

    Lance, have you ever encountered a situation where you ran a test that got a conversion lift, and later saw (due to seasonal motivation change) a net decrease from the baseline conversion rate?

    Like, can the same landing page hurt your baseline conversion rate during a seasonal shift? (Assuming you've controlled for everything except seasonality.)

    • Lance Jones

      Hey Daniel! I can’t say that I have. At Intuit, our baseline conversion rate included the “low points” on our week-over-week graph, so we never saw a dip below those lowest levels (based on any changes we had made to our site). But I can see it happening.

      You get a lift based on some unique aspect of the traffic coming to the site during a test… then you close the test and push the winning version… followed by some type of seasonal shift in your traffic that is “put off” by that previously-reported winning design… and poof, your baseline conversion rate takes a hit. Very plausible scenario. This is why it’s so important that you understand WHY a test won — so that you won’t be surprised if such a scenario erupts. Make sense?

      • Daniel Gonzalez

        Yep! Awesome, thanks.

  • Ross O’Lochlainn

    Great and timely article. 

    I just finished reading Neil's article yesterday and I was caught off guard by his Lesson #10. It seemed baffling to me; with a statistical CL of 95%, I'd expect a 30% lift to hold true to some level at least. Maybe not the full 30%, but something like 20-30%. Seeing the revenue increase by only half of the conversion rate lift seemed like a very disheartening experience.

    I suspect we deal with a similar motivation spike since our product deals with environmental reporting. As soon as manufacturers return from their Christmas holidays in the US, they view it as time to get things in order since the government due dates for reports start rolling in from Feb-July.

    It gives us a nice spike in activity and a conversion rate lift, but it’s always been confusing because I have never known what to attribute the lift to.

    • Lance Jones

      Absolutely Ross. So let’s say you tested something on your site before Christmas and saw a nice lift… then it would be quite possible that the very same test would have generated less lift during the time following their Christmas break — meaning you would not have been able to count on what your testing tool reported. However, if you had let the test run for several months, the testing tool lift would have been much more reliable.

      Thanks for stopping by to comment!

      • Ross O’Lochlainn

        This is really an interesting concept, now that you’ve opened the can of worms!

        Does this mean that ANY result can fluctuate even after you get to a certain statistical confidence level? I know that an SCL just tells you that "A is better than B", but how can you ever know when to actually call an end to an experiment and get an accurate number to report for the ROI of a test?

      • Lance Jones

        Yes, ANY result can fluctuate, given a new, previously unforeseen scenario — such as seasonal/shifting changes in the motivation of your visitors. But in the case of a highly consistent conversion rate, you should be able to rely on the reported lift. The reported lift (given a statistically meaningful outcome) should be sound if you’ve accounted for all possible variables, so it’s not as bad as never being able to count on the data. 🙂 You just have to understand your visitors…
