That is the look of disappointment. Disappointment at what your testing tool is reporting — versus what you’re actually seeing in your business metrics.

Neil Patel wrote a useful post yesterday on lessons he’d learned after dropping $252K on Conversion Rate Optimization (CRO). It’s a very worthwhile read (whether you have that kind of coin to spend on CRO or not).

I particularly like this excerpt from Neil’s Lesson #10:

Over the last 1.5 years I’ve noticed a trend that just because a test says it increases your revenue by 30%, doesn’t mean it will maintain that increase in the long run. I am not 100% sure why, and nor are the consultants I work with, as they have seen this happen too.

Even with statistically significant tests, those 30% revenue lifts tend to be 15% lifts in the long run. My best guess is that there are other variables that come into play, such as the quality or volume of your traffic changing over time.

As a conversion expert, I’d be happy to share what I’ve observed around this mysterious phenomenon.

Your testing tool – and your interpretation of its data – is setting false expectations for that winning test. The tool isn’t lying to you – you’re just lying to yourself. So stop it. 🙂

Just because you have a statistically significant result (let’s say, a 95% level of confidence) on a juicy 25% conversion rate lift does NOT mean that lift will hold true forever, and certainly not for an entire year – and perhaps not even for the weeks immediately following your big test.

It only tells you that, with all things remaining constant, the lift you observed is very unlikely to be a fluke of random chance; loosely speaking, if you repeated the same test under identical conditions, you’d expect it to point the same way about 19 times out of 20. All things remaining constant (except for the variable you’re testing, of course) is the key.
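
To make that caveat concrete, here’s a minimal sketch, in Python with invented numbers (this is not Intuit’s tooling; it’s just a standard two-proportion z-test), of how a testing tool arrives at a confidence figure using nothing but the visitors and conversions it saw while the test was live:

```python
# A minimal sketch (not any particular tool's implementation): how an A/B
# testing tool judges "confidence" using only the traffic it saw while the
# test was live. All numbers are invented.
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (relative lift of B over A, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return (p_b - p_a) / p_a, p_value


# Hypothetical test: control page vs. a page with persuasive messaging.
lift, p = two_proportion_z_test(conv_a=800, n_a=10_000, conv_b=1_000, n_b=10_000)
print(f"Observed lift: {lift:.0%}, p-value: {p:.2g}")  # 25% lift, p well below 0.05

# The p-value only says the difference is unlikely to be random noise among
# the visitors who showed up during the test window. It says nothing about
# visitors who arrive next month with different motivation.
```

Notice that nothing in that calculation knows what month it is, what your traffic’s motivation looks like, or whether tax season is about to end.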

Back to grade school

Did you ever conduct a plant growth experiment in grade school? The kind where you control the plant’s environment and change only a single variable – such as the amount of water you provide, or the type of soil used? This is essentially what you’re doing with a split or A/B test on the Web.

At the end of your 2- or 3-week plant experiment, you may have come to the conclusion that more water results in more growth of your plant. But imagine that you ran the experiment in December, when sunlight was at its weakest strength and shortest duration (in the northern hemisphere, anyway). Would you be able to say with confidence that you would achieve exactly the same outcome in June?

In fact, you may not have considered controlling for the season, because the duration of your experiment was so short.

Could something similar be happening with your Web experiments?

If you thought that an A/B test could control for all variables across all time periods, you would be mistaken. Sure, by randomly splitting visitors between variations, a testing tool effectively balances out the variables that could influence your results while the test is live, but once you stop a test, all bets are off. If you run the very same test in a month’s time, you actually have a brand new test, because you haven’t accounted for changes (seasonal or otherwise) in your traffic over time.
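
For what it’s worth, that “controlling” mostly comes down to random assignment. Here’s a toy sketch of the usual approach, deterministic bucketing by visitor ID (illustrative only, not any particular tool’s implementation):

```python
# Toy illustration of how split-testing tools "control" for other variables:
# each visitor is randomly (but consistently) assigned to a variation, so any
# seasonal or traffic-quality effect hits both variations equally *while the
# test is running*. Names and the hashing scheme are illustrative only.
import hashlib


def assign_variation(visitor_id: str, test_name: str) -> str:
    digest = hashlib.md5(f"{test_name}:{visitor_id}".encode()).hexdigest()
    return "B" if int(digest, 16) % 2 else "A"


print(assign_variation("visitor-42", "signin-page-messaging"))  # same answer every visit
```

Randomization balances who sees what this week; it can’t balance this week’s visitors against next April’s.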

And here’s when the problem rears its ugly head…

If your site’s conversion rate is relatively constant throughout the year, you may not share Neil’s problem. But if your primary conversion rate does fluctuate, it’s likely that you won’t be able to count on the lift that is being reported in your tool! Neil has gone so far as to say that he just accepts it as part of his testing program (as sometimes it’s easier to accept an observed behavior than try to explain it).

What could be causing this frustrating effect?

From my own testing experiences, the mysterious variable Neil describes is motivation. More specifically, changes in the motivation of your site’s traffic. Think of it as shifting sunlight across the seasons…

Let me share a rather surprising example:

You’re looking at a representation of the weekly conversion rate for Intuit’s TurboTax site, between December and the middle of April – minus the actual numbers, of course. Can you believe how much the conversion rate varies? That tax deadline for US residents is very motivating!

In 2010, we ran an A/B test in the first few weeks of tax season on the TurboTax sign-in page – giving people 5 persuasive reasons why they should choose TurboTax over its competitors – and we achieved a respectable conversion lift of 23%. We ran the same test in late March… exactly the same test… and (ack!!!) we saw a measly 6% lift.

The following year we ran a similar test to convince people to choose TurboTax, but this time, instead of stopping the test when we reached statistical significance, we let it run for 4 months straight.

Guess what happened. The % lift in conversion not only fluctuated, but it tended to move in the opposite direction of the site’s conversion rate. The lower the conversion rate, the higher the lift. As conversion rate on the site increased, the % lift dropped. Huh?

We postulated that as people’s motivation to complete their taxes increased (thus driving up conversion rates every April), our persuasive messaging had less of an impact (and therefore generated less conversion lift).
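
Here’s a back-of-the-envelope illustration of that hypothesis (the numbers are invented, not TurboTax data): if the persuasive messaging adds a roughly constant couple of percentage points no matter the season, the relative lift your tool reports shrinks as the baseline conversion rate climbs toward the deadline.

```python
# Illustrative only: a roughly constant absolute gain from persuasive messaging
# looks like a shrinking *relative* lift as baseline motivation (and conversion)
# rises toward the tax deadline. All numbers are invented.
absolute_gain = 0.02  # messaging adds ~2 percentage points in this toy model

for month, baseline in [("January", 0.08), ("February", 0.12),
                        ("March", 0.20), ("April", 0.35)]:
    treated = baseline + absolute_gain
    relative_lift = (treated - baseline) / baseline
    print(f"{month:<9} baseline {baseline:.0%} -> reported lift {relative_lift:+.0%}")

# Prints roughly:
#   January   baseline 8%  -> reported lift +25%
#   February  baseline 12% -> reported lift +17%
#   March     baseline 20% -> reported lift +10%
#   April     baseline 35% -> reported lift +6%
```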

Think about it… if you had waited until April 14th to finish your taxes, you’d probably walk over hot coals and put up with the worst user experience just to complete your goal of submitting your taxes (Joanna Wiebe of Copy Hackers fame and I call this the Hot Coal Effect). You don’t need to be persuaded at that point.

However, earlier in the tax season when people are looking around for a tax preparation solution, they can absolutely be persuaded with the right messaging!

So depending on when we ran a test – and how that timing related to visitors’ motivations – we would see very different test results. And if we had told our executive leaders to expect a 23% lift in conversion through the year, we’d have been dead wrong.

For many businesses, motivation and conversion rates vary just as widely as they do for TurboTax.

Amazon’s conversion rate seriously fluctuates between Black Friday and the end of December. It’s simply a reflection of changes in visitor motivation relating to the Christmas shopping season. I mean, how motivated are you by December 20th if you still haven’t purchased your loved one’s gift? You’ll walk over hot coals…

And we observed the same effect when we ran tests during product promotions. The impact of the product promotion was higher motivation and conversion — resulting in lower conversion lift for our tests.

I would also submit that Neil and many others are noticing this effect because of selection bias. When you get a huge lift for a test, you notice when the money doesn’t come rolling in. But when you get a minor lift to begin with, you probably won’t notice a dip, and if results come in above your projection, the extra will likely be attributed to other marketing efforts (unless you’re already aware of this phenomenon)!

So let me ask you… When are you running tests and how does your traffic’s motivation change throughout the year?

One big question comes up a lot when I talk about this stuff to clients: Why would Intuit let a test run for 4 months when we could’ve been profiting from the incremental lift generated by the new recipe? Or in other words, why wouldn’t we run 100% of our traffic to the higher-converting experience?

Because we wanted to learn. We needed an answer to the question Neil poses, one that would shape our entire testing program for years to come. So it was well worth holding off on ending the test when the tool told us to.

If you’re wondering where your big conversion lift disappeared to, check your Web analytics for fluctuations in your conversion rate and then overlay your experiments to determine if you tested during a high or low (relatively speaking) conversion period. From there you’ll be able to better predict how your reported lift will change throughout the year – and not feel like you’re being lied to by your testing tool. 🙂
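
If it helps, here’s a rough sketch of that overlay in Python with pandas, assuming you can export a weekly conversion-rate series from your analytics tool and a list of test windows from your testing tool (the file names, column names, and data are placeholders, not a real integration):

```python
# A rough sketch of the overlay: rank each test window against the site's
# weekly conversion rate to see whether it ran in a relatively high- or
# low-converting (i.e., high- or low-motivation) period. File names, column
# names, and data are placeholders.
import pandas as pd

# Weekly conversion rate exported from your analytics tool.
weekly = pd.read_csv("weekly_conversion_rate.csv", parse_dates=["week"])
# Expected columns: week, conversion_rate

# One row per experiment, exported from your testing tool.
tests = pd.read_csv("experiments.csv", parse_dates=["start", "end"])
# Expected columns: test_name, start, end, reported_lift (as a fraction, e.g. 0.23)

for row in tests.itertuples():
    in_window = weekly[(weekly["week"] >= row.start) & (weekly["week"] <= row.end)]
    # Where does the test window's average conversion rate sit in the yearly range?
    percentile = (weekly["conversion_rate"] < in_window["conversion_rate"].mean()).mean()
    print(f"{row.test_name}: reported lift {row.reported_lift:.0%}, "
          f"ran in the {percentile:.0%} percentile of weekly conversion rates")

# Tests that ran in a low-percentile (low-motivation) period are the ones whose
# reported lift is most likely to shrink over the rest of the year.
```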

Happy testing!