Multivariate Testing Panel
We’re switching things up a bit this afternoon. Instead of the Web Analytics: Road to Marketing Optimization panel we’ll be covering the session on Multivariate Testing. And it’s not just because they’re giving away free Google T-shirts over here. Well…not entirely, anyway.
Up next we have Jon Diorio (Google) moderating speakers Andrew Anderson (CNET) [Andy Anderson? Like in How to Lose a Guy in 10 Days!], Tim Ash (SiteTuners.com), Matt Conahan (StubHub) and David Rogers (Red Envelope).
Tim Ash is taking pictures of the audience and blinding me. Thanks, Tim. Trying to write some witty banter here. Jon is now up and saying that he doesn’t appreciate Jim Sterne delaying happy hour by putting a session in at 5pm. Personally, I’m still trying to find someone to go to dinner with me. I can has friends?
If you’re considering your first multivariate testing experiment on your Web site, how do you decide where to get started?
David says there are two pieces – there’s getting buy-in and then getting started for real. You get buy-in by starting with an argument. If you give me X amount of time, I’ll make you X amount of money. Or, let me invest this much energy and I’ll return this much money. Focus on results and the changes you’re going to make and how it’s going to hit the bottom line. The second argument you need to make is that smaller changes will impact the bottom line. Give attention to that.
If you google “multivariate testing” you’ll find a lot of ideas on how to get started. Don’t involve everyone up front. If you don’t need to start with creative, don’t; they’re going to hold you up. IT has to be involved somehow. Involve one or two people who will spearhead the effort. Once you get going, then you get everyone involved. You don’t want people holding up the process.
He talks about a case study with Red Envelope. They started multivariate testing about two months ago. Tim Ash came on board and told them to focus on their product page. They tested eight different variables – a new page section order, a new page title, different ways of presenting the price, and a lot of changes to the submit button (its shape, text and color). Three of the eight changes worked really well. It helped him make the argument. They saw a 4.4% increase in “add to cart” conversions. That goes right to someone’s bottom line.
He involved a lot of the thought leaders and skeptics in the organization. They used Google Website Optimizer; it’s a great way to get started from a cost point of view. When he presented results, he annualized them. Put a number in front of people.
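To make David’s annualizing trick concrete, here’s the arithmetic as a tiny sketch. The figures are hypothetical (only the 4.4% lift comes from his case study); the point is just to turn a weekly test result into a bottom-line annual number:

```python
# Hypothetical figures: turn a test-period lift into an annual dollar amount
weekly_orders = 5000      # orders per week through the tested page (made up)
avg_order_value = 40.00   # dollars (made up)
lift = 0.044              # the 4.4% add-to-cart lift from the Red Envelope test

extra_weekly_revenue = weekly_orders * avg_order_value * lift
annualized = extra_weekly_revenue * 52
print(round(annualized))  # → 457600
```

A number like that is what you put in front of people.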
Matt says the alphabet starts with A/B and so should you. Hee! Why A/B? It’s simple. People get it. When you get into multivariate testing, there are a lot of different factors. Multivariate testing is better, but when you’re first starting, go with A/B. At the end of an A/B test you may not know what provided the lift, but it gives you a stepping stone to the more complex stuff. You can use it to get more money to do advanced things. Before you start any test, you must have agreement on what the measurements of success are.
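For anyone wondering what “the end of an A/B test” actually looks like in numbers: a two-proportion z-test is one common way to check whether B’s lift over A is real rather than noise. A minimal sketch with made-up traffic figures:

```python
import math

def ab_significance(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is variant B's conversion rate
    significantly different from variant A's?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 1 - math.erf(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p_value

# Hypothetical: A converts 200 of 10000 visitors, B converts 250 of 10000
z, p = ab_significance(200, 10000, 250, 10000)
print(round(z, 2), round(p, 3))  # → 2.38 0.017
```

A p-value under your agreed threshold (often 0.05) is one reasonable “measurement of success” to settle on before the test starts.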
Is there ever an occasion where it’s appropriate to begin with multivariate testing?
David: If the people around you understand what’s going on, sure. The reason you don’t start with multivariate testing is an education thing. If people are willing to run with it, then you don’t have to start with A/B.
Matt agrees with David. If people on your staff “get it” and you don’t have to explain what the word ‘multivariate’ means, then do it.
Tim: You should do A/B split testing first. It’s a lot easier to do, and what matters at the end of the day is the result, not the method you used to get the result. Until you’ve exhausted the possibilities of A/B split testing, keep doing that.
What are the landmines/pitfalls that people should look out for?
Andrew: Know your tools. Not only your own systems, but the difference between A/B testing and multivariate testing. Multivariate testing will help you reach an end point faster, but if you’re talking about being able to transition ideas from one page to the next, A/B testing is an amazing tool that leads to learning when done right. It shows you what matters and what doesn’t. You have to have buy-in. You have to build this case about what this test is going to gain them in the end. When you’re doing a full testing series, being able to take the results from one part of it and use it to feed the machine is valuable.
[I must be typing really fast and making a lot of noise because people are starting to stare at me.]
Tim, can you explain the difference between full-factorial and partial-factorial testing?
Say you’re testing four things on a page and each thing has two different versions. If you’re doing full-factorial, you’ll show all 16 combinations evenly. If you’re doing partial-factorial, you’ll only test some of them and predict how the others would behave. Once you go partial-factorial, you can’t analyze the data in a complete way. Collect the data in full-factorial fashion; you’ll get better estimates at no extra data-collection cost.
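Tim’s example is easy to make concrete: four elements with two versions each gives 2 × 2 × 2 × 2 = 16 combinations. A quick sketch of enumerating a full-factorial design (the page-element names here are invented, not from the panel):

```python
from itertools import product

# Four hypothetical page elements, each with two versions
elements = {
    "headline":   ["long", "short"],
    "hero_image": ["product shot", "lifestyle shot"],
    "price":      ["$29.99", "$30"],
    "button":     ["Add to Cart", "Buy Now"],
}

# Full factorial: every combination of every version gets shown
recipes = list(product(*elements.values()))
print(len(recipes))  # → 16
```

A partial- (fractional-) factorial design would serve only a subset of these 16 recipes and model the rest, which is exactly the data-completeness trade-off Tim is warning about.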
Won’t you end up with a lot of permutations?
Tim: Yes, but you’ve already designed all those permutations. To show a new version of that page is free. Don’t worry about the number of different versions in your test. It takes the same amount of time. If you’re not sure what you’re running, ask your vendor.
Tim: You’re building a model here. With every multivariate test you have to do an A/B split test afterwards, because it’s just a model. It could be wrong. You have to do a follow-up test. That’s where you find your actual lift.
We are going to start A/B testing soon. Is there a particular amount of traffic that you want to test? How small can my sample be?
Andrew: One of the methods CNET uses is constant iterative testing. They’ll start a test and look at data instantly. If they can use that method to cut out options, the cost goes down. It gives you a faster way to reach a point where you can gather information. You’re just tracking trends.
Tim: There are two knobs you can turn. How big of an effect do you want to find and how confident do you want to be of that answer? If you’re not looking for massive effects, then you don’t need a lot of data. If you do, you have to let it run long enough for it to stabilize.
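Tim’s “two knobs” map directly onto a standard sample-size formula for comparing two conversion rates. A sketch with made-up numbers (the z-values assume roughly 95% confidence and 80% power; this formula is a common textbook one, not something the panel specified):

```python
import math

def sample_size_per_variant(base_rate, lift, z_alpha=1.96, z_beta=0.84):
    """Visitors needed per variant to detect a relative lift in
    conversion rate, at ~95% confidence and ~80% power."""
    p1 = base_rate
    p2 = base_rate * (1 + lift)
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p2 - p1) ** 2)
    return math.ceil(n)

# Small effects need far more traffic than big ones:
print(sample_size_per_variant(0.02, 0.30))  # detect a 30% relative lift
print(sample_size_per_variant(0.02, 0.05))  # detect a 5% relative lift
```

Turning either knob up (smaller effect, higher confidence) drives the required sample size up fast, which is why hunting for small lifts on a low-traffic page takes so long to stabilize.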
What’s the panel’s opinion on continual testing?
David: That’s the goal. Test different parts of the site. Look at the different parts of the funnel.
Tim: In regards to a particular landing page, there is a point of diminishing returns. Once you see things flattening out, go test something else.
What is the minimum period of time to test for?
Andrew: There’s a natural burn rate with some of the changes you make. It’s important that you always go back and restart old tests. Your users change so frequently and what they’re looking for changes. Once you think you have a page optimized, it doesn’t mean you’ll have it optimized forever. You can step away, but you have to come back.
Tim says he totally disagrees with Andrew. They test in one-week increments so you get rid of day-of-week effects and time-of-day effects. Your tests will run several weeks, maybe a couple of months. This isn’t day trading on PPC. Wait until you have enough confidence in the answer.
Andrew: On a content site, you’re trying to get people to engage. The ultimate engagement is about changing users’ classes and getting them to come back. So there are two ways to look at that. You have constant analytics and passive analytics. They’ll try to drive people to do some goal. They’ll do testing that goes a long time and then they’ll do iterative testing. You can use that to get to the end point much faster. You can’t take a week for every single version of a test. He tries to get five versions of a page tested in a week.
Tim disagrees and now he and Andrew are sparring. It’s fun to watch. They’re both talking really fast, especially Andrew.
David says he agrees with Tim. You have to think about time of day. People act differently at lunchtime. Andrew says they have that data and they look at it and take it into consideration. It’s about patterns.
It sounds like you run a lot of tests. How do you keep track of all of your learnings? And if Andrew gets hit by a bus, how do you make sure CNET doesn’t lose that information?
Andrew: We have people who keep me away from buses. They also keep an ongoing wiki, they teach teams, etc. You have leaders who are in charge and are responsible for teaching others.
Tim: The concept of learnings is pretty limited. In different contexts, different things are going to behave differently. A green button may work better here, but a red one will work better over there. Is it green vs. red or is it the contrast that stood out? It’s not always what you think it is, even if it’s really simple stuff. You’re going to have to retest a lot of the same things.
Andrew: They keep a library so they can share it with everyone. It gives them a starting point, a conversation. They can pull up what they know about color, what they know about buttons, etc. It’s a good way to let people know what’s been done and what’s worth testing.
How do you suggest dealing with organizational inertia when getting started?
Andrew: It takes everyone having this understanding that we’re all trying to get things to grow. Nine out of ten times the thing you think is going to win, won’t. It’s that thing that you don’t even know why it’s in the test that blows everything else away. Talking about your C-level management, you’re talking about ROI. Testing is such a great way to show growth.
Tim: Instead of having this butting-heads thing, point it at the mystical audience. It’s not “my idea is better than yours,” it’s “let’s see what the audience thinks…”
When is it a good time to use a sample size of less than 100 percent [of their traffic]? Is there any benefit to using a smaller sample size?
Andrew: You need enough data for it to be meaningful. It’s all about patterns.