“How long will my test need to run?” is usually one of the first questions that comes up when getting started with A/B testing. The length of your test is one of the most significant factors affecting the feasibility and validity of your experiment.
To have a valid experiment, you need to run your test until you reach a statistically significant result from a representative sample. For the test to be feasible, though, it must reach that result within a reasonable time. There is no sense in running a test that will take 9 months to produce meaningful results.
The required length of your test depends on a few things, including:
- The volume of traffic to your test
- The baseline conversion rate
- The expected lift
- Your desired level of significance
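These factors can be combined into a rough duration estimate. The sketch below is a minimal Python example. It assumes two variants with traffic split evenly, a two-sided two-proportion z-test, and 80% statistical power (a common default that the list above doesn't mention). The function names are hypothetical, and your testing tool may calculate sample sizes differently, so treat the result as a ballpark figure rather than an exact answer.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH variant to detect `relative_lift` over
    `baseline_rate` with a two-sided two-proportion z-test (normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)       # expected conversion rate of the variant
    p_bar = (p1 + p2) / 2                          # average rate, used for the null variance
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def estimated_days(daily_visitors: int, baseline_rate: float, relative_lift: float,
                   alpha: float = 0.05, power: float = 0.80, variants: int = 2) -> int:
    """Days needed if `daily_visitors` are split evenly across `variants`."""
    n = sample_size_per_variant(baseline_rate, relative_lift, alpha, power)
    return math.ceil(n * variants / daily_visitors)


# Hypothetical inputs: 3% baseline, 10% relative lift, 1,000 visitors a day.
n = sample_size_per_variant(baseline_rate=0.03, relative_lift=0.10)
days = estimated_days(daily_visitors=1_000, baseline_rate=0.03, relative_lift=0.10)
print(f"{n} visitors per variant, about {days} days")
```

With these hypothetical inputs the estimate comes out at roughly 53,000 visitors per variant, or more than 100 days, which is a sign the test isn't feasible as designed. A larger expected lift or more traffic shortens the estimate considerably.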
Ultimately, statistical significance determines whether you can rely on your test results. Even so, you should be aware of the risks of ending a test too early or running it too long.
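To make "statistically significant" concrete, here is a minimal Python sketch of a pooled two-proportion z-test, one standard way to compare two conversion rates. It is an illustration under that assumption, not the method of any particular tool; your testing tool may compute significance differently (some use Bayesian statistics). The function name and the visitor counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, visitors_a: int,
                           conv_b: int, visitors_b: int) -> float:
    """Two-sided p-value for H0: both variants share the same conversion rate."""
    p_a = conv_a / visitors_a
    p_b = conv_b / visitors_b
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)  # conversion rate assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical counts, for illustration only.
p = two_proportion_p_value(conv_a=300, visitors_a=10_000, conv_b=360, visitors_b=10_000)
print(f"p-value = {p:.4f}; significant at the 5% level: {p < 0.05}")
```

This check is meant for the end of a planned test. Re-running it every day and stopping at the first p-value below 0.05 raises the chance of declaring a false winner.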
Is my test too short?
If your test is too short, you may face selection bias even with a statistically significant result. When a test runs for less than one full traffic cycle (generally a week), your sample may not represent your whole audience.
For example, if you reach your target sample size between Monday and Thursday and call a winner, your results leave out weekend visitors, who may be a different group of users altogether.
An example of weekend users behaving differently for each variant.
You should run tests so that they cover at least one full cycle of website traffic. Google Optimize recommends running a test for at least two weeks, because this covers most sites' weekly traffic cycle twice.
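One way to apply this guideline is to take the number of days your traffic needs to reach the required sample size and round it up to whole weeks, with a two-week floor. The Python sketch below assumes a 7-day traffic cycle; the function name and the `min_weeks` parameter are hypothetical.

```python
import math


def planned_test_days(days_for_sample: int, min_weeks: int = 2) -> int:
    """Round the days needed to reach the sample size up to whole weeks,
    and never plan fewer than `min_weeks` weeks."""
    weeks = max(min_weeks, math.ceil(days_for_sample / 7))
    return weeks * 7


print(planned_test_days(4))   # 14: enough traffic in 4 days, but still run two full weeks
print(planned_test_days(17))  # 21: rounded up to three whole weekly cycles
```

Rounding to whole weeks means every day of the week appears equally often in the sample, so weekday and weekend visitors are weighted consistently.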
Is my test too long?
If your test has been running for over a month without a significant result, you may be better off testing a different design, since every extra week spent on it is valuable testing time that isn't producing results.
The more serious problem with running a test for too long is sample pollution. The longer your test runs, the more likely it is that your sample becomes non-representative because of:
- New campaigns or promotions in market
  - Launching a new campaign during a test can bias your sample. If the campaign targets a particular segment, the test results may be skewed towards those users.
- Holidays and other seasonal effects
  - Holidays can have a similar effect, because visitor behaviour during holiday periods may not reflect behaviour at other times of the year.
- Technical issues with the tool or website
  - Unforeseen technical issues can cause your tool to malfunction, producing incorrect data and results.
- Cookie deletion
  - When users delete their cookies, they may be shown a different variant on their next visit, which can confuse them and make the test unreliable.
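Many testing tools remember which variant a visitor has seen through a cookie. The Python sketch below uses a hypothetical hash-based scheme (not the mechanism of any particular tool) to show why cookie deletion pollutes the sample: the variant is derived from the cookie's visitor ID, so a regenerated ID amounts to a fresh random assignment. The function names and the experiment name are hypothetical.

```python
import hashlib
import uuid


def assign_variant(visitor_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically map a visitor ID to a variant by hashing it."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def simulate_cookie_deletion(trials: int = 10_000) -> float:
    """Fraction of visitors whose variant changes when their cookie ID is regenerated."""
    switched = 0
    for _ in range(trials):
        old_id, new_id = str(uuid.uuid4()), str(uuid.uuid4())  # IDs before and after deletion
        if assign_variant(old_id, "homepage-test") != assign_variant(new_id, "homepage-test"):
            switched += 1
    return switched / trials


cookie_id = str(uuid.uuid4())
# While the cookie survives, the same visitor always sees the same variant.
print(assign_variant(cookie_id, "homepage-test") == assign_variant(cookie_id, "homepage-test"))
print(f"Shown a different variant after deletion: {simulate_cookie_deletion():.1%}")
```

With two variants, roughly half of the visitors who clear their cookies end up in the other group, so the longer a test runs, the more visitors have experienced both designs.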
So, what’s the ideal length for your test? It depends!
You should run your test long enough to smooth out the effect of the weekly cycle. Two weeks is a good guideline if your site sees distinct differences in behaviour between weekdays and weekends. You also shouldn't keep your test running for too long: if an experiment hasn't found a winner after 1-2 months, it may never reach a conclusion, and continuing the test risks introducing other biases.
Do you need extra help running A/B tests through Google Optimize? Let us know!