AB split test graphical Bayesian calculator

Version	Include	Trials	Successes	Approx. probability of being best	95% chance conversion rate is between
A
B
C
D

What is this calculator for?

The aim in analysing split test data is to separate

the signal on which you can act from
the noise of random variation.

Most split testing tools give you some variation on significance testing to do this job.

There are a number of issues with null-hypothesis significance testing; this Wikipedia article gives some good examples and references.

This calculator takes a different approach. A Bayesian approach can give you a good estimate of the probability that A beats B given the data you have, which is, after all, the business question!

The plots show the probability distribution of conversion rates, given the data. The probabilities of being the most successful version, displayed in the table, are estimated from a random sample of several thousand points drawn from each distribution (the Monte Carlo method). For experiments that are close, you will notice the probabilities may vary slightly if you re-calculate.
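The Monte Carlo step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the calculator's own code: the function name `probability_of_being_best`, the default of 10,000 draws and the example counts are all assumptions made for the example. It relies on the uniform prior described under "Assumptions and the maths", so each version's posterior is a beta distribution with parameters successes + 1 and failures + 1.

```python
import random


def probability_of_being_best(variants, draws=10000, seed=None):
    """Estimate, for each version, the probability that it has the highest
    conversion rate, by sampling from each version's posterior.

    variants: dict mapping version name -> (trials, successes)
    Assumes a uniform prior, so the posterior for each version is
    Beta(successes + 1, trials - successes + 1).
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in variants}
    for _ in range(draws):
        # One plausible conversion rate per version, drawn from its posterior.
        sampled = {
            name: rng.betavariate(successes + 1, trials - successes + 1)
            for name, (trials, successes) in variants.items()
        }
        # Credit the version with the highest sampled rate in this draw.
        wins[max(sampled, key=sampled.get)] += 1
    return {name: count / draws for name, count in wins.items()}


# Hypothetical data: 1000 trials each, 50 and 65 successes.
example = probability_of_being_best({"A": (1000, 50), "B": (1000, 65)})
print(example)
```

Because the estimate comes from random draws, running it twice without a fixed `seed` gives slightly different numbers, which is the same effect you see when you re-calculate a close experiment.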

The calculations depend on a few assumptions. In particular, it is assumed that each trial of a given version has an equal probability of success, so if something else changed during your experiment, it may throw out the results (such changes would also be a problem for simple approaches to traditional significance testing).

Why use it?

A Bayesian approach to analysis of AB tests has many important advantages compared to approaches for estimating statistical significance.

It can often enable you to draw useful inferences, even where conversion rates and sample sizes are low.

A weak signal, if that is all you have, is enough for some marketing decisions: you can decide what level of confidence you need based on the business situation.
If you have a strong signal, this calculator will lead you to the same conclusions as significance testing.

Measuring conversions, not micro-conversions
One particular issue we see in significance testing for online split testing is what you choose to measure as a conversion. We’ve had clients who were advised to measure only the immediate click-through rate of their variations (the micro-conversion), rather than the final conversion, because the tests would “reach significance faster”.

We think that is highly dangerous: in optimising for a micro-conversion you can easily damage your ultimate conversion rate. The point of split testing is to improve conversion; statistical significance is, at best, a tool, not an objective!

That’s where the approach of this calculator comes into its own: extracting business meaning from weak signals such as

conversions too rare to reach significance
low traffic
optimising for a smaller segment (e.g. mobile)

You can use the calculator on its own, or as an adjunct and cross-check of the numbers you are getting from your split-test tool.

Reading the graph

The graphs show a probability distribution for the conversion rates of each variant.

The horizontal axis is conversion rate expressed as a percentage.
The area under the curve between any two points on the horizontal axis represents the probability that the conversion rate lies between those points.
The vertical axis is scaled so that the total area under each curve is 1, which is what allows area to represent probability.
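To make the "area equals probability" reading concrete, here is a minimal Python sketch that integrates a posterior curve numerically between two conversion rates. The function names, the midpoint-rule integration and the step count are choices made for this example, not a description of how the calculator draws its plots. Rates are given as fractions (0.05 for 5%), and the uniform prior from "Assumptions and the maths" is assumed.

```python
import math


def beta_pdf(x, a, b):
    """Height of the Beta(a, b) curve at x, computed in log space for stability."""
    if x <= 0.0 or x >= 1.0:
        # Endpoints are skipped to avoid log(0); the midpoint rule below
        # never evaluates them.
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def probability_between(trials, successes, low, high, steps=20000):
    """Area under the posterior curve between conversion rates low and high.

    Assumes a uniform prior, so the posterior is
    Beta(successes + 1, trials - successes + 1).
    """
    a = successes + 1
    b = trials - successes + 1
    width = (high - low) / steps
    # Midpoint rule: sum the curve height at the centre of each thin strip.
    total = sum(beta_pdf(low + (i + 0.5) * width, a, b) for i in range(steps))
    return total * width


# Hypothetical data: 50 successes in 1000 trials.
print(probability_between(1000, 50, 0.04, 0.06))  # chance the rate is between 4% and 6%
print(probability_between(1000, 50, 0.0, 1.0))    # whole curve: should be very close to 1
```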

The spread of the curve represents how precisely the experiment has measured the conversion rate.

The more the areas under the curves overlap, the less your experiment has separated the probable conversion rates.

If the means are reasonably well separated but the curves are wide, you need more trials.
If the means are very close together and the curves are getting quite steep, there probably isn’t much difference between A and B in terms of conversion rate.

You will see that as your numbers of trials and conversions increase, the sharpness, and hopefully the separation, of the peaks increases. What you are aiming to achieve is a clear signal of well-separated peaks.
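The sharpening of the peaks can be checked numerically. The sketch below uses the standard formulas for the mean and standard deviation of a beta distribution, and assumes the uniform prior described under "Assumptions and the maths". The function name and the 5% observed rate are made up for illustration.

```python
import math


def posterior_mean_and_sd(trials, successes):
    """Mean and standard deviation of the posterior conversion rate.

    Assumes a uniform prior, so the posterior is Beta(a, b) with
    a = successes + 1 and b = trials - successes + 1.
    """
    a = successes + 1
    b = trials - successes + 1
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


# Same hypothetical observed rate (5%) at three different sample sizes.
for trials in (100, 1000, 10000):
    successes = trials // 20
    mean, sd = posterior_mean_and_sd(trials, successes)
    print(f"{trials:>6} trials: mean {mean:.4f}, sd {sd:.4f}")
```

The standard deviation shrinks roughly in proportion to one over the square root of the number of trials, so halving the width of a curve takes about four times as many trials.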

Assumptions and the maths

The calculation assumes that you are measuring a variable that has only two values: success and failure, and that the assumptions of a binomial distribution apply.

The posterior probability is a beta distribution.

A uniform prior probability is assumed.
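For reference, the three statements above fit together through the standard conjugate-update argument. The symbols are notation introduced here rather than taken from the calculator: \theta is the unknown conversion rate, n the number of trials and s the number of successes.

```latex
% Uniform prior on the conversion rate
p(\theta) = 1, \qquad 0 \le \theta \le 1

% Binomial likelihood of s successes in n trials
p(s \mid \theta, n) = \binom{n}{s}\, \theta^{s} (1 - \theta)^{n - s}

% Bayes' theorem: posterior is proportional to likelihood times prior
p(\theta \mid s, n) \propto \theta^{s} (1 - \theta)^{n - s}

% This is the kernel of a beta distribution
\theta \mid s, n \sim \mathrm{Beta}(s + 1,\; n - s + 1),
\qquad
\mathbb{E}[\theta \mid s, n] = \frac{s + 1}{n + 2}
```

With no data at all (n = 0, s = 0) the posterior is simply the uniform prior, with a mean of 0.5.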

Technology

The distribution is calculated and plotted using the jStat JavaScript statistical library.

Other References

Bayesian A/B testing with theory and code –The Technical
Random inequalities V: beta distributions, John D. Cook
Book: Bayesian Statistics: An Introduction, Peter M. Lee (available from Amazon UK): an approachable introduction and the first dead-tree book I’ve been compelled to buy for a while!
Christopher Lee’s lectures on Vimeo: a great introduction

Next steps

Looking at analysis where the variable in question is not binary, for example spend-per-customer or time-on-site.