# A/B Testing Explained: A Plain-English Guide

> A/B testing compares two versions of something to see which performs better, using real data instead of opinion. This guide covers forming a hypothesis, sample size, statistical significance and the pitfalls that ruin results.

*Section: Marketing — By Harper Quinn (Marketing & Growth Editor) — Published July 15, 2025 — 6 min read*

Canonical URL: https://dailyjunction.org/marketing/ab-testing-explained
Tags: a/b testing, split testing, conversion optimisation, experimentation, marketing analytics

## Key takeaways

- A/B testing shows two versions of something to different users at random and measures which produces a better result.
- Start with a clear hypothesis: a specific change, an expected effect and the metric you will judge it by.
- You need enough data — a sufficient sample size — before a result means anything, or you are just reading noise.
- Statistical significance tells you how unlikely a difference this large would be if there were no real effect; it is a confidence check, not proof.
- The biggest pitfalls are stopping early, testing too many things at once, and ignoring the size of the effect.

Marketing is full of confident opinions about what works — which headline, which colour, which subject line. A/B testing replaces those opinions with evidence. Instead of arguing, you run a fair experiment and let real behaviour decide. It is one of the most powerful tools in marketing, but only when done properly; done carelessly, it produces confident-sounding nonsense. This guide covers the four things that separate a real test from wishful thinking: hypotheses, sample size, significance and pitfalls.

## What it is

**A/B testing — also called split testing — is a method of comparing two versions of something by showing each to a random portion of your audience and measuring which performs better against a defined goal.** Version A is usually the existing "control"; version B is the variation with one change.

The power comes from a single design choice: because users are split *randomly* and only *one* element differs between A and B, any difference in results larger than chance alone would produce can be attributed to that element. Everything else (the season, your traffic mix, the day of the week) affects both versions alike, on average, and so cancels out. That is what makes A/B testing a genuine experiment rather than a guess, and why it underpins serious [conversion rate optimisation](/marketing/conversion-rate-optimisation).
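To make the random split concrete, here is a minimal Python sketch. It assumes every visitor has a stable ID; the `assign_variant` function and the experiment name `"cta-test"` are hypothetical, not part of any particular testing tool. Hashing the ID, rather than flipping a fresh coin on every page view, is one common way to get a split that behaves like a random 50/50 draw while keeping a returning visitor in the same group.

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "cta-test") -> str:
    """Return 'A' or 'B' for a user: stable per user, roughly 50/50 overall."""
    # Hash the experiment name together with the user ID, so the same user
    # always lands in the same group for this experiment.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Treat the first 8 bytes as an unsigned integer; its parity picks the group.
    bucket = int.from_bytes(digest[:8], "big") % 2
    return "A" if bucket == 0 else "B"

# Usage: split 10,000 hypothetical users and check the balance.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}")] += 1
print(counts)  # roughly 5,000 in each group
```

Salting the hash with the experiment name means a user's group in one test says nothing about their group in the next.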

> The purpose of an A/B test is not to prove you were right. It is to find out what is true. Going in hoping for a particular result is the first step towards fooling yourself.

## Start with a hypothesis

A good test begins before you touch anything, with a clear **hypothesis**. A hypothesis is not "let's try a green button"; it is a specific, testable statement with three parts:

1. **The change** — what exactly you will alter (and only that).
2. **The expected effect** — what you predict will happen and, ideally, why.
3. **The metric** — the single number you will judge it by.

For example: *"Changing the call-to-action from 'Submit' to 'Get my free quote' will increase form completions, because it states the benefit and reduces hesitation."* That is testable. It names the change, the expected outcome and the metric — form completions — you will measure.

Defining the metric in advance is vital. If you decide *afterwards* which number to look at, you will always find one that flatters the result. Decide the goal first. It is the same discipline that makes [marketing ROI measurement](/marketing/measure-marketing-roi) trustworthy: choose what counts as success before you see the data.
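One lightweight way to hold yourself to this is to write the plan down as a fixed record before launch. The Python sketch below is illustrative only: `ExperimentPlan` and `judge_on` are hypothetical names, and the check simply refuses to evaluate the test on any metric other than the one agreed in advance.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ExperimentPlan:
    """Hypothetical record of a test hypothesis, written down before launch."""
    change: str           # the one thing that differs between A and B
    expected_effect: str  # what you predict will happen, and why
    metric: str           # the single number the test will be judged on

    def judge_on(self, metric: str) -> str:
        """Return the metric to evaluate, refusing any swap made after the fact."""
        if metric != self.metric:
            raise ValueError(f"Plan says judge on {self.metric!r}, not {metric!r}")
        return metric

plan = ExperimentPlan(
    change="Call-to-action text: 'Submit' -> 'Get my free quote'",
    expected_effect="More form completions, because the new text states the benefit",
    metric="form_completions",
)

plan.judge_on("form_completions")  # OK: this is the agreed metric
# plan.judge_on("time_on_page")    # would raise ValueError: not the agreed metric
```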

## Sample size: gather enough data

Here is the rule beginners break most often: **a result means nothing until you have enough data.** A version that looks like a runaway winner after twenty visitors is almost certainly noise. To know whether a difference is real, you need a sufficient **sample size** — enough people in each group to see a stable pattern rather than random fluctuation.
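A quick simulation shows why twenty visitors tell you so little. This Python sketch, under assumed numbers (a 20% conversion rate, 10 visitors per version), runs many tests in which A and B are *identical* and counts how often one still appears to lead by 20 percentage points. The function name `identical_test` is hypothetical.

```python
import random

def identical_test(n_per_group: int, true_rate: float, rng: random.Random) -> tuple[int, int]:
    """Simulate one A/B test in which A and B are identical (same true rate)."""
    # Every visitor in either group converts with the same probability.
    conv_a = sum(rng.random() < true_rate for _ in range(n_per_group))
    conv_b = sum(rng.random() < true_rate for _ in range(n_per_group))
    return conv_a, conv_b

rng = random.Random(42)  # fixed seed so the run is repeatable
trials = 1_000
big_gaps = 0
for _ in range(trials):
    # 20 visitors in total: 10 see A, 10 see B.
    conv_a, conv_b = identical_test(n_per_group=10, true_rate=0.20, rng=rng)
    # A gap of 2+ conversions out of 10 is a 20-point difference (e.g. 10% vs 30%).
    if abs(conv_a - conv_b) >= 2:
        big_gaps += 1

share = big_gaps / trials
print(f"{share:.0%} of these tests showed a 20-point gap between identical versions")
```

Because both versions share the same true rate, every one of those gaps is pure noise.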

How much is enough depends on two things:

| Factor | Effect on required sample |
|--------|---------------------------|
| Your current conversion rate | Lower rates need larger samples |
| The size of effect you want to detect | Smaller improvements need larger samples |

A tiny improvement is genuinely hard to detect: distinguishing a real 1% lift from random noise takes a lot of data, while a big improvement reveals itself sooner. The practical move is to **estimate the required sample size before you start**, using one of the many free A/B test calculators, and then commit to running the test until you reach it. The underlying idea, that estimates become more stable as samples grow, is a basic statistical principle of the kind the **Office for National Statistics** relies on across its work.
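Free calculators typically rely on a formula like the one sketched below in Python. This is a minimal version of the standard normal-approximation estimate for comparing two conversion rates, assuming 95% confidence (two-sided) and 80% *power*, meaning an 80% chance of detecting the lift if it is really there; the 80% figure is a common default, not something this guide prescribes. The function names and the 2,000-visitors-a-day traffic figure are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(baseline_rate: float, relative_lift: float,
                          confidence: float = 0.95, power: float = 0.80) -> int:
    """Visitors needed in EACH group to detect the given lift (normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)  # the rate you hope B will reach
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

def duration_in_days(n_per_group: int, daily_visitors: int) -> int:
    """Days to reach the sample, rounded up to whole weeks to cover weekday/weekend cycles."""
    days = math.ceil(2 * n_per_group / daily_visitors)
    return math.ceil(days / 7) * 7

for baseline, lift in [(0.05, 0.20), (0.05, 0.10), (0.02, 0.10)]:
    n = sample_size_per_group(baseline, lift)
    print(f"baseline {baseline:.0%}, lift {lift:.0%}: {n:,} per group, "
          f"{duration_in_days(n, daily_visitors=2_000)} days at 2,000 visitors/day")
```

The three example scenarios mirror the table: halving the lift you want to detect, or lowering the baseline rate, pushes the required sample up sharply. Rounding the duration up to whole weeks is a simple guard against the too-brief tests described in the pitfalls below.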

## Statistical significance: is the difference real?

Once you have enough data, you need a way to judge whether the difference between A and B is **real or just chance**. That is what **statistical significance** estimates.

In plain terms, significance answers: "If there were truly no difference between these versions, how likely is it I'd see a gap this large by luck alone?" A common threshold is **95% confidence**: you accept a difference as real only when a gap that large would turn up by chance less than 5% of the time if the versions were identical. Most testing tools calculate this for you and report it as a confidence level or **p-value** (a p-value below 0.05 corresponds to clearing the 95% bar).
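For the curious, here is roughly what a testing tool does under the hood. This Python sketch uses a two-proportion z-test, one common method for comparing conversion rates (tools may use other approaches, such as chi-squared or Bayesian methods); the function name and the conversion counts are made up for illustration.

```python
import math
from statistics import NormalDist

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the gap between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate: the best estimate of a single shared rate, assuming no real difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    if p_pool == 0 or p_pool == 1:
        return 1.0  # nobody (or everybody) converted: no evidence of a difference
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se  # gap measured in standard errors
    # Chance of a gap at least this large, in either direction, by luck alone.
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_clear = two_proportion_p_value(500, 10_000, 580, 10_000)  # 5.0% vs 5.8%
p_small = two_proportion_p_value(500, 10_000, 520, 10_000)  # 5.0% vs 5.2%
print(f"5.0% vs 5.8%: p = {p_clear:.3f}")  # below 0.05, so it clears the 95% bar
print(f"5.0% vs 5.2%: p = {p_small:.3f}")  # far above 0.05: easily explained by chance
```

The pooled rate in the standard error is what encodes the "if there were truly no difference" assumption in the question above.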

Two cautions keep you honest here:

- **Significance is a confidence check, not proof.** Even at 95% confidence, there is a real chance the result is wrong. It reduces the odds of fooling yourself; it does not eliminate them.
- **Significance is not the same as importance.** A result can be statistically significant yet so small it makes no practical difference to your business. Always ask *how big* the effect is, not just whether it cleared the confidence bar.

> Two questions, not one: "Is this difference likely real?" *and* "Is it big enough to care about?" A change can pass the first and fail the second.
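To answer the second question, look at the size of the lift and the range of values it could plausibly take, not just the p-value. This Python sketch computes a 95% confidence interval for the difference in conversion rates using the normal approximation; the traffic numbers and the `MIN_WORTHWHILE_LIFT` threshold of half a percentage point are hypothetical, and in practice that threshold is a business decision.

```python
import math
from statistics import NormalDist

MIN_WORTHWHILE_LIFT = 0.005  # hypothetical business threshold: +0.5 percentage points

def lift_with_interval(conv_a: int, n_a: int, conv_b: int, n_b: int,
                       confidence: float = 0.95) -> tuple[float, float, float]:
    """Absolute lift (B minus A) and its confidence interval, normal approximation."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Unpooled standard error: here we estimate the gap rather than test "no difference".
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, diff - z * se, diff + z * se

# A very high-traffic test: 10.0% vs 10.1% on a million visitors each.
diff, low, high = lift_with_interval(100_000, 1_000_000, 101_000, 1_000_000)
likely_real = low > 0                     # the interval excludes zero
big_enough = high >= MIN_WORTHWHILE_LIFT  # can even the optimistic end reach the threshold?
print(f"Lift {diff:+.2%}, 95% interval {low:+.2%} to {high:+.2%}")
print(f"Likely real: {likely_real}; could be big enough to matter: {big_enough}")
```

With a million visitors per version, even a 0.1-point lift clears the significance bar, yet the whole interval sits well below the half-point threshold: likely real, but probably not worth much.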

## The pitfalls that ruin tests

Most failed A/B tests are not bad luck; they are avoidable mistakes. Watch for these:

- **Stopping early.** Ending the test the moment it looks favourable — sometimes called "peeking" — is the cardinal sin. Results swing wildly early on; an apparent winner today can reverse by next week. Set your sample size and run to it.
- **Testing too many changes at once.** If you change the headline, the image and the button together, a better result tells you *something* worked, but not what. Isolate one variable so you actually learn.
- **Running it too briefly.** A test that does not span normal cycles — weekdays versus weekends, paydays, seasonal swings — can capture an unrepresentative slice of behaviour.
- **Ignoring external events.** A sale, an outage, a press mention or a holiday can distort results. Note what else was happening during the test.
- **Chasing tiny effects on low traffic.** If a page gets fifty visits a month, you will never gather enough data to test it meaningfully. Test where the volume is.
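The damage done by stopping early can be simulated. This Python sketch runs many hypothetical A/A tests (both versions identical, so every "winner" is a false positive) and compares checking the p-value once at the planned end with checking it after every batch of visitors and stopping at the first result under 0.05. The conversion rate, batch sizes, number of checks and function names are assumptions chosen to keep the run short.

```python
import math
import random
from statistics import NormalDist

def p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided two-proportion z-test p-value."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    if p_pool == 0 or p_pool == 1:
        return 1.0  # no variation at all, so no evidence of a difference
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def aa_test_declares_winner(rng: random.Random, rate: float, batches: int,
                            batch_size: int, peek: bool) -> bool:
    """Simulate one A/A test; return True if it (wrongly) declares a winner."""
    conv_a = conv_b = n = 0
    for _ in range(batches):
        conv_a += sum(rng.random() < rate for _ in range(batch_size))
        conv_b += sum(rng.random() < rate for _ in range(batch_size))
        n += batch_size
        if peek and p_value(conv_a, n, conv_b, n) < 0.05:
            return True  # stopped the moment it "looked significant"
    # Without peeking, the verdict is taken once, at the planned sample size.
    return p_value(conv_a, n, conv_b, n) < 0.05

def false_positive_rate(peek: bool, trials: int = 400, rate: float = 0.10,
                        batches: int = 20, batch_size: int = 100) -> float:
    false_wins = 0
    for trial in range(trials):
        rng = random.Random(trial)  # same simulated visitors for both stopping rules
        false_wins += aa_test_declares_winner(rng, rate, batches, batch_size, peek)
    return false_wins / trials

rate_checked_once = false_positive_rate(peek=False)
rate_peeking = false_positive_rate(peek=True)
print(f"Checked once at the end: {rate_checked_once:.0%} false winners")
print(f"Peeked after every batch: {rate_peeking:.0%} false winners")
```

Each trial uses the same simulated visitors under both rules, so the only difference is when you stop; expect the peeking rate to come out several times higher than the roughly 5% you get by checking once.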

Avoiding these is mostly about patience and honesty. The whole value of testing evaporates if you bend the process to get the answer you wanted. That evidence-led humility carries across all of marketing measurement — as CM Beyer argues in its guide to [measuring marketing ROI without overcomplicating it](https://cmbeyer.co.uk/how-to-measure-marketing-roi-without-overcomplicating-it/), a few numbers you trust and act on beat a pile of figures you quietly massage.

## The bottom line

A/B testing turns marketing arguments into experiments: split your audience at random, change one thing, and measure. Begin with a clear **hypothesis** that names the change, the expected effect and the metric; gather a large enough **sample size** before drawing conclusions; use **statistical significance** to gauge whether a difference is real, while also asking whether it is big enough to matter; and steer clear of the classic **pitfalls**, above all stopping early. Done with patience and honesty, A/B testing is how you replace opinion with evidence — and keep improving for good.

## Frequently asked questions

### What is A/B testing?

A/B testing, or split testing, is a method of comparing two versions of something — such as a web page, email or advert — by showing each to a random share of your audience and measuring which performs better against a chosen goal. Because the audience is split randomly and only one element differs, it isolates the effect of that change.

### What is statistical significance in A/B testing?

Statistical significance tells you how unlikely it is that you would see a difference as large as the one observed if the two versions actually performed the same. A common threshold is 95% confidence: a gap that large would arise by chance less than 5% of the time if there were no real difference. It is a confidence check, not absolute proof, and should be paired with a sensible sample size.

### How large a sample size do I need?

It depends on your current conversion rate and how big a change you want to detect — smaller effects need larger samples. The key principle is to decide the required sample before you start and wait until you reach it, rather than stopping the moment a result looks good. Free sample-size calculators can estimate the number for you.

### Why do A/B tests give misleading results?

Usually because of avoidable mistakes: stopping the test as soon as it looks favourable, not gathering enough data, changing several things at once so you cannot tell what worked, running it too briefly to capture normal variation, or focusing on a difference too small to matter in practice.

## Sources

- [Office for National Statistics (ONS)](https://www.ons.gov.uk/)
- [Nielsen Norman Group](https://www.nngroup.com/)

---
Daily Junction — https://dailyjunction.org/marketing/ab-testing-explained