hey squad
shall we settle in?

01 · CVR · 02/03

A/B testing.

No test without a hypothesis. No bullshit. One variant at a time, until there is proof.

You compare two versions of your page (original vs new) over the same time window, with the same traffic and the same funnel. You wait for the result to hold before making the call. No premature stops, no fads. You steer your roadmap on proof, not on taste.

By Rémy · Acquisition & field

Get my A/B tests quoted · See the Amecam mission

02 · THE BRIEFING

  1. When to consider it
    You have a UX audit that delivers 5 to 10 hypotheses, but you do not know which to start with. Your team argues over a button colour. Your previous redesigns did not move the conversion rate (visitors who take action). The moment instinct is no longer enough.
  2. Why it matters
    Without a test, each redesign is a bet. You ship a new landing page, you push to prod, you wait. Three months later, you still do not know if it is better or worse. The hidden cost: good ideas killed for lack of proof, and bad ones staying live by inertia.
  3. What you get back
    A quantified verdict per test, validated over a duration and a volume that make the result statistically reliable. A log of past tests, won or lost, with their real measured lift. An internal testing culture instead of the argument from authority.
  4. How we run it
    We start by reviewing what the audit surfaced, before we draft a single variant. Tools: GA4 Audiences (Google Analytics segmentation) for traffic split and conversion events, Hotjar (heatmap + in-page surveys) to watch behaviour on each variant, Microsoft Clarity (free heatmap + session replay) as backup for suspicious sessions. One hypothesis at a time, never two competing variants on the same funnel.
  5. What it unlocks
    A test pipeline that runs without debate, month after month. A product roadmap steered by data, not by opinion. An accumulation of knowledge about what really works with your visitors.

We get back to you within the week · scoping before any quote.

04 · WHAT WE WON'T WRITE IN AN RFP

An A/B test is a business decision tool. Not a design deliverable nor a benchmark.

An A/B test is only useful if you are willing to kill your favourite ideas when the data rules against them. Otherwise, you run measurement theatre. The non-negotiable rules: a 14-day minimum duration, a sample size calculated before launch, one hypothesis per test. If the ICP (ideal customer profile) is not validated upstream, the right order is to audit the funnel first, then test.

  • 01

    Data-sourced decision

    Each call rests on a quantified verdict, not on the opinion of the last person who spoke. You validate the real lift, not the placebo effect of the first two weeks live.

  • 02

    Knowledge capitalisation

    Lost tests count as much as winners. You build a log of what works and what does not in your sector, for your ICP, on your key pages. A lasting asset, not a throwaway deliverable.

  • 03

    Exit from opinion battles

    No more meetings on button colour or CTA wording. The data calls, the team moves forward. You free brain time for the calls that are actually worth a debate.

  • 04

    Accelerated product roadmap

    You know what to push live, what to kill, what to send back for more thought. No more six-month redesign cycles for an uncertain result. You move forward through documented micro-wins.

05 · THE PLAY-BY-PLAY

Four steps per test. Four weeks on average. Zero production freeze.

  1. 01

    We pick the hypothesis that pays.

    We start from the audit backlog (5 to 10 hypotheses prioritised on impact / effort). We pick one, the one where the business lift × traffic volume potential is highest. We formulate the hypothesis in `if X then Y measured by Z` format. No opportunistic tests outside the backlog.

  2. 02

    We size before launching.

    We calculate the sample size needed to detect the target lift at 95% confidence. We set the minimum duration (14 days, to cover two full weekly cycles). If your traffic does not let us reach the size in 4 weeks, we revisit the hypothesis or we wait.

  3. 03

    We launch and stop touching.

    Variant B in parallel with version A, 50/50 split via GA4 Audiences (Google Analytics segmentation). No production freeze. Hotjar watches both variants in parallel, Microsoft Clarity as backup for suspicious sessions. No modification during the test window. No premature stop, even if A seems to win at 7 days.

  4. 04

    We call it, we document, we archive.

    At the end of the window, we read the quantified verdict (real lift, confidence interval, device and source segmentation). Call: push B live, kill B, or relaunch a refined variant. Versioned Notion documentation, log of won and lost tests up to date, brief for the next iteration.
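Step 02 above ("We size before launching") can be sketched in a few lines. This is a minimal illustration, not our production tooling: it assumes a two-sided two-proportion z-test, and the 80% statistical power is our assumption here, since the page only states the 95% confidence. The function name and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift,
                            confidence=0.95, power=0.80):
    """Visitors needed in EACH variant to detect the target lift.

    baseline_rate: current conversion rate of variant A (0.03 means 3%).
    relative_lift: smallest lift worth detecting (0.14 means +14%).
    power=0.80 is an assumed convention, not a figure from the page.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)

    # Normal quantiles for the two-sided confidence level and for the power.
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Sum of the binomial variances of the two conversion rates.
    variance = p1 * (1 - p1) + p2 * (1 - p2)

    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Illustrative inputs only: a 3% baseline and a +14% target lift.
needed = sample_size_per_variant(baseline_rate=0.03, relative_lift=0.14)
print(f"{needed} visitors per variant")
```

The smaller the lift you want to detect, the faster the required sample grows, which is why the sizing happens before launch and not after.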

06 · THE FLOW AT A GLANCE

The path each visitor takes between the traffic split and the test verdict.

[Diagram: TRAFFIC · 50/50 SPLIT (GA4 Audiences · randomised) → VARIANT A · CURRENT (CTA) and VARIANT B · HYPOTHESIS (CTA) → SIGNIFICANCE (14d min · n sized · 95% · no early stop) → VERDICT · MEASURED LIFT (B wins · +14% CVR · 95% CI · segmented device + source). Tools: GA4 AUDIENCES · HOTJAR · PLETOR]

One hypothesis at a time. The data calls at the end of the window, not before. We document the verdict, won or lost, in the test log.
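To make the end of that flow concrete, here is a hedged sketch of how a verdict could be read once the window closes. It assumes a normal-approximation (Wald) confidence interval on the difference between the two conversion rates. The function name, the returned keys and the sample figures are hypothetical, and device and source segmentation is left out.

```python
import math
from statistics import NormalDist


def read_verdict(visitors_a, conversions_a, visitors_b, conversions_b,
                 days_run, min_days=14, confidence=0.95):
    """Return the call and the measured lift of B over A."""
    if days_run < min_days:
        # No premature stop: the window is not over, so no verdict yet.
        return {"call": "keep running"}

    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    diff = rate_b - rate_a

    # Standard error of the difference between two independent proportions.
    std_err = math.sqrt(rate_a * (1 - rate_a) / visitors_a
                        + rate_b * (1 - rate_b) / visitors_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci_low, ci_high = diff - z * std_err, diff + z * std_err

    if ci_low > 0:
        call = "push B live"
    elif ci_high < 0:
        call = "kill B"
    else:
        # The interval contains zero: the lift is not proven either way.
        call = "no proven lift"

    return {
        "call": call,
        "relative_lift": diff / rate_a if rate_a > 0 else None,
        "ci_abs": (ci_low, ci_high),
    }


# Made-up counts for illustration, read after a 28-day window.
print(read_verdict(20000, 600, 20000, 720, days_run=28))
```

A "no proven lift" result is where the choice between killing B and relaunching a refined variant is made by people, not by the script.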

07 · NOT YET FOR YOU IF

Three cases where A/B testing is not the top priority.

  • You record fewer than 5,000 conversions a month.

    At this volume, reaching a statistically honest sample size takes 8 weeks minimum per test. Across 10 hypotheses, two years to validate half. Better to prioritise traffic and the audit first.

  • You are not ready to kill your favourite variants.

    If the final call goes back to a leader who will not read the verdict, A/B testing becomes measurement theatre. The cost of a test cycle is not paid back by an already-made decision. Better to clarify governance first.

  • Your conversion tracking is broken or fuzzy.

    Without reliable GA4 on conversion events, test verdicts rest on truncated data. You validate a false winner, you push the wrong variant live. The right order is tracking first, test second.
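For the first case above, the question is simply how many weeks your traffic needs to fill both variants. A rough sketch, assuming a 50/50 split, steady weekly traffic on the tested page, and rounding up to whole weeks so that every weekday is covered equally. The function names are hypothetical, and the figures are illustrative rather than a reproduction of the 8-week estimate above.

```python
import math


def weeks_to_verdict(required_per_variant, weekly_visitors, min_days=14):
    """Whole weeks needed to fill both variants at a 50/50 split."""
    if weekly_visitors <= 0:
        raise ValueError("weekly_visitors must be positive")

    # Both variants need the full sample, so the page must see 2 x n visitors.
    weeks_for_sample = math.ceil(2 * required_per_variant / weekly_visitors)

    # Never shorter than the minimum duration, expressed in whole weeks.
    min_weeks = math.ceil(min_days / 7)
    return max(weeks_for_sample, min_weeks)


def fits_window(required_per_variant, weekly_visitors, max_weeks=4):
    """True if the test can reach its sample size within the window."""
    return weeks_to_verdict(required_per_variant, weekly_visitors) <= max_weeks


# Illustrative numbers: 27,645 visitors per variant, 10,000 visitors a week.
print(weeks_to_verdict(27645, 10000))
print(fits_window(27645, 10000))
```

When `fits_window` comes back false, the options are the ones already stated in the play-by-play: revisit the hypothesis or wait for more traffic.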

08 · THE QUESTIONS WE ACTUALLY HEAR

Questions whispered after the second meeting. Honest answers.

No credible market number on direct inaction cost, but the empirical rule holds: across 10 redesigns done without testing, 4 lower the conversion rate (visitors who take action) without anyone seeing it (natural noise masks the drop). The real cost is not the test you did not run, it is the wrong variant staying live by inertia. The broader agency vs freelance vs in-house debate sits here.

For SME volumes (under 1 M monthly visits), GA4 Audiences (Google Analytics segmentation) is largely enough to split traffic 50/50 and measure lift on conversion events. VWO and Optimizely charge €200 to €1,500 a month for features we do not use in B2B. Above one million monthly visits or for multivariate tests (3+ concurrent variants), the dedicated tool becomes relevant. For the verdict to hold, the GA4 layer must be set clean upstream.
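For readers who want to see what a stable 50/50 split looks like in principle, here is a generic sketch. It is not how GA4 assigns visitors and not a description of our setup: it only illustrates deterministic bucketing, where hashing a visitor identifier together with the test name means the same visitor always lands in the same variant. The function name and both identifiers are hypothetical.

```python
import hashlib


def assign_variant(visitor_id: str, test_name: str) -> str:
    """Deterministically place a visitor in variant A or B (50/50)."""
    # Including the test name keeps buckets independent from one test to the next.
    key = f"{test_name}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()

    # Map the first 32 bits of the hash to a bucket between 0 and 99.
    bucket = int(digest[:8], 16) % 100
    return "A" if bucket < 50 else "B"


# The same visitor gets the same variant on every visit.
print(assign_variant("visitor-42", "cta-wording"))
print(assign_variant("visitor-42", "cta-wording"))
```

Because the assignment depends only on the two identifiers, nothing needs to be stored to keep a returning visitor in the same variant for the whole test window.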

Yes, provided the split does not change the processing purpose (same funnel, same data collection, same cookie policy). Pure A/B testing is not a GDPR topic in itself. What can become one: a variant that collects different data (extended form, custom tracking pixel). In that case, declare the purpose in the cookie policy and submit to consent.

Range: 6 to 12 tests a year for a B2B SME client, 12 to 24 for a high-volume e-commerce site. Pace depends on available traffic and on dev / design capacity to execute the variants afterwards. On the Amecam mission, we run around 8 tests a year, win ratio ~30% (which is in the 25-35% market range).

Tool side: GA4 Audiences free (included in GA4), Hotjar optional (€39/month Business plan). Steering side: 1 day a month of hypothesis scoping + verdict reading, runnable by a senior PM or an internal growth lead. If we leave, you walk out with the Notion log of past tests + hypothesis scoping procedure + account access. No lock-in. Winning variants then feed your editorial landing pages as the new standard.

Yes. We do this on 50% of our A/B cycles. The rule: agree upfront on who steers hypothesis formulation and who ships the variants live. If your existing agency keeps control of front-end changes, we frame the hypothesis, the sizing and the verdict reading. If they execute our brief, we accompany the deployment. We never step on the other agency's toes without saying so.

Field note

You test to call shots, not to reassure. Before launching, we ask why not. 14 days minimum, sample size calculated, one hypothesis at a time. If the ad does not pay its CAC (customer acquisition cost), we cut. If the variant does not lift, we kill, even if the CEO likes it. In real life, on the ground, this discipline is what wins.

Rémy · Acquisition & field · HeySquad


We open your hypothesis backlog. We tell you which ones pay, which ones fall.

Get my A/B tests quoted · Back to the Conversion service