Split your players across two or more variants of a behavior, ship, and let Playloop tell you which one played better. You define the variants and how to split traffic; Playloop assigns each player, measures retention, engagement, crashes, and friction per variant, and writes a plain-English summary of what actually changed. No bucketing code in your game.
An experiment is a measured rollout of two or more variants of some behavior: a new tutorial, a different HUD layout, a tweaked difficulty curve. You create it on the Experiments tab of any game, give each variant a key, and decide how to split traffic. Once it's running, your game asks Playloop which variant a player should see and renders accordingly.
Assignment happens on Playloop's servers, not in your game, so every SDK (Unity, Unreal, Godot, Python, TypeScript) agrees on the same answer for the same player. The SDK fetches the player's variant map once per session, caches it, and tags every event it sends with the variants in effect. That tagging is what lets the dashboard split all of your existing metrics (retention, engagement, crashes, friction clusters, feedback quotes) by variant.
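To make that flow concrete, here is a minimal Python sketch of what the SDK does on your behalf: fetch the variant map once per session, cache it, and stamp each outgoing event with it. Everything in it is hypothetical (`SessionClient`, the `fetch_variant_map` callable, the event dictionary shape). It is not the Playloop SDK API, which is documented on each engine's install page, and you don't need to write any of this yourself.

```python
from typing import Callable, Dict, List, Optional


class SessionClient:
    """Hypothetical stand-in for an SDK session; not the real Playloop API."""

    def __init__(self, device_id: str,
                 fetch_variant_map: Callable[[str], Dict[str, str]]) -> None:
        self.device_id = device_id
        self._fetch = fetch_variant_map            # stands in for the HTTP call
        self._variants: Optional[Dict[str, str]] = None
        self.sent: List[dict] = []                 # events that would be uploaded

    def variants(self) -> Dict[str, str]:
        # Fetch once per session, then serve every later lookup from the cache.
        if self._variants is None:
            self._variants = dict(self._fetch(self.device_id))
        return self._variants

    def variant_for(self, experiment: str) -> Optional[str]:
        # None means "not enrolled": the game keeps its default behavior.
        return self.variants().get(experiment)

    def track(self, name: str, **props: object) -> None:
        # Each event carries the variants in effect, which is what lets the
        # dashboard split metrics per variant.
        self.sent.append({"event": name, "props": props,
                          "variants": dict(self.variants())})


# Hypothetical usage: a fake fetcher stands in for the network call.
def fake_fetch(device_id: str) -> Dict[str, str]:
    return {"new-tutorial": "tutorial-v2"}


session = SessionClient("device-abc", fake_fetch)
if session.variant_for("new-tutorial") == "tutorial-v2":
    pass  # render the new tutorial here
session.track("tutorial_started", step=1)
```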
The structured comparison is free on every tier. The optional AI digest, a written read on what changed between variants, uses your managed-AI session quota (or your own AI key). See Cost & quota.
From a game's Experiments tab, click New experiment, name it, and define its pieces:
Between two and eight variants. Each has a key (letters, digits, _ and -, up to 40 characters, e.g. control or tutorial-v2), a display name, and an allocation. The key is what your game passes to read the assignment, and what every event gets tagged with, so pick something stable and readable. The first variant you declare is the baseline; every other variant is compared against it.
An allocation is the relative weight of a variant: how much of your traffic lands on it. Allocations don't have to add up to 100; weights of [50, 30] split traffic 62.5% / 37.5%. If your weights don't sum to 100, Playloop shows a confirmation before you start (“your allocations sum to N; we'll normalize to 100”), so the split is never rewritten silently.
By default an experiment applies to everyone in the game. Attach an audience to scope it to a subset, for example “players on build 1.4.0 and up.” A player who doesn't match the audience is skipped for that experiment entirely (your game falls back to its default behavior). Audiences are reusable: build one once and point several experiments at it.
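The normalization of allocation weights described above is plain proportional arithmetic: each weight is divided by the sum of all weights. Playloop does this for you; the sketch below only shows the math, and `normalize_allocations` is a hypothetical name.

```python
from typing import Dict


def normalize_allocations(weights: Dict[str, float]) -> Dict[str, float]:
    """Turn relative weights into percentages of traffic that sum to 100."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("allocations must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one allocation must be positive")
    return {key: 100 * w / total for key, w in weights.items()}


split = normalize_allocations({"control": 50, "tutorial-v2": 30})
print(split)  # {'control': 62.5, 'tutorial-v2': 37.5}
```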
Assignment is keyed on the player's device id, the same stable, anonymous identifier the SDK already uses to group a returning tester's sessions. The same device always lands on the same variant for the life of an experiment, so a player's experience stays consistent across sessions. Different experiments are independent: being in the treatment arm of one says nothing about which arm you land on in another.
For QA, you can pin a specific device to a specific variant from the dashboard, which bypasses the split entirely. That's how you force yourself onto the variant you want to test.
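Playloop doesn't document how it buckets devices, and you never implement bucketing yourself. To build intuition for the properties described in the preceding paragraphs (stable per device, independent across experiments, pins bypassing the split), here is a sketch using salted SHA-256 hashing, a common technique that is not necessarily what Playloop uses. The function name, parameters, and the choice to check pins before the audience are all assumptions.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple


def assign_variant(
    experiment_id: str,
    device_id: str,
    variants: List[Tuple[str, float]],            # (key, allocation), declared order
    pins: Optional[Dict[str, str]] = None,        # device_id -> pinned variant key
    in_audience: Callable[[str], bool] = lambda device_id: True,
) -> Optional[str]:
    # Assumption: a QA pin wins over everything, including the audience.
    if pins and device_id in pins:
        return pins[device_id]
    # Outside the audience: not in this experiment at all (default behavior).
    if not in_audience(device_id):
        return None
    # Hashing experiment id + device id gives the same bucket for a device every
    # time within one experiment, and unrelated buckets across experiments.
    digest = hashlib.sha256(f"{experiment_id}:{device_id}".encode()).digest()
    point = (int.from_bytes(digest[:8], "big") >> 11) / 2**53   # uniform in [0, 1)
    total = sum(weight for _, weight in variants)
    cumulative = 0.0
    for key, weight in variants:
        cumulative += weight / total
        if point < cumulative:
            return key
    return variants[-1][0]  # guard against floating-point round-off


variants = [("control", 50), ("tutorial-v2", 30)]
first = assign_variant("exp-1", "device-abc", variants)
```

Calling it again with the same experiment and device returns the same key, which is the "same device, same variant" guarantee; changing only `experiment_id` reshuffles devices independently.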
Because assignment is per-device, two cases are worth knowing about when you read the numbers:
QA pins are also per-device: a tester who wants the same variant on several machines sets up one pin per device.
An experiment moves through three statuses:
The moment an experiment leaves draft, its variants, allocations, and audience are locked. This keeps the comparison statistically honest: you can't re-weight the split or change who's eligible halfway through and still trust the side-by-side numbers. Names and descriptions stay editable throughout. To run a fresh version with different config, clone the experiment and start the new one.
The detail page shows one card per variant, side by side. Each card carries:
retained / eligible denominator (see cohort-eligible below).
The baseline variant is badged as such and shows no confidence labels (there's nothing to compare it against). With three or more variants, a headline at the top names the strongest non-baseline variant by retention and its top-line label.
Next to each retention window and the crash rate on a non-baseline variant, Playloop shows a one-word label that says how much to trust the difference. The labels come from a Bayesian comparison of the two variants; you don't configure anything:
| Label | What it means |
|---|---|
| insufficient | Not enough data yet. A window stays here until at least 15 players are eligible on both the variant and the baseline. |
| inconclusive | There is data, but the two variants look too close to call. The most likely direction is below roughly 65% probability. |
| promising | One variant is leading at roughly 65 to 85% probability. Worth watching, not yet worth shipping on. |
| likely | A stronger lead, roughly 85 to 95% probability. The direction is fairly trustworthy. |
| clear | At least 95% probability AND the credible range of the difference excludes zero. This is the only label that calls a real, separated result. |
The label is direction-agnostic: it describes how strong the difference is, not which way it points. Read the direction from the rate itself: for retention, higher is better; for crash rate, lower is better.
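Playloop doesn't publish the exact model behind these labels. As an illustration only, here is one common Bayesian setup (Beta-Binomial with uniform Beta(1, 1) priors, estimated by Monte Carlo) mapped onto the approximate cutoffs from the table. The function names, the prior, the number of draws, and the handling of a ≥95% result whose range still touches zero are assumptions; the table itself says the cutoffs are "roughly" these values.

```python
import random
from typing import List, Tuple


def compare_rates(
    successes_base: int, n_base: int,
    successes_variant: int, n_variant: int,
    draws: int = 20_000, seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Beta-Binomial comparison with uniform Beta(1, 1) priors (an assumption).

    Returns P(variant rate > baseline rate) and an equal-tailed 95% credible
    interval for (variant rate - baseline rate), both estimated by Monte Carlo.
    """
    rng = random.Random(seed)
    diffs: List[float] = []
    for _ in range(draws):
        base = rng.betavariate(1 + successes_base, 1 + n_base - successes_base)
        variant = rng.betavariate(1 + successes_variant, 1 + n_variant - successes_variant)
        diffs.append(variant - base)
    diffs.sort()
    p_higher = sum(d > 0 for d in diffs) / draws
    interval = (diffs[int(0.025 * draws)], diffs[int(0.975 * draws) - 1])
    return p_higher, interval


def confidence_label(
    p_higher: float, interval: Tuple[float, float],
    eligible_base: int, eligible_variant: int, min_eligible: int = 15,
) -> str:
    """Map a comparison onto the labels in the table (cutoffs are approximate)."""
    if min(eligible_base, eligible_variant) < min_eligible:
        return "insufficient"
    # Direction-agnostic: score whichever direction is more likely.
    p = max(p_higher, 1.0 - p_higher)
    low, high = interval
    if p >= 0.95 and (low > 0 or high < 0):
        return "clear"
    if p >= 0.85:
        return "likely"  # assumption: >= 95% with a range touching zero stays here
    if p >= 0.65:
        return "promising"
    return "inconclusive"


# Made-up day-7 counts: baseline 40 of 100 eligible retained, variant 70 of 100.
p, ci = compare_rates(40, 100, 70, 100)
print(confidence_label(p, ci, 100, 100))  # clear
```

For crash rate the same functions apply with crashed players as the "successes"; because the label is direction-agnostic, you would read "better" from the lower rate.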
Retention is measured only over players who've had a fair chance to come back. A player is cohort-eligible for the day-N window only if their first session was at least N days ago. You can't know whether someone who arrived yesterday will return on day 7, so they're left out of the day-7 denominator until enough time passes.
That's why each retention figure shows retained / eligible: the eligible count is usually smaller than the variant's total players, and it grows as the experiment runs. Early on, a window can read insufficient simply because not enough players are eligible yet, even if the point estimate looks decisive. Give it time.
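Here is a minimal sketch of how a retained / eligible pair for one day-N window could be computed from per-device session dates, following the eligibility rule above. The page doesn't define "retained" precisely, so this sketch assumes it means "had a session exactly N days after the first one"; the function name and input shape are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple


def day_n_retention(
    sessions: Dict[str, List[date]],   # device_id -> dates with at least one session
    n: int,
    today: date,
) -> Tuple[int, int]:
    """Return (retained, eligible) for the day-N window."""
    retained = eligible = 0
    for days in sessions.values():
        first = min(days)
        # Cohort-eligible only if the first session was at least N days ago.
        if (today - first).days < n:
            continue
        eligible += 1
        # Assumption: "retained" means a session exactly N days after the first.
        if any((d - first).days == n for d in days):
            retained += 1
    return retained, eligible


today = date(2024, 6, 10)
sessions = {
    "a": [date(2024, 6, 1), date(2024, 6, 8)],  # eligible, came back on day 7
    "b": [date(2024, 6, 2)],                    # eligible, never came back
    "c": [date(2024, 6, 9)],                    # arrived yesterday: not eligible yet
}
print(day_n_retention(sessions, 7, today))  # (1, 2)
```

Player "c" is counted in the variant's total but not in the day-7 denominator, which is why the eligible count is smaller than the player count and grows as the experiment runs.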
Below the comparison cards, the AI digest turns the numbers into a short narrative: a recommendation, a per-variant headline with a sentiment read, the themes the variants share, and how sentiment shifted between them. It's the “so what?” layer on top of the raw stats: what the change did to how players experienced your game, not just which number moved.
Generate or refresh it with the Regenerate button. Until you do, the structured comparison cards above are fully usable on their own. The digest is an addition, not a gate.
The digest will never claim more certainty than the data supports. Its language is tied to the confidence label: when a result is only promising or inconclusive, the digest won't call a “winner” or say one variant is “clearly” better. If a draft ever overreaches, Playloop rewrites it to match the evidence, erring toward underclaiming rather than overclaiming.
It also won't invent quotes or themes. Every cluster the digest cites and every verbatim quote it includes is checked against your real session data first. If that check can't be satisfied, Playloop shows the structured stats only rather than surface unverifiable prose.
Building the experiment, splitting players, and reading the structured comparison are all free, on every tier, with no cap. The only part that costs anything is the AI digest.
Each time you generate or regenerate a digest, it counts as one session-equivalent against your monthly managed-AI quota, the same currency a session analysis uses. One number to think about, not two.
The structured comparison never counts against any quota. Only the written digest does.
By default the digest only refreshes when you click Regenerate, so you're never surprised by managed-AI usage you didn't ask for. If you'd rather keep it current automatically, you can opt a game in to a fixed refresh cadence:
The cadence is a fixed list on purpose: it keeps the cost story predictable. Each automatic refresh counts the same one session-equivalent as a manual one and obeys the same once-per-hour limit, so picking a cadence is the same as agreeing to that many session-equivalents per running experiment over time.
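To budget for a cadence, multiply the refreshes per running experiment by the number of running experiments, then add any manual regenerations. This is a rough upper-bound sketch assuming a 30-day month and one session-equivalent per refresh, as stated above; the 24-hour and 168-hour intervals in the example are hypothetical, not necessarily entries on the dashboard's cadence list, and `monthly_digest_cost` is an illustrative name.

```python
import math


def monthly_digest_cost(running_experiments: int, refresh_every_hours: float,
                        manual_regenerations: int = 0, days: int = 30) -> int:
    """Rough upper bound on digest session-equivalents for one game per month."""
    if refresh_every_hours < 1:
        raise ValueError("digests refresh at most once per hour")
    refreshes_each = math.floor(days * 24 / refresh_every_hours)
    return running_experiments * refreshes_each + manual_regenerations


# Three running experiments on a hypothetical daily cadence, plus five manual
# regenerations: 3 * 30 + 5.
print(monthly_digest_cost(3, 24, manual_regenerations=5))  # 95
```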
Each SDK exposes a one-call lookup for the player's assigned variant. See your engine's install page under SDKs.
The HTTP contract the SDKs use to fetch a player's variant map lives in the API reference.
These docs are evolving. Playloop is in active development ahead of launch, so APIs and details may change as we polish.