Playloop turns raw playtest events into a few paragraphs you can actually read. This page walks the whole path: how a session becomes a rollup, how rollups become an AI-readable digest, and every knob in between that shapes what the summary says about your game.
A session lands in Playloop and rides one pipeline all the way to your inbox. There are five visible stages:
POST /api/telemetry. Each batch is appended to the session’s transcript and de-duplicated by client-side event id.
The two things to internalize: rollups are always written (drop patterns, splits, milestone rates), and AI is layered on top (auto or manual, depending on the master switch). Turning AI off does not turn the dashboard off; it just stops the prose summary from regenerating.
Once you have telemetry flowing, four configuration surfaces decide what the summary says:
Each of these has a section below. The short version: the context tells the AI what kind of game it’s reading; the KPIs say what matters; the categories filter the funnel inference; and the tuning shapes how long, how detailed, and in what voice the report is.
Every setting on the game settings page (/games/[slug]/settings) is documented below. Each entry covers WHAT the setting does, WHY a studio would change it, and HOW it affects the AI output.
The three free-form game-shape inputs. Saved as columns on the games row.
All three are stitched into every AI prompt as a single ## Game context block at the top. The build prompt explicitly tells the model this block is “your source of truth for genre, mechanics, and what counts as normal progression for THIS game”, so a good prompt directly shapes how the AI reads your numbers.
Three independent event-name lists that drive severity weighting on the AI's punch list. Saved on the game.
{
"primary": ["wishlist_clicked", "act2_reached", "demo_completed"],
"secondary": ["pact_attempted"],
"ignore": ["session_heartbeat", "reset_progress", "settings_opened"]
}
How each bucket changes the AI output:
Overlap resolution: ignore wins (the most opinionated bucket), then primary, then secondary. So if you put wishlist_clicked on both primary and ignore, ignore wins.
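The overlap rule above can be sketched as a precedence lookup. This is a minimal illustration, not Playloop's implementation: `resolve_bucket` and `BUCKET_PRECEDENCE` are hypothetical names, and the `kpis` shape mirrors the JSON example above.

```python
from typing import Optional

# Precedence from the overlap rule: ignore beats primary, primary beats secondary.
BUCKET_PRECEDENCE = ("ignore", "primary", "secondary")


def resolve_bucket(event_name: str, kpis: dict) -> Optional[str]:
    """Return the winning bucket for an event, or None if it is in no list."""
    for bucket in BUCKET_PRECEDENCE:
        if event_name in kpis.get(bucket, []):
            return bucket
    return None


kpis = {
    "primary": ["wishlist_clicked", "act2_reached"],
    "secondary": ["pact_attempted"],
    "ignore": ["wishlist_clicked", "session_heartbeat"],
}
print(resolve_bucket("wishlist_clicked", kpis))  # ignore
```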
One important exception baked into the prompt: a segment split with ≥30pp spread surfaces even when its milestone is on the ignore list. A 60pp Japanese-vs-English divergence on settings_opened is a localization signal, not opt-action noise. Smaller spreads (10–30pp) still respect ignore.
Caps: 20 entries per list, 100 chars per entry. Strings are free-form: Playloop does not enforce that the event exists yet (you can declare priorities before you ship the events that fire them).
Per-event semantic tags saved against each event name. Edited from /games/[slug]/settings/events. Auto-classified from the event name on first sight; you override when the heuristic is wrong.
The auto-classifier checks suffixes and prefixes in order of specificity:
- *_skipped / _canceled / _dismissed / _declined / _aborted → inverted
- is_* predicate flags → system
- session_* / client_* / entity_* / player_* prefixes → system
- theme_* / language_* / volume_* / mute_* → cosmetic
- *_clicked / _opened / _pressed / _toggled / reset_* / settings_* / wishlist_* / cancel_* → opt_action

When the dev hasn't categorized an event, the digest renderer falls back to this heuristic, so default behavior still surfaces pattern-matched opt-actions and system events in the AI's severity weighting.
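A rough sketch of the suffix and prefix heuristic, checked in the order listed above. The function name `classify_event` is hypothetical, and returning `"unknown"` when nothing matches is an assumption made for this example.

```python
# Patterns copied from the list above; checked top to bottom.
INVERTED_SUFFIXES = ("_skipped", "_canceled", "_dismissed", "_declined", "_aborted")
SYSTEM_PREFIXES = ("is_", "session_", "client_", "entity_", "player_")
COSMETIC_PREFIXES = ("theme_", "language_", "volume_", "mute_")
OPT_ACTION_SUFFIXES = ("_clicked", "_opened", "_pressed", "_toggled")
OPT_ACTION_PREFIXES = ("reset_", "settings_", "wishlist_", "cancel_")


def classify_event(name: str) -> str:
    """Guess a category from the event name alone."""
    if name.endswith(INVERTED_SUFFIXES):
        return "inverted"
    if name.startswith(SYSTEM_PREFIXES):
        return "system"
    if name.startswith(COSMETIC_PREFIXES):
        return "cosmetic"
    if name.endswith(OPT_ACTION_SUFFIXES) or name.startswith(OPT_ACTION_PREFIXES):
        return "opt_action"
    # Assumption: names that match nothing are left as "unknown".
    return "unknown"


for event in ("cinematic_skipped", "session_heartbeat", "theme_changed", "reset_progress"):
    print(event, "->", classify_event(event))
```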
The composable preset stack that decides how dense, how strict, how prescriptive, and in what voice the AI report is. Saved per game, so future knobs can land without disturbing what you've already set.
The tone knob (described in “Individual knobs” below) ships set to narrative: summaries explain themselves like a friend talking, not a stats report. Flip it to analyst if you’d rather read raw percentages and percentile values. Underlying analysis is identical either way; only the prose changes.
Pick one of three profiles. Each sets baselines for maxFindings, verbosity, strictness, and recommendations in one click.
| Profile | maxFindings | verbosity | strictness | recommendations |
|---|---|---|---|---|
| executive | 3 | executive | conservative | off |
| balanced | 5 | standard | standard | light |
| investigative | 10 | detailed | exploratory | full |
The profile is the biggest hammer in the box. Most studios pick a profile and never touch the individual knobs.
A preset that shifts four numeric thresholds for how much evidence is required before a signal surfaces. Set independently of the profile, OR derived from the profile.
| Strictness | segmentMinSessions | segmentSpreadFloorPp | cliffDropFloorPp | cliffConditionalFloor |
|---|---|---|---|---|
| conservative | 50 | 15 | 15 | 0.90 |
| standard | 30 | 10 | 10 | 0.85 |
| exploratory | 15 | 5 | 5 | 0.75 |
Conservative: only the highest-signal findings survive. Exploratory: surface weak signals too. Standard is the same as the built-in defaults.
Every preset can be overridden by an explicit knob. Useful when 90% of the preset is right but one threshold needs tightening.
The composable stack collapses in a strict order wherever tuning is read:
1. Explicit knob (you set it directly)
   ↓ if unset
2. Strictness preset (you picked a strictness OR a profile that implies one)
   ↓ if unset
3. Analysis profile (you picked a profile)
   ↓ if unset
4. Built-in default
Profile picks a strictness baseline. Strictness sets the four numeric thresholds. Any explicit knob still wins. So you can pick investigative and then turn cliffDropFloorPp back up to 15 if you want exploratory findings everywhere except drop patterns.
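The resolution order can be sketched for the four numeric thresholds. The preset values come from the tables above; the `tuning` dictionary shape (`profile`, `strictness`, `knobs`) and the function name `resolve_threshold` are assumptions made for this example, not Playloop's stored format.

```python
STRICTNESS_PRESETS = {
    "conservative": {"segmentMinSessions": 50, "segmentSpreadFloorPp": 15,
                     "cliffDropFloorPp": 15, "cliffConditionalFloor": 0.90},
    "standard": {"segmentMinSessions": 30, "segmentSpreadFloorPp": 10,
                 "cliffDropFloorPp": 10, "cliffConditionalFloor": 0.85},
    "exploratory": {"segmentMinSessions": 15, "segmentSpreadFloorPp": 5,
                    "cliffDropFloorPp": 5, "cliffConditionalFloor": 0.75},
}
# Each profile implies a strictness baseline (see the profile table).
PROFILE_STRICTNESS = {"executive": "conservative", "balanced": "standard",
                      "investigative": "exploratory"}
# Standard strictness matches the built-in defaults.
BUILT_IN_DEFAULTS = STRICTNESS_PRESETS["standard"]


def resolve_threshold(knob: str, tuning: dict):
    """Explicit knob, then strictness, then profile, then built-in default."""
    explicit = tuning.get("knobs", {}).get(knob)
    if explicit is not None:
        return explicit
    strictness = tuning.get("strictness") or PROFILE_STRICTNESS.get(tuning.get("profile"))
    if strictness in STRICTNESS_PRESETS:
        return STRICTNESS_PRESETS[strictness][knob]
    return BUILT_IN_DEFAULTS[knob]


# Investigative everywhere except drop patterns, as in the example above.
tuning = {"profile": "investigative", "knobs": {"cliffDropFloorPp": 15}}
print(resolve_threshold("cliffDropFloorPp", tuning))    # 15
print(resolve_threshold("segmentMinSessions", tuning))  # 15
```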
All numeric knobs are clamped server-side to their range: out-of-range values are clamped, not rejected.
The master switch for AI cost on a game. Edited from the settings page; autosaves on toggle.
The toggle is the cheapest safety net in the system. If you want to set up a game and ship telemetry without spending a cent on AI until you've decided it's working, leave it off and use the manual regenerate buttons.
Naively loading every session into memory to compute build / user / game stats stops working at indie scale (one popular build is ~50 sessions; loading them for every dashboard render is 50× the cost it needs to be). Playloop replaces that with five incremental rollup levels, refreshed in the background as new sessions land.
The day key is UTC. Sessions are attributed to the UTC day their tail event landed on, so a session that started 23:00 UTC and ran past midnight gets bucketed to the day it ended. The grain is fixed at one day; there is no per-hour or per-week variant. Charts that need a finer grain would have to re-scan sessions, which is what this layer exists to avoid.
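As an illustration of the bucketing rule, here is a small sketch that derives a day key from a session's tail event timestamp. The function name `rollup_day_key` and the ISO `YYYY-MM-DD` key format are assumptions; only the "UTC day of the tail event" rule comes from the text.

```python
from datetime import datetime, timedelta, timezone


def rollup_day_key(tail_event_at: datetime) -> str:
    """Return the UTC calendar day of the session's last event."""
    if tail_event_at.tzinfo is None:
        raise ValueError("tail_event_at must be timezone-aware")
    return tail_event_at.astimezone(timezone.utc).date().isoformat()


# A session that started 23:00 UTC on March 1 and ended 00:30 UTC on March 2
# is bucketed to March 2, the day its tail event landed on.
session_end = datetime(2024, 3, 2, 0, 30, tzinfo=timezone.utc)
print(rollup_day_key(session_end))  # 2024-03-02
```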
Playloop refreshes rollups in the background every few minutes. Each pass:
(game, environment) pairs that have at least one session newer than the last refresh. Cheap when nothing changed.
A build-summary regeneration triggers a fresh rollup pass for that game first, so a build that landed between background ticks still gets a current rollup. The pass no-ops when nothing has moved, so this is cheap.
When Playloop adds a new field to a rollup (a new segment dimension, a new flag aggregation), past summaries stay pinned to the data shape they were generated from, so there are no silent regressions where the same prompt suddenly produces a different answer.
The game digest carries a cross-build trends list, one entry per insight cluster, each tagged improving / worsening / flat. The classifier splits your game’s builds in half by first-seen time (the older half is the prior window, the newer half is the recent window) and sums each cluster’s negative-leaning fire count in each half.
The direction rule is a 2× ratio threshold on those two sums (plus floor-handling for zeros):
- improving: the cluster stopped firing (prior > 0, recent = 0), OR recent / prior < 0.5 (more than halved).
- worsening: the cluster is new (prior = 0, recent > 0), OR recent / prior > 2.0 (more than doubled).
- flat: the ratio stays inside the [0.5, 2.0] band.

A minimum-evidence gate filters noise: a cluster needs at least 3 total fires across the two halves to be considered at all. The final list is sorted worsening → improving → flat and capped at the top 10 entries by combined fire count, so the game-summary digest leads with regressions.
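The direction rule and the evidence gate can be written down as a small function. This is a sketch under the thresholds stated above; `classify_trend` is a hypothetical name, and returning `None` for clusters that miss the evidence gate is an assumption of this example.

```python
from typing import Optional


def classify_trend(prior: int, recent: int, min_fires: int = 3) -> Optional[str]:
    """Classify one cluster from its fire counts in the prior and recent halves."""
    if prior + recent < min_fires:
        return None  # not enough evidence to be considered at all
    if prior == 0:
        return "worsening"  # new in the recent half (recent > 0 is implied by the gate)
    if recent == 0:
        return "improving"  # gone from the recent half
    ratio = recent / prior
    if ratio < 0.5:
        return "improving"  # more than halved
    if ratio > 2.0:
        return "worsening"  # more than doubled
    return "flat"  # inside the [0.5, 2.0] band


print(classify_trend(prior=10, recent=4))  # improving
print(classify_trend(prior=2, recent=5))   # worsening
print(classify_trend(prior=4, recent=8))   # flat
```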
A drop pattern is a pair of session-end milestones where most players who reach the second one also reached the first, but a meaningful chunk of players who reached the first never made it to the second. It's the closest thing Playloop has to a causal "drop-off point" signal, computed code-side from milestone co-occurrence, not inferred by the AI.
If 97% of sessions reached entered_game and 64% reached act2_reached, that's a 33pp drop. If P(entered_game | act2_reached) = 99% (almost everyone who reached act2 also reached entered_game first), the drop pattern fires and surfaces in the build digest.
For a (predecessor, successor) pair to register as a drop pattern, all three must be true:
- Drop floor (cliffDropFloorPp): the drop (predFreq - succFreq) must exceed this floor. Default 10pp.
- Conditional floor (cliffConditionalFloor): P(predecessor | successor) must clear this floor. Default 0.85.
- Predecessor frequency cap: near-universal flags like is_demo (true for nearly every session) would otherwise be trivial predecessors of everything. Default 0.99.

On top of the numeric thresholds, the drop-pattern inference filters by event category. Only progression and unknown events are eligible as either predecessor or successor. That's why opt-actions (reset_progress), system events (session_heartbeat), cosmetic toggles (theme_changed), and inverted outcomes (cinematic_skipped) never become noise drop patterns.
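Putting the numeric thresholds and the category filter together, a pair check might look like the sketch below. The function and parameter names are hypothetical, frequencies are fractions of sessions, and the exact comparison operators (strict versus inclusive) are assumptions. The defaults follow the standard strictness values (10pp drop floor, 0.85 conditional floor) and the 0.99 predecessor cap.

```python
ELIGIBLE_CATEGORIES = {"progression", "unknown"}


def is_drop_pattern(
    pred_freq: float,
    succ_freq: float,
    p_pred_given_succ: float,
    pred_category: str,
    succ_category: str,
    drop_floor_pp: float = 10.0,
    conditional_floor: float = 0.85,
    pred_freq_cap: float = 0.99,
) -> bool:
    """Return True when a (predecessor, successor) pair clears every gate."""
    # Category filter: opt-actions, system, cosmetic, and inverted events are out.
    if pred_category not in ELIGIBLE_CATEGORIES or succ_category not in ELIGIBLE_CATEGORIES:
        return False
    drop_pp = (pred_freq - succ_freq) * 100
    return (
        drop_pp > drop_floor_pp                     # meaningful drop
        and p_pred_given_succ >= conditional_floor  # successor implies predecessor
        and pred_freq <= pred_freq_cap              # skip near-universal predecessors
    )


# The worked example: 97% reached entered_game, 64% reached act2_reached.
print(is_drop_pattern(0.97, 0.64, 0.99, "progression", "progression"))  # True
```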
The successor side also blocks known noise tails (matching the categorizer’s inverted tag heuristics), and the drop-pattern list is de-duplicated for transitive edges: if A → B and A → C both exist and B → C also exists, A → C is dropped because B is the tighter predecessor.
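The transitive de-duplication can be sketched as a pass over the edge list. This is one straightforward way to express the rule, not necessarily how Playloop computes it; `dedupe_transitive` is a hypothetical name.

```python
def dedupe_transitive(edges: list) -> list:
    """Drop A -> C whenever some B exists with both A -> B and B -> C."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    kept = []
    for a, c in edges:
        redundant = any(
            (a, b) in edge_set and (b, c) in edge_set
            for b in nodes
            if b != a and b != c
        )
        if not redundant:
            kept.append((a, c))
    return kept


edges = [("A", "B"), ("A", "C"), ("B", "C")]
print(dedupe_transitive(edges))  # [('A', 'B'), ('B', 'C')]
```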
The rendered drop patterns section is sorted largest-drop-first and capped at ~12 entries per build digest.
A segment split is a per-cohort divergence on a milestone rate. For example, the proportion of sessions that reached act2_reached differs by 30pp between English and Japanese players. The split surfaces as its own digest section so the AI can name audience-shape signals (localization gaps, regional pacing differences) as their own findings, not as causal claims.
Today, the only cohort axis is language (read from the session-summary language flag). Region, new-vs-returning, and other dimensions are on the roadmap.
A milestone surfaces in the segment-splits section only when both gates pass:
- Spread floor: the gap between the highest-rate and lowest-rate buckets must reach segmentSpreadFloorPp (default 10pp).
- Standard-error gate: SE = sqrt(p_hi(1-p_hi)/n_hi + p_lo(1-p_lo)/n_lo). The split has to clear ~2 SE to count, approximately 95% non-overlap on the proportions. This filters out 50/50 splits where one bucket has 4 testers.

The bucket gate is independent: a language only persists in the segment table at all when its session count clears segmentMinSessions (default 30). Below that, proportion estimates are noise.
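A minimal sketch of the gates, using the standard-error formula above, the segmentMinSessions bucket gate, and the segmentSpreadFloorPp spread floor from the strictness table. The function name `split_surfaces` and the `{language: (reached, sessions)}` input shape are assumptions for illustration.

```python
import math


def split_surfaces(
    buckets: dict,
    min_sessions: int = 30,
    spread_floor_pp: float = 10.0,
    se_multiple: float = 2.0,
) -> bool:
    """buckets maps language -> (sessions that reached the milestone, total sessions)."""
    # Bucket gate: languages below the session minimum never enter the table.
    rates = {
        lang: (reached / n, n)
        for lang, (reached, n) in buckets.items()
        if n >= min_sessions
    }
    if len(rates) < 2:
        return False
    p_hi, n_hi = max(rates.values())
    p_lo, n_lo = min(rates.values())
    spread = p_hi - p_lo
    se = math.sqrt(p_hi * (1 - p_hi) / n_hi + p_lo * (1 - p_lo) / n_lo)
    return spread * 100 >= spread_floor_pp and spread >= se_multiple * se


print(split_surfaces({"en": (80, 100), "ja": (50, 100)}))  # True
print(split_surfaces({"en": (50, 100), "ja": (2, 4)}))     # False
```

In the second call the 4-session bucket is removed by the bucket gate, so no split is left to compare.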
The build prompt has one explicit carve-out for segment splits: a split with ≥30pp spread surfaces even when the milestone is on the dev's ignore list or tagged as a non-progression category. The rationale: a 60pp Japanese-vs-English divergence on settings_opened is a localization or audience-shape signal, not opt-action noise. Smaller spreads (10–30pp) still respect ignore and category rules.
KPIs don't change the rollup data. They change how the AI weights findings derived from that data. Two concrete worked examples make the difference visible.
Suppose wishlist_clicked fires in 1.2% of demo sessions. The base finding ("Wishlist conversion is low at 1.2%") would normally land at LOW severity because conversion-rate findings on a single named action are LOW by the severity ladder.
Add wishlist_clicked to the primary KPI list. The finding gets UPGRADED to MEDIUM: the dev declared this is what they actually care about. The summary leads with it instead of burying it under a drop pattern three lines down.
Suppose your game has a noisy reset_progress event that fires whenever a player taps the reset button (some players reset every session as a habit). Without tuning, the AI might surface "30% of sessions reset_progress" as a finding.
Add reset_progress to the ignore list. The finding is dropped entirely, even at LOW severity. The dev opted out; they don't want to hear about it.
Exception: if a segment split surfaces a ≥30pp divergence on reset_progress (say, Japanese players reset 5× more often than English), the split still surfaces. The ignore list filters base-rate findings; it doesn't filter audience-shape signals.
Secondary doesn't change severity. It exists as an explicit "this matters but I don't need it tier-upped" hint. Useful when you want to remind future-you (or the AI) that a milestone is worth tracking, without forcing every related finding to the top of the punch list.
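In Playloop these rules are applied by the model following the prompt, not by code. Purely as a way to restate the three worked examples, here is a sketch in which the three-step LOW / MEDIUM / HIGH ladder and the function name `adjust_severity` are assumptions.

```python
from typing import Optional

# Assumption: a three-step ladder, using the tier names that appear in this page.
SEVERITY_LADDER = ["LOW", "MEDIUM", "HIGH"]


def adjust_severity(
    subject: str,
    base: str,
    kpis: dict,
    split_spread_pp: Optional[float] = None,
) -> Optional[str]:
    """Return the adjusted severity, or None when the finding is dropped."""
    if subject in kpis.get("ignore", []):
        # Carve-out: a segment split with a spread of 30pp or more still surfaces.
        if split_spread_pp is not None and split_spread_pp >= 30:
            return base
        return None
    if subject in kpis.get("primary", []):
        index = SEVERITY_LADDER.index(base)
        return SEVERITY_LADDER[min(index + 1, len(SEVERITY_LADDER) - 1)]
    return base  # secondary and undeclared events keep their base severity


kpis = {
    "primary": ["wishlist_clicked"],
    "secondary": ["pact_attempted"],
    "ignore": ["reset_progress"],
}
print(adjust_severity("wishlist_clicked", "LOW", kpis))  # MEDIUM
print(adjust_severity("reset_progress", "LOW", kpis))    # None
```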
When the game's KPIs object has any entries, the prompt builder emits a Priorities block inside the ## Game context header:
## Game context
Genre: Indie · Idle, Clicker
Game context (set by the developer): <your free-form ai_context>
Priorities (set by the developer):
Primary: wishlist_clicked, act2_reached, demo_completed
Secondary: pact_attempted
Ignore: session_heartbeat, reset_progress, settings_opened
The build prompt has explicit rules downstream of this block:
wishlist_clicked is the conversion KPI keeps HIGH severity for it.
The "subject" of a finding is the dominant event name or metric key in it. When ambiguous, the AI uses the first event named in the finding's body.
Turning the master switch off does not put the game in a half-broken state. It changes one specific thing (the AI ladder no-ops on session quiescence) and leaves everything else running.
Each time a session lands on a disabled game, Playloop logs an “auto-analyze skipped” entry at info severity, so you can verify the gate is firing as expected and see exactly how many sessions you've ingested without paying for AI. Full audit-log surface (storage, retrieval, severity tiers) is documented under /docs/notifications → Audit log.
The “Regenerate” buttons on the session, build, and tester pages bypass the auto-analyze toggle by design: you clicked the button, you meant it. The auto-analyze gate only short-circuits the automatic ladder that runs when a session goes quiet.
Manual regen still respects every other gate:
deterministic/no-ai. The only hard-block here is a saved BYO key whose ciphertext can't be decrypted (e.g. after an AUTH_SECRET rotation), where we 402 so you re-enter it instead of silently downgrading to deterministic.
Build-summary Regenerate has a cheap server-side check that runs before any AI call: if neither the underlying data nor the analysis-tuning settings have changed since the existing summary was generated, the AI would produce identical output. Playloop hands back the cached summary instead and surfaces an inline notice with a hint about what to change to get a different result. No tokens spend, no progress card appears.
The check considers two things:
When both match, Regenerate returns the inline “Nothing to regenerate” notice. When either has changed (new sessions OR any tuning setting touched), the AI re-runs and a fresh summary lands.
Practical workflow: change a setting (Tone is the most common one) and click Regenerate; the new voice shows up in seconds. Or upload another playtest session and click Regenerate: the watermark moved, so the AI runs and you get a fresh narrative on the new data.
Why it only applies to build summaries today: the build summary is the most token-expensive of the three (it synthesizes across every tester on the build), so the wasted-token risk is highest there. Tester + game summaries always regenerate on click. We may extend the safety check to those if the pattern proves useful.
Legacy summaries: summaries generated before this safety check existed don’t carry a tuning fingerprint. The first Regenerate click on those always fires the AI (the system treats “no fingerprint” as “always regenerate” to be safe). The next click without changes short-circuits as expected.
The build / tester / game pages render an "AI: manual only" banner when the toggle is off, so you know the empty narrative section is intentional. Existing cached summaries stay visible: turning the toggle off doesn't delete history, it just pauses regeneration.
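The short-circuit can be sketched as a comparison of a data watermark and a tuning fingerprint. Everything here is illustrative: hashing canonical JSON with SHA-256 is an assumed way to build a fingerprint, and the field names `data_watermark` and `tuning_fingerprint` are hypothetical.

```python
import hashlib
import json
from typing import Optional


def tuning_fingerprint(tuning: dict) -> str:
    """Stable hash of the tuning settings (key order does not matter)."""
    canonical = json.dumps(tuning, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_regenerate(cached: Optional[dict], data_watermark: str, tuning: dict) -> bool:
    """Return True when the AI should run, False to reuse the cached summary."""
    if cached is None:
        return True  # nothing cached yet
    if cached.get("tuning_fingerprint") is None:
        return True  # legacy summary: no fingerprint means always regenerate
    return (
        cached.get("data_watermark") != data_watermark
        or cached["tuning_fingerprint"] != tuning_fingerprint(tuning)
    )


tuning = {"tone": "narrative"}
cached = {"data_watermark": "w1", "tuning_fingerprint": tuning_fingerprint(tuning)}
print(should_regenerate(cached, "w1", tuning))               # False
print(should_regenerate(cached, "w1", {"tone": "analyst"}))  # True
```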
Four common shapes. Each lists the settings to set and the resulting behavior. Pick the closest match to your situation and tune from there.
You're a studio lead, not a designer. You want one paragraph and three bullets, max.
Punch list capped at 3 items. Summary ≤ 600 chars (2–3 sentences). Each finding's detail ≤ 200 chars. No remediation suggestions. Only the strongest drop patterns (≥15pp drops, conditional ≥0.90) and segment splits (≥15pp spreads, 50+ session buckets) surface.
You're 30 sessions into a closed alpha. Cohorts are small, but you want every signal even if it's noisy.
Punch list up to 15 items, detailed verbosity (up to 3,500-char summary, 1,200-char per-finding detail), full remediation hypotheses on each finding. Smaller cohort buckets surface in segment splits. Smaller funnel drops register. Expect a wordy report; you asked for it.
Your demo is on Steam during Next Fest. The metric that matters is wishlist conversion. Everything else is secondary.
Wishlist findings get severity-upgraded one tier. Noisy opt-actions and heartbeats never appear in the punch list. The summary leads with conversion signals. Segment splits still surface ≥30pp localization signals on any milestone, including ignored ones.
You're a week from launch. You want the AI to shut up about minor noise and only surface things that are objectively broken vs. the prior build.
Punch list of up to 5 items. Standard verbosity. No remediation suggestions: patterns and evidence only. Drop patterns need 15pp drops and 0.90 conditional. Segment splits need 50+ session buckets and 15pp spreads. The AI will describe what's broken, not propose fixes; you're past the point where it should be doing UX speculation.
You've filled out the game's genre, subgenres, and AI context. Without those, the AI is reading your event names cold and inferring what kind of game it's looking at. With them, the prompt's "Game context" block is your source of truth and the AI defers to it. See /docs/ai-prompt for what to write there.
These docs are evolving. Playloop is in active development ahead of launch, so APIs and details may change as we polish.