The analysis pipeline

Playloop turns raw playtest events into a few paragraphs you can actually read. This page walks the whole path: how a session becomes a rollup, how rollups become an AI-readable digest, and every knob in between that shapes what the summary says about your game.

The architecture commitment: code aggregates, AI synthesizes. The AI never reads raw events. Every level of the ladder consumes a pre-computed structured digest, so token cost stays bounded no matter how popular a build gets.

The big picture: from raw events to "Insights ready"

A session lands in Playloop and rides one pipeline all the way to your inbox. There are five visible stages:

Telemetry / upload. Your SDK posts events to POST /api/telemetry. Each batch is appended to the session’s transcript and de-duplicated by client-side event id.
Session rollup. Within five minutes of the session going quiescent (no new events), Playloop materializes a session digest: event counts, session-summary flags, and pre-clustered insights. No raw events travel past this point.
Cascade to device / build / game / event-name. The same pass refreshes the next four rollup levels. Every level is incremental, so a new session adds its delta rather than recomputing the world.
AI summary trigger. If the game’s auto-analyze toggle is on, session quiescence kicks insight extraction → user summary → build summary. If off, the rollups still write but the AI ladder no-ops (manual regenerate buttons stay live).
Bundled “Insights ready” email. When summaries finish and the user’s notification preferences include email, Playloop bundles the new insights into one email instead of one per session. You see the headline, not the firehose.

The two things to internalize: rollups are always written (drop patterns, splits, milestone rates), and AI is layered on top (auto, or manual, depending on the master switch). Turning AI off does not turn the dashboard off; it just stops the prose summary from regenerating.
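The de-duplication in the telemetry stage can be pictured with a minimal sketch. It assumes each event carries its client-side id under a hypothetical `id` field and that the transcript is an in-memory list; neither detail comes from Playloop itself.

```python
from typing import Iterable


def append_batch(transcript: list[dict], seen_ids: set[str], batch: Iterable[dict]) -> int:
    """Append one upload batch to a transcript, skipping event ids already seen."""
    appended = 0
    for event in batch:
        event_id = event["id"]  # hypothetical field name for the client-side event id
        if event_id in seen_ids:
            continue  # a retried upload resent this event; keep the first copy only
        seen_ids.add(event_id)
        transcript.append(event)
        appended += 1
    return appended
```

Because duplicates are dropped by id, a client that retries a batch after a network error can resend it safely.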

The four pieces that shape the AI output

Once you have telemetry flowing, four configuration surfaces decide what the summary says:

Game context: Genre + subgenres + your free-form AI context. Prepended to every prompt as "## Game context". Anchors the AI to your game shape and lets you say things like "soul counts in the billions are normal" so heavy tails don't read as overflow bugs.
KPIs: Three event-name lists: primary (lift severity one tier), secondary (keep base severity), ignore (drop entirely). Tells the AI what to emphasize and what you don't want to hear about.
Event categories: Per-event semantic tags (progression, opt_action, system, cosmetic, inverted, unknown). Auto-classified from event-name patterns on first sight; you can override. Controls which events are eligible to be drop-pattern steps and which get a severity downgrade.
Analysis tuning: Composable presets + individual knobs. Analysis profile picks a baseline (executive / balanced / investigative); strictness shifts the four numeric thresholds; individual knobs override either. Resolution order: explicit knob > strictness > profile > built-in default.

Each of these has a section below. The short version: the context tells the AI what kind of game it’s reading; the KPIs say what matters; the categories filter the funnel inference; and the tuning shapes how long, how detailed, and in what voice the report is.

Per-game settings: what each does

Every setting on the game settings page (/games/[slug]/settings) is documented below. Each entry covers WHAT the setting does, WHY a studio would change it, and HOW it affects the AI output.

Genre, subgenres, and AI context

The three free-form game-shape inputs. Saved as columns on the games row.

Genre: Single primary genre (Steam-style). Anchors the AI to a coarse game shape. "Indie", "Strategy", "Adventure", etc.
Subgenres: Up to 5 curated subgenres ("Idle, Clicker", "4X, Turn-based"). Narrows the shape inside the primary genre.
AI context: Free-form prompt of up to 5,000 characters covering your demo scope, mechanics, currencies, what good engagement looks like, and what looks like friction but isn't. See /docs/ai-prompt for examples by genre.

All three are stitched into every AI prompt as a single ## Game context block at the top. The build prompt explicitly tells the model this block is “your source of truth for genre, mechanics, and what counts as normal progression for THIS game”, so a good prompt directly shapes how the AI reads your numbers.

KPIs: primary, secondary, ignore

Three independent event-name lists that drive severity weighting on the AI's punch list. Saved on the game.

{
  "primary":  ["wishlist_clicked", "act2_reached", "demo_completed"],
  "secondary": ["pact_attempted"],
  "ignore":   ["session_heartbeat", "reset_progress", "settings_opened"]
}

How each bucket changes the AI output:

Primary: Findings whose main subject is on this list get UPGRADED one severity tier (LOW→MEDIUM, MEDIUM→HIGH, HIGH stays HIGH). These are the events the dev actually cares about.
Secondary: Keeps the base severity ladder. The dev flagged these as "matters" but did not push for a tier-up. Useful for "watch these" items.
Ignore: Findings whose main subject is on this list are DROPPED entirely from the punch list. The dev explicitly opted out and doesn't want to hear about them.

Overlap resolution: ignore wins (the most opinionated bucket), then primary, then secondary. So if you put wishlist_clicked on both primary and ignore, ignore wins.
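The overlap rule above can be restated as a small lookup. This is a sketch, not Playloop's code; `resolve_bucket` is a hypothetical helper that takes the KPI object in the JSON shape shown earlier.

```python
from typing import Optional

# Precedence from the overlap rule: ignore first, then primary, then secondary.
BUCKET_PRECEDENCE = ("ignore", "primary", "secondary")


def resolve_bucket(event_name: str, kpis: dict[str, list[str]]) -> Optional[str]:
    """Return the winning KPI bucket for an event name, or None if it is unlisted."""
    for bucket in BUCKET_PRECEDENCE:
        if event_name in kpis.get(bucket, []):
            return bucket
    return None


kpis = {
    "primary": ["wishlist_clicked", "act2_reached"],
    "secondary": ["pact_attempted"],
    "ignore": ["wishlist_clicked", "session_heartbeat"],
}
winner = resolve_bucket("wishlist_clicked", kpis)  # listed twice, so ignore wins
```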

One important exception baked into the prompt: a segment split with ≥30pp spread surfaces even when its milestone is on the ignore list. A 60pp Japanese-vs-English divergence on settings_opened is a localization signal, not opt-action noise. Smaller spreads (10–30pp) still respect ignore.

Caps: 20 entries per list, 100 chars per entry. Strings are free-form; Playloop does not enforce that the event exists yet (you can declare priorities before you ship the events that fire them).

Event categories

Per-event semantic tags saved against each event name. Edited from /games/[slug]/settings/events. Auto-classified from the event name on first sight; you override when the heuristic is wrong.

progression: The dev-intended player path. Higher frequency = better. Eligible as both a drop-pattern predecessor and a drop-pattern successor.
opt_action: Optional player action (reset_progress, settings_opened, wishlist_clicked). Frequency is not a quality signal: some players reset, some don't. Filtered out of drop patterns, and findings get downgraded one severity tier (unless the event is on the primary KPI list).
system: Infrastructure events the SDK emits regardless of gameplay (session_start, session_heartbeat, focus_lost). Never useful as a funnel step. Filtered out.
cosmetic: Visual / UI events that don't represent progress (theme_changed, language_changed, volume_changed). Filtered out.
inverted: Outcomes where MORE is worse (cinematic_skipped, hint_dismissed, session_abandoned). The drop-pattern direction reverses for these.
unknown: Uncategorized. Permissive default: included as a drop-pattern successor until you demote it. New events default here when the name-pattern heuristic can't classify them.

The auto-classifier checks suffixes and prefixes in order of specificity:

*_skipped / _canceled / _dismissed / _declined / _aborted → inverted
is_* predicate flags → system
session_* / client_* / entity_* / player_* prefixes → system
theme_* / language_* / volume_* / mute_* → cosmetic
*_clicked / _opened / _pressed / _toggled / reset_* / settings_* / wishlist_* / cancel_* → opt_action
everything else → unknown

When the dev hasn't categorized an event, the digest renderer falls back to this heuristic, so default behavior still surfaces pattern-matched opt-actions and system events in the AI's severity weighting.
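Read literally, the pattern list above amounts to an ordered chain of suffix and prefix checks. The sketch below is one such reading, assuming case-sensitive matching and first-match-wins in the order listed; `classify_event` is a hypothetical name, not Playloop's classifier.

```python
INVERTED_SUFFIXES = ("_skipped", "_canceled", "_dismissed", "_declined", "_aborted")
SYSTEM_PREFIXES = ("session_", "client_", "entity_", "player_")
COSMETIC_PREFIXES = ("theme_", "language_", "volume_", "mute_")
OPT_ACTION_SUFFIXES = ("_clicked", "_opened", "_pressed", "_toggled")
OPT_ACTION_PREFIXES = ("reset_", "settings_", "wishlist_", "cancel_")


def classify_event(name: str) -> str:
    """Guess an event category from its name; the first matching rule wins."""
    if name.endswith(INVERTED_SUFFIXES):
        return "inverted"
    if name.startswith("is_"):  # predicate flags
        return "system"
    if name.startswith(SYSTEM_PREFIXES):
        return "system"
    if name.startswith(COSMETIC_PREFIXES):
        return "cosmetic"
    if name.endswith(OPT_ACTION_SUFFIXES) or name.startswith(OPT_ACTION_PREFIXES):
        return "opt_action"
    return "unknown"
```

Under this reading a name like act2_reached falls through to unknown, so the progression tag is one you assign yourself.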

Analysis tuning

The composable preset stack that decides how dense, how strict, how prescriptive, and in what voice the AI report is. Saved per game, so future knobs can land without disturbing what you've already set.

Plain English by default. The tone knob (described in “Individual knobs” below) ships set to narrative, so summaries explain themselves like a friend talking, not a stats report. Flip it to analyst if you’d rather read raw percentages and percentile values. Underlying analysis is identical either way; only the prose changes.

Analysis profile (the outermost preset)

Pick one of three profiles. Each sets baselines for maxFindings, verbosity, strictness, AND recommendations in one click.

Profile	maxFindings	verbosity	strictness	recommendations
executive	3	executive	conservative	off
balanced	5	standard	standard	light
investigative	10	detailed	exploratory	full

The profile is the biggest hammer in the box. Most studios pick a profile and never touch the individual knobs.

Strictness preset (the four numeric thresholds)

A preset that shifts four numeric thresholds for how much evidence is required before a signal surfaces. Set independently of the profile, OR derived from the profile.

Strictness	segmentMinSessions	segmentSpreadFloorPp	cliffDropFloorPp	cliffConditionalFloor
conservative	50	15	15	0.90
standard	30	10	10	0.85
exploratory	15	5	5	0.75

Conservative: only the highest-signal findings survive. Exploratory: surface weak signals too. Standard is the same as the built-in defaults.

Individual knobs (explicit overrides)

Every preset can be overridden by an explicit knob. Useful when 90% of the preset is right but one threshold needs tightening.

maxFindings: Range 3–15. Cap on the "what to fix next" punch list. The AI is told to drop weaker findings rather than pad to the cap.
verbosity: executive (≤600 char summary, ≤200 char details) / standard (≤2000 / ≤600) / detailed (≤3500 / ≤1200). Sets both the prompt instructions and the response length cap.
recommendations: off (patterns + evidence only, no UX prescriptions) / light (optional one-line suggestion when evidence supports it) / full (1–3 ranked remediation hypotheses per finding).
tone: narrative (default) / analyst. Narrative writes the digest in plain English ("two-thirds of players reached Act 2, down from everyone in the last build") for devs who don't want to wade through stats vocabulary. Analyst keeps the dense numeric voice (p50/p90, percentage-point deltas, cohort sizes). The underlying analysis is identical across modes; only the prose changes. Severity grades and the action list stay intact in both.
segmentMinSessions: Range 5–1000. Min sessions per language bucket before a segment is persisted. Below 30, proportion estimates become noise (a single tester can swing a milestone 3+pp).
segmentSpreadFloorPp: Range 1–50. Min raw percentage-point gap between extreme language buckets before a milestone surfaces in "Segment splits". Tighter = quieter section.
cliffDropFloorPp: Range 1–50. Min percentage-point drop between predecessor and successor to count as a drop pattern. Higher = only the biggest drops surface.
cliffPredFreqCeiling: Range 0.5–1.0. Predecessors with frequency above this are filtered (universal bools like is_demo would otherwise be trivial predecessors). Default 0.99.
cliffConditionalFloor: Range 0.5–1.0. Min P(predecessor | successor), i.e. how often a session reaching the successor also reached the predecessor. Higher = only tight prerequisite relationships count as drop patterns.

Resolution precedence

The composable stack collapses in a strict order wherever tuning is read:

1. Explicit knob       (you set it directly)
   ↓ if unset
2. Strictness preset   (you picked a strictness OR a profile that implies one)
   ↓ if unset
3. Analysis profile    (you picked a profile)
   ↓ if unset
4. Built-in default

Profile picks a strictness baseline. Strictness sets the four numeric thresholds. Any explicit knob still wins. So you can pick investigative and then turn cliffDropFloorPp back up to 15 if you want exploratory findings everywhere except drop patterns.

All numeric knobs are clamped server-side to their documented range: an out-of-range value is pulled to the nearest bound rather than rejected.
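A compact way to picture the precedence and clamping is a layered lookup. The sketch below copies the preset values and ranges from the tables above. The `resolve_tuning` function, the flat dict shape, and the choice to mirror the balanced profile for the non-numeric built-in defaults are assumptions for illustration.

```python
from typing import Any, Optional

PROFILES = {
    "executive": {"maxFindings": 3, "verbosity": "executive", "strictness": "conservative", "recommendations": "off"},
    "balanced": {"maxFindings": 5, "verbosity": "standard", "strictness": "standard", "recommendations": "light"},
    "investigative": {"maxFindings": 10, "verbosity": "detailed", "strictness": "exploratory", "recommendations": "full"},
}
STRICTNESS = {
    "conservative": {"segmentMinSessions": 50, "segmentSpreadFloorPp": 15, "cliffDropFloorPp": 15, "cliffConditionalFloor": 0.90},
    "standard": {"segmentMinSessions": 30, "segmentSpreadFloorPp": 10, "cliffDropFloorPp": 10, "cliffConditionalFloor": 0.85},
    "exploratory": {"segmentMinSessions": 15, "segmentSpreadFloorPp": 5, "cliffDropFloorPp": 5, "cliffConditionalFloor": 0.75},
}
# Assumption: built-in defaults for non-numeric knobs mirror the balanced profile.
DEFAULTS = {**STRICTNESS["standard"], "maxFindings": 5, "verbosity": "standard",
            "recommendations": "light", "tone": "narrative", "cliffPredFreqCeiling": 0.99}
RANGES = {"maxFindings": (3, 15), "segmentMinSessions": (5, 1000), "segmentSpreadFloorPp": (1, 50),
          "cliffDropFloorPp": (1, 50), "cliffPredFreqCeiling": (0.5, 1.0), "cliffConditionalFloor": (0.5, 1.0)}


def resolve_tuning(explicit: Optional[dict] = None, strictness: Optional[str] = None,
                   profile: Optional[str] = None) -> dict[str, Any]:
    """Collapse the stack: explicit knob > strictness > profile > built-in default."""
    explicit = explicit or {}
    profile_values = dict(PROFILES.get(profile, {}))
    implied = profile_values.pop("strictness", None)  # a profile implies a strictness
    strictness_values = STRICTNESS.get(strictness or implied, {})
    resolved = {}
    for knob, default in DEFAULTS.items():
        value = default
        for layer in (explicit, strictness_values, profile_values):
            if knob in layer:
                value = layer[knob]
                break
        if knob in RANGES:  # clamp numeric knobs instead of rejecting them
            low, high = RANGES[knob]
            value = min(max(value, low), high)
        resolved[knob] = value
    return resolved


# Investigative everywhere, except drop patterns are tightened back to 15pp.
tuned = resolve_tuning(explicit={"cliffDropFloorPp": 15}, profile="investigative")
```

Here `tuned` keeps the exploratory segment thresholds and the investigative maxFindings of 10, while cliffDropFloorPp resolves to the explicit 15.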

Auto-analyze toggle

The master switch for AI cost on a game. Edited from the settings page; autosaves on toggle.

Default: OFF for new games. You opt in by toggling, one game at a time.
When OFF: Session ingest still succeeds. Rollups still write (drop patterns, splits, milestones all still compute). Insight extraction is skipped on new sessions. Auto-summary becomes a no-op. Manual regenerate buttons stay live. The audit log records an entry each time a session lands on a disabled game so you can verify the gate is firing.
When ON: When a session goes quiet, Playloop runs insight extraction, then regenerates the user and build summaries. The bundled "Insights ready" email fires when notification preferences include email. Each step is independently gated by your BYOK monthly cap and the daily managed-AI cap.
Dashboard treatment: Build and tester pages render an "AI: manual only" indicator when off, so you know you're looking at cached or empty narratives. Code-aggregated stats render either way.

The toggle is the cheapest safety net in the system. If you want to set up a game and ship telemetry without spending a cent on AI until you've decided it's working, leave it off and use the manual regenerate buttons.

Rollup architecture: five levels, incremental

Naively loading every session into memory to compute build / user / game stats stops working at indie scale (one popular build is ~50 sessions; loading them for every dashboard render is 50× the cost it needs to be). Playloop replaces that with five incremental rollup levels, refreshed in the background as new sessions land.

Mental model: every rollup level reads from the level below. Reading the build digest is one lookup. Adding a session is one upsert through the chain. Recomputing from scratch is rare: the background sweep only walks sessions that have changed since the last refresh.
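To make "a new session adds its delta" concrete, here is a minimal sketch of an incremental aggregate for a single boolean flag. `FlagRate` and its fields are hypothetical; the point is only that storing counts lets a new session update the rate without re-reading older sessions.

```python
from dataclasses import dataclass


@dataclass
class FlagRate:
    """Running rate for one boolean session-summary flag."""
    sessions: int = 0
    true_count: int = 0

    def add_session(self, flag: bool) -> None:
        # One upsert per new session; earlier sessions are never reloaded.
        self.sessions += 1
        self.true_count += int(flag)

    @property
    def rate(self) -> float:
        return self.true_count / self.sessions if self.sessions else 0.0


act2 = FlagRate()
for reached in (True, True, False):
    act2.add_session(reached)
act2.add_session(True)  # a fourth session lands later and only adds its delta
```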

The five levels

Session digest: One per session. Event counts, session-summary flags (booleans, numbers, short strings), pre-clustered insights. Written once the session goes quiescent. The base layer everything else builds on.
Device digest: One per device per game per environment. Aggregates every session that device played for the game. Booleans collapse to rates, numbers to averages and percentiles, strings to a short top list. This is what the per-tester summary reads.
Build digest: One per build per environment. Aggregates every session on that build: event counts, session-summary stats, top insight clusters, percentile counters, recent tester handles, drop patterns, and per-language splits. This is what the build summary reads.
Game digest: One per game per environment. Aggregates every build. Holds cross-build trend direction (improving / worsening / flat) and all-time insight clusters. This is what the game summary reads.
Event-name digest: One per (game, event name, environment, day). Day-bucketed so the volume stays bounded. Powers “events over time” charts without re-scanning sessions. Each row carries the session count, total fire count, and a few sample payloads for classifier hints.

The day key is UTC. Sessions are attributed to the UTC day their tail event landed on, so a session that started 23:00 UTC and ran past midnight gets bucketed to the day it ended. The grain is fixed at one day; there is no per-hour or per-week variant. Charts that need a finer grain would have to re-scan sessions, which is what this layer exists to avoid.
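The bucketing rule is easy to get wrong with local timestamps, so a short sketch may help. It assumes the tail event's timestamp is available as a timezone-aware datetime and that the key is an ISO date string; `utc_day_key` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def utc_day_key(tail_event_at: datetime) -> str:
    """Return the UTC calendar day (YYYY-MM-DD) of a session's tail event."""
    if tail_event_at.tzinfo is None:
        raise ValueError("expected a timezone-aware timestamp")
    return tail_event_at.astimezone(timezone.utc).date().isoformat()


# Started 23:00 UTC on March 1 and ran 80 minutes: bucketed to the day it ended.
started = datetime(2024, 3, 1, 23, 0, tzinfo=timezone.utc)
day = utc_day_key(started + timedelta(minutes=80))
```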

The background sweep

Playloop refreshes rollups in the background every few minutes. Each pass:

Picks all (game, environment) pairs that have at least one session newer than the last refresh. Cheap when nothing changed.
For each pair, walks the sessions that have new events since the last pass, computes the diff, and upserts session → device → build → game → event-name in one go.
Idempotent: a content hash on each rollup level dedups writes, so a no-op pass doesn't churn anything. Downstream AI-summary caching keys off that same hash: if nothing changed, no regen.
Chunked-resumable: long-running passes can stop between games when they hit a budget and resume from the same place on the next pass.
Per-(game, env) grain: sessions in different environments don't mix. Tester production sessions never show up in their staging rollups.
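The idempotency property in the list above can be sketched with a hash over a canonical serialization. SHA-256 over sorted-key JSON and the in-memory `store` dict are assumptions chosen for illustration; the source does not say which hash or storage Playloop uses.

```python
import hashlib
import json


def content_hash(digest: dict) -> str:
    """Hash a digest in a key-order-independent way."""
    canonical = json.dumps(digest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def upsert_if_changed(store: dict, key: tuple, digest: dict) -> bool:
    """Write the digest only when its content hash differs; return True on write."""
    new_hash = content_hash(digest)
    existing = store.get(key)
    if existing is not None and existing["hash"] == new_hash:
        return False  # no-op pass: nothing to write, nothing to regenerate downstream
    store[key] = {"hash": new_hash, "digest": digest}
    return True
```

A summary cache keyed on the same hash gets "if nothing changed, no regen" for free, since an unchanged digest produces an unchanged key.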

A build-summary regeneration triggers a fresh rollup pass for that game first, so a build that landed between background ticks still gets a current rollup. The pass no-ops when nothing has moved, so this is cheap.

Snapshot pinning

When Playloop adds a new field to a rollup (a new segment dimension, a new flag aggregation), past summaries stay pinned to the data shape they were generated from. No silent regressions where the same prompt suddenly produces a different answer.

Cross-build trend direction

The game digest carries a cross-build trends list, one entry per insight cluster, each tagged improving / worsening / flat. The classifier splits your game’s builds in half by first-seen time (the older half is the prior window, the newer half is the recent window) and sums each cluster’s negative-leaning fire count in each half.

The direction rule is a 2× ratio threshold on those two sums (plus floor-handling for zeros):

improving: the cluster fired in the prior half but is gone from the recent half (prior > 0, recent = 0), OR recent / prior < 0.5 (more than halved).
worsening: the cluster is new in the recent half (prior = 0, recent > 0), OR recent / prior > 2.0 (more than doubled).
flat: everything else, including ratios in the [0.5, 2.0] band.

A minimum-evidence gate filters noise: a cluster needs at least 3 total fires across the two halves to be considered at all. The final list is sorted worsening → improving → flat and capped at the top 10 entries by combined fire count, so the game-summary digest leads with regressions.
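The direction rule and the evidence gate translate into a few lines of code. This is a sketch under stated assumptions: `classify_trend` and `cross_build_trends` are hypothetical names, and the ordering step is one reading of the rule (keep the top 10 by combined fire count, then order by direction).

```python
DIRECTION_ORDER = {"worsening": 0, "improving": 1, "flat": 2}


def classify_trend(prior: int, recent: int) -> str:
    """Apply the 2x ratio rule to a cluster's fire counts in each half."""
    if prior > 0 and recent == 0:
        return "improving"
    if prior == 0 and recent > 0:
        return "worsening"
    if prior > 0:
        ratio = recent / prior
        if ratio < 0.5:
            return "improving"
        if ratio > 2.0:
            return "worsening"
    return "flat"


def cross_build_trends(clusters: dict[str, tuple[int, int]], min_fires: int = 3, cap: int = 10) -> list[dict]:
    """Build the trend list from {cluster: (prior_fires, recent_fires)}."""
    rows = [
        {"cluster": name, "direction": classify_trend(prior, recent), "fires": prior + recent}
        for name, (prior, recent) in clusters.items()
        if prior + recent >= min_fires  # minimum-evidence gate
    ]
    top = sorted(rows, key=lambda row: row["fires"], reverse=True)[:cap]
    return sorted(top, key=lambda row: (DIRECTION_ORDER[row["direction"]], -row["fires"]))


trends = cross_build_trends({
    "stuck_at_boss": (2, 9),       # ratio 4.5
    "tutorial_confusion": (8, 3),  # ratio 0.375
    "ui_overlap": (4, 5),          # ratio 1.25
    "rare_crash": (1, 1),          # only 2 fires in total
})
```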

Drop patterns: where players drop off

A drop pattern is a pair of session-end milestones where most players who reach the second one also reached the first, but a meaningful chunk of players who reached the first never made it to the second. It's the closest thing Playloop has to a causal "drop-off point" signal, computed code-side from milestone co-occurrence, not inferred by the AI.

Worked example: if 97% of sessions reached entered_game and 64% reached act2_reached, that's a 33pp drop. If P(entered_game | act2_reached) = 99% (almost everyone who reached act2 also reached entered_game first), the drop pattern fires and surfaces in the build digest.

The three thresholds

For a (predecessor, successor) pair to register as a drop pattern, all three must be true:

Drop ≥ cliffDropFloorPp. The absolute percentage-point gap (predFreq - succFreq) must exceed this floor. Default 10pp.
Conditional ≥ cliffConditionalFloor. P(predecessor | successor) must exceed this floor. This is what makes the pair "causally implied" rather than two independent milestones that happen to have different rates. Default 0.85.
Predecessor frequency ≤ cliffPredFreqCeiling. The predecessor's own frequency must be below this ceiling. Universal bools like is_demo (true for nearly every session) would otherwise be trivial predecessors of everything. Default 0.99.
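Taken together, the three thresholds form a single predicate. The sketch below uses the defaults quoted above and treats the comparisons as inclusive, following the ≥ and ≤ in the rule names; that choice and the function name `is_drop_pattern` are assumptions.

```python
def is_drop_pattern(
    pred_freq: float,
    succ_freq: float,
    p_pred_given_succ: float,
    drop_floor_pp: float = 10.0,
    conditional_floor: float = 0.85,
    pred_freq_ceiling: float = 0.99,
) -> bool:
    """Check one (predecessor, successor) pair against the three thresholds."""
    drop_pp = (pred_freq - succ_freq) * 100  # absolute gap in percentage points
    return (
        drop_pp >= drop_floor_pp
        and p_pred_given_succ >= conditional_floor
        and pred_freq <= pred_freq_ceiling
    )


# The worked example: entered_game at 97%, act2_reached at 64%, conditional 99%.
fires = is_drop_pattern(0.97, 0.64, 0.99)
```

A near-universal flag at 99.5% frequency fails the ceiling check even with the same drop, which is the is_demo case the third threshold exists for.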

Category filtering

On top of the numeric thresholds, the drop-pattern inference filters by event category. Only progression and unknown events are eligible as either predecessor or successor. That's why opt-actions (reset_progress), system events (session_heartbeat), cosmetic toggles (theme_changed), and inverted outcomes (cinematic_skipped) never become noise drop patterns.

The successor side also blocks known noise tails (matching the categorizer’s inverted tag heuristics), and the drop-pattern list is de-duplicated for transitive edges: if A → B and A → C both exist and B → C also exists, A → C is dropped because B is the tighter predecessor.
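The transitive de-duplication can be sketched as a one-hop bridge check over the edge set. `drop_transitive_edges` is a hypothetical helper, and it only looks for a single intermediate step, which is all the rule above describes.

```python
def drop_transitive_edges(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Remove A -> C whenever some B gives both A -> B and B -> C."""
    nodes = {node for edge in edges for node in edge}
    kept = set()
    for pred, succ in edges:
        bridged = any(
            (pred, mid) in edges and (mid, succ) in edges
            for mid in nodes
            if mid not in (pred, succ)
        )
        if not bridged:
            kept.add((pred, succ))
    return kept


edges = {
    ("entered_game", "act1_done"),
    ("entered_game", "act2_reached"),
    ("act1_done", "act2_reached"),
}
tight = drop_transitive_edges(edges)  # entered_game -> act2_reached is dropped
```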

How tuning shifts what surfaces

Lower cliffConditionalFloor: Surfaces more drop patterns, because the pair doesn't have to be as tightly causally implied. Useful when you want to see weaker prerequisite relationships.
Higher cliffConditionalFloor: Surfaces fewer drop patterns: only tight prerequisite pairs (e.g. "everyone who reached act2 also reached act1") count. Quieter, sharper section.
Lower cliffDropFloorPp: Surfaces smaller drops. Useful in early playtest when you want every 5pp dip flagged.
Higher cliffDropFloorPp: Surfaces only dramatic drops. Useful in late polish when 5–10pp dips are within normal cohort noise for your sample size.

The rendered drop patterns section is sorted largest-drop-first and capped at ~12 entries per build digest.

Segment splits: cohort divergences with statistical gating

A segment split is a per-cohort divergence on a milestone rate. For example, the proportion of sessions that reached act2_reached differs by 30pp between English and Japanese players. The split surfaces as its own digest section so the AI can name audience-shape signals (localization gaps, regional pacing differences) as their own findings, not as causal claims.

Today, the only cohort axis is language (read from the session-summary language flag). Region, new-vs-returning, and other dimensions are on the roadmap.

The two gates that filter noise

A milestone surfaces in the segment-splits section only when both gates pass:

Raw spread ≥ segmentSpreadFloorPp. The percentage-point gap between the lowest-rate and highest-rate buckets must exceed this floor. Default 10pp.
Spread > 2× combined standard error. Treating each bucket's milestone rate as a binomial proportion estimate, the combined SE of the two extreme buckets is sqrt(p_hi(1-p_hi)/n_hi + p_lo(1-p_lo)/n_lo). The split has to clear ~2 SE to count, which corresponds to roughly 95% non-overlap on the proportions. This filters out 50/50 splits where one bucket has 4 testers.

The bucket gate is independent: a language only persists in the segment table at all when its session count clears segmentMinSessions (default 30). Below that, proportion estimates are noise.
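The bucket gate and the two spread gates combine as follows. This is a sketch with hypothetical names (`combined_se`, `split_surfaces`); buckets are passed as language → (sessions, sessions that reached the milestone), and the spread-floor comparison is treated as inclusive per the ≥ in the rule name.

```python
import math


def combined_se(p_hi: float, n_hi: int, p_lo: float, n_lo: int) -> float:
    """Combined standard error of two binomial proportion estimates."""
    return math.sqrt(p_hi * (1 - p_hi) / n_hi + p_lo * (1 - p_lo) / n_lo)


def split_surfaces(
    buckets: dict[str, tuple[int, int]],
    min_sessions: int = 30,
    spread_floor_pp: float = 10.0,
) -> bool:
    """Return True when a milestone's language split clears every gate."""
    # Bucket gate: a language only counts once it has enough sessions.
    eligible = [(n, hits / n) for n, hits in buckets.values() if n >= min_sessions]
    if len(eligible) < 2:
        return False
    n_hi, p_hi = max(eligible, key=lambda bucket: bucket[1])
    n_lo, p_lo = min(eligible, key=lambda bucket: bucket[1])
    spread = p_hi - p_lo
    return spread * 100 >= spread_floor_pp and spread > 2 * combined_se(p_hi, n_hi, p_lo, n_lo)
```

With 100 sessions per bucket at 80% and 50%, the 30pp spread is well above 2 SE (about 12.8pp) and surfaces. With 40 and 35 sessions at 55% and about 43%, the 12pp spread clears the raw floor but not the SE gate.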

Ignore-list override

The build prompt has one explicit carve-out for segment splits: a split with ≥30pp spread surfaces even when the milestone is on the dev's ignore list or tagged as a non-progression category. The rationale: a 60pp Japanese-vs-English divergence on settings_opened is a localization or audience-shape signal, not opt-action noise. Smaller spreads (10–30pp) still respect ignore and category rules.

How tuning shifts what surfaces

Lower segmentMinSessions: Smaller language buckets get their own rollup row. Below 30 the proportion estimates are statistically volatile, and the UI shows a warning. Useful only when the cohort itself is small (early closed playtest).
Higher segmentMinSessions: Only well-populated buckets get measured. The segment-splits section stays empty until you have real cohort sizes.
Lower segmentSpreadFloorPp: Surfaces smaller divergences. The SE gate still filters cohort-noise, so this isn't a noise floor in itself.
Higher segmentSpreadFloorPp: Surfaces only the biggest divergences. Useful when you want the AI to lead with the loudest localization signals.

KPI weighting: how priorities shape the summary

KPIs don't change the rollup data. They change how the AI weights findings derived from that data. The worked examples below make the difference visible.

Example 1: primary lifts severity

Suppose wishlist_clicked fires in 1.2% of demo sessions. The base finding ("Wishlist conversion is low at 1.2%") would normally land at LOW severity because conversion-rate findings on a single named action are LOW by the severity ladder.

Add wishlist_clicked to the primary KPI list. The finding gets UPGRADED to MEDIUM, because the dev declared this is what they actually care about. The summary leads with it instead of burying it under a drop pattern three lines down.

Example 2: ignore drops findings entirely

Suppose your game has a noisy reset_progress event that fires whenever a player taps the reset button (some players reset every session as a habit). Without tuning, the AI might surface "30% of sessions reset_progress" as a finding.

Add reset_progress to the ignore list. The finding is dropped entirely, even at LOW severity. The dev opted out and doesn't want to hear about it.

Exception: if a segment split surfaces a ≥30pp divergence on reset_progress (say, Japanese players reset 5× more often than English), the split still surfaces. The ignore list filters base-rate findings; it doesn't filter audience-shape signals.

Example 3: secondary is the "watch this" tag

Secondary doesn't change severity. It exists as an explicit "this matters but I don't need it tier-upped" hint. Useful when you want to remind future-you (or the AI) that a milestone is worth tracking, without forcing every related finding to the top of the punch list.

How priorities reach the AI

When the game's KPIs object has any entries, the prompt builder emits a Priorities block inside the ## Game context header:

## Game context
Genre: Indie · Idle, Clicker
Game context (set by the developer):
<your free-form ai_context>

Priorities (set by the developer):
  Primary: wishlist_clicked, act2_reached, demo_completed
  Secondary: pact_attempted
  Ignore: session_heartbeat, reset_progress, settings_opened

The build prompt has explicit rules downstream of this block:

Primary KPI subjects get UPGRADED one tier (MEDIUM → HIGH, LOW → MEDIUM, HIGH stays HIGH).
Secondary subjects use the base severity ladder unchanged.
Ignore subjects: DROP the finding entirely.
Priority overrides the category downgrade: if an opt-action is on the primary list, the dev meant it. A game where wishlist_clicked is the conversion KPI keeps HIGH severity for it.

The "subject" of a finding is the dominant event name or metric key in it. When ambiguous, the AI uses the first event named in the finding's body.
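These rules are instructions to the model, not code, but restating them as a function makes the interactions easier to check. In the sketch below, `adjust_severity` is hypothetical. Leaving a downgraded LOW at LOW, and applying the opt_action downgrade to secondary subjects, are assumptions the text does not spell out.

```python
from typing import Optional

TIERS = ["LOW", "MEDIUM", "HIGH"]


def adjust_severity(base: str, bucket: Optional[str], category: str) -> Optional[str]:
    """Apply KPI and category rules to a finding's base severity; None means dropped."""
    if bucket == "ignore":
        return None  # dropped from the punch list entirely
    index = TIERS.index(base)
    if bucket == "primary":
        # Priority overrides the category downgrade and lifts one tier (HIGH stays HIGH).
        return TIERS[min(index + 1, len(TIERS) - 1)]
    if category == "opt_action":
        return TIERS[max(index - 1, 0)]  # assumption: LOW cannot go lower
    return base


# Example 1 from above: a LOW wishlist finding on the primary list becomes MEDIUM.
wishlist = adjust_severity("LOW", "primary", "opt_action")
```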

What happens when auto-analyze is off

Turning the master switch off does not put the game in a half-broken state. It changes one specific thing (the AI ladder no-ops on session quiescence) and leaves everything else running.

What still works

Session ingest. Events still post, the transcript still appends, the session still lands on the dashboard.
Rollups. Playloop still refreshes session / device / build / game / event-name rollups in the background. Every code-aggregated stat (engagement rate, total play time, milestone reach rates, named-action conversion, event counts) renders normally.
Drop patterns. Still computed and surfaced in the build dashboard's stats panel. They're rollup-driven, not AI-driven.
Segment splits. Same, computed code-side and written to the build rollup.
Manual regenerate buttons. The “Regenerate” button on session / build / game pages still kicks the AI ladder on demand. You opt into AI cost on your own schedule. See the “Manual regenerate, in precise terms” section below for exactly which gates regen bypasses vs respects.

What stops

Insight extraction on new sessions. The auto-analyze toggle is checked on entry and the path skips when off.
Auto-regeneration of user and build summaries. Both auto-paths check the toggle up front and no-op when off; they never call into the AI summarizer.
“Insights ready” emails. Bundled emails only fire after insight extraction succeeds. When auto-analyze is off, extraction never runs and nothing gets queued, so your notification preferences for “analysis finished” emails have nothing to gate.
Prose narrative on the build / tester / game pages. Without an AI summary, the pages render code-aggregated stats only, with no “what testers thought” paragraph.

Audit trail

Each time a session lands on a disabled game, Playloop logs an “auto-analyze skipped” entry at info severity, so you can verify the gate is firing as expected and see exactly how many sessions you've ingested without paying for AI. Full audit-log surface (storage, retrieval, severity tiers) is documented under /docs/notifications → Audit log.

Manual regenerate, in precise terms

The “Regenerate” buttons on the session, build, and tester pages bypass the auto-analyze toggle by design: you clicked the button, so you meant it. The auto-analyze gate only short-circuits the automatic ladder that runs when a session goes quiet.

Manual regen still respects every other gate:

Plan gate: Free-with-BYOK and the managed-AI plans (Indie / Studio) both proceed to the AI extractor as normal. Free without a BYO key still runs analyze, just without the AI step: the session lands with a deterministic structural summary (event counts, top events, durations, positive/negative-moment detection from event names) and no model call. Insights from this path are tagged deterministic/no-ai. The only hard block here is a saved BYO key whose ciphertext can't be decrypted (e.g. after an AUTH_SECRET rotation); in that case Playloop returns a 402 so you re-enter the key instead of being silently downgraded to deterministic.
BYOK monthly budget: when you're on BYOK, the call is rejected if your month-to-date BYOK spend is at or above your monthly cap. No-op for managed-AI calls.
Daily managed-AI cost cap: managed-AI users (Indie / Studio) are capped at a per-tier daily USD ceiling for managed analyses. Over the cap, the call is rejected before tokens spend. Free is BYO-only, so this cost cap doesn't apply to Free.
Daily analyze-session cap: a separate per-workspace ceiling on how many sessions you can analyze per rolling 24h via the buttons (manual re-analyze, bulk re-analyze, analyze-on-import), tiered by plan (5,000 on Free, 25,000 on Indie, 100,000 on Studio). This one does apply to Free. Automatic analysis of newly-ingested sessions doesn't count toward it, only the actions you trigger. Going over the cap returns a clear “daily limit” message, and the count resets on a rolling 24h window.
Concurrency dedup: if a job for the same (game, build, env) is already running, the request joins the existing job instead of starting a second one. Two tabs both clicking Regenerate don't double-spend.

“Nothing to regenerate”: the safety check

Build-summary Regenerate has a cheap server-side check that runs before any AI call: if neither the underlying data nor the analysis-tuning settings have changed since the existing summary was generated, the AI would produce identical output. Playloop hands back the cached summary instead and surfaces an inline notice with a hint about what to change to get a different result. No tokens spend, no progress card appears.

The check considers two things:

Data watermark: the most recent event timestamp across the build’s sessions. If it’s the same as the watermark stamped on the existing summary, no new data has landed.
Tuning hash: a fingerprint of the resolved Analysis Tuning at the moment the existing summary was generated (every knob: Tone, Profile, Strictness, Verbosity, Recommendations, Max findings, segment + drop-pattern thresholds). If the current resolved tuning matches, no setting that affects the summary has changed.

When both match, Regenerate returns the inline “Nothing to regenerate” notice. When either has changed (new sessions OR any tuning setting touched), the AI re-runs and a fresh summary lands.

Practical workflow: change a setting (Tone is the most common one) and click Regenerate; the new voice shows up in seconds. Or upload another playtest session and click Regenerate; the watermark moved, so the AI runs and you get a fresh narrative on the new data.

Why it only applies to build summaries today: the build summary is the most token-expensive of the three (it synthesizes across every tester on the build), so the wasted-token risk is highest there. Tester + game summaries always regenerate on click. We may extend the safety check to those if the pattern proves useful.

Legacy summaries: summaries generated before this safety check existed don’t carry a tuning fingerprint. The first Regenerate click on those always fires the AI (the system treats “no fingerprint” as “always regenerate” to be safe). The next click without changes short-circuits as expected.
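The whole safety check, including the legacy case, reduces to a small decision function. This is a sketch with hypothetical parameter names; the watermark and tuning hash are treated as opaque strings.

```python
from typing import Optional


def should_regenerate(
    current_watermark: str,
    current_tuning_hash: str,
    summary_watermark: Optional[str],
    summary_tuning_hash: Optional[str],
) -> bool:
    """Decide whether a build-summary Regenerate click should call the AI."""
    if summary_tuning_hash is None:
        return True  # legacy summary without a fingerprint: always regenerate
    data_changed = current_watermark != summary_watermark
    tuning_changed = current_tuning_hash != summary_tuning_hash
    return data_changed or tuning_changed
```

When this returns False, the cached summary is handed back with the “Nothing to regenerate” notice and no tokens are spent.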

Dashboard treatment

The build / tester / game pages render an "AI: manual only" banner when the toggle is off, so you know the empty narrative section is intentional. Existing cached summaries stay visible: turning the toggle off doesn't delete history, it just pauses regeneration.

Tuning recipes: worked examples

Four common shapes. Each lists the settings to set and the resulting behavior. Pick the closest match to your situation and tune from there.

Recipe

Headlines only: I want to glance at the dashboard

You're a studio lead, not a designer. You want one paragraph and three bullets, max.

Settings

Auto-analyze: on
Analysis profile: executive
Strictness: (inherits conservative from profile)
KPIs: optional; set primary if you have one or two top-line KPIs

Result

Punch list capped at 3 items. Summary ≤ 600 chars (2–3 sentences). Each finding's detail ≤ 200 chars. No remediation suggestions. Only the strongest drop patterns (≥15pp drops, conditional ≥0.90) and segment splits (≥15pp spreads, 50+ session buckets) surface.

Recipe

Alpha exploration: surface every weird signal

You're 30 sessions into a closed alpha. Cohorts are small, but you want every signal even if it's noisy.

Settings

Auto-analyze: on
Analysis profile: investigative
Strictness: (inherits exploratory from profile)
maxFindings: 15 (override the profile default of 10)
segmentMinSessions: 15 (already set by exploratory)
cliffDropFloorPp: 5 (already set by exploratory)

Result

Punch list up to 15 items, detailed verbosity (up to 3,500-char summary, 1,200-char per-finding detail), full remediation hypotheses on each finding. Smaller cohort buckets surface in segment splits. Smaller funnel drops register. Expect a wordy report; you asked for it.

Recipe

Wishlist-conversion focus: the demo's job is to convert

Your demo is on Steam during Next Fest. The metric that matters is wishlist conversion. Everything else is secondary.

Settings

Auto-analyze: on
Analysis profile: balanced
KPIs primary: ["wishlist_clicked", "demo_completed"]
KPIs ignore: ["session_heartbeat", "reset_progress", "settings_opened"]
Event categories: tag session_heartbeat as 'system' if the auto-classifier missed it

Result

Wishlist findings get severity-upgraded one tier. Noisy opt-actions and heartbeats never appear in the punch list. The summary leads with conversion signals. Segment splits still surface 30+pp localization signals on any milestone, including ignored ones.

Recipe

Late-polish ship: only flag real regressions

You're a week from launch. You want the AI to shut up about minor noise and only surface things that are objectively broken vs. the prior build.

Settings

Auto-analyze: on
Analysis profile: balanced
Strictness: conservative (override the profile default)
maxFindings: 5 (default)
recommendations: off (override the profile default of light)

Result

Punch list of up to 5 items. Standard verbosity. No remediation suggestions: patterns + evidence only. Drop patterns need 15pp drops and 0.90 conditional. Segment splits need 50+ session buckets and 15pp spreads. The AI will describe what's broken, not propose fixes; you're past the point where it should be doing UX speculation.

One thing every recipe assumes

You've filled out the game's genre, subgenres, and AI context. Without those, the AI is reading your event names cold and inferring what kind of game it's looking at. With them, the prompt's "Game context" block is your source of truth and the AI defers to it. See /docs/ai-prompt for what to write there.
