Measure the build your users get.

The production build your users actually run — measured across versions, flagged as a regression only when the change clears real, measured noise.

Cross-version performance, on the real build
RAM RSS
start 142 MB · peak 356 MB
median of 5 · ±12 MB · lower is better
Render count
wasted 18% · wasted/cmd 2.4
median of 5 · ±186 · lower is better
version 4.2.0 (812) · branch main · commit a1c3f90
iPhone 15 (sim) · iOS 26.4 · window 30 s

Find the screen that's leaking.

A global memory slope is duration-fragile. weftrun diffs the memory retained per screen — Δ-of-Δ — so a screen that leaks 3.4 extra MB reads the same whether the window was 30 s or 120 s. Hover a row to see exactly when in the run it happened.

Per-screen Δ RAM (single-run · duration-robust)

Screen        baseline Δ    candidate Δ    Δ-of-Δ
Detail        +8.2          +11.6          +3.4 MB
Feed          +4.1          +4.8           +0.7 MB
Home          +2.3          +2.1           −0.2 MB
Onboarding                  +1.8           ✦ new

Lines aligned at t=0 (window start). Δ-of-Δ keeps color even when run windows differ.
Legend: nav · RAM · CPU (sim-only)
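
The Δ-of-Δ column can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the dictionaries and the `delta_of_delta` helper are hypothetical names, not part of weftrun, and the numbers are copied from the table above.

```python
# Per-screen retained memory in MB, copied from the table above.
# Both dict names are hypothetical; weftrun's own data model is not shown in the source.
baseline = {"Detail": 8.2, "Feed": 4.1, "Home": 2.3}
candidate = {"Detail": 11.6, "Feed": 4.8, "Home": 2.1, "Onboarding": 1.8}


def delta_of_delta(baseline, candidate):
    """Return {screen: candidate Δ minus baseline Δ}, or None for a screen with no baseline."""
    result = {}
    for screen, cand_mb in candidate.items():
        base_mb = baseline.get(screen)
        # A screen missing from the baseline has nothing to subtract from.
        result[screen] = None if base_mb is None else round(cand_mb - base_mb, 1)
    return result


for screen, dod in delta_of_delta(baseline, candidate).items():
    label = "new" if dod is None else f"{dod:+.1f} MB"
    print(f"{screen:<12}{label}")
```

Because each screen's Δ is measured while that screen is visited, rather than as a slope over the whole run, the subtraction does not depend on how long the run window was.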

Repeatability is measurable.

--repeat 5 runs your flow five times — each an isolated cold start, the app terminated between runs. Build and install once; measure five times. The sample standard deviation across those runs is your noise floor. Not a policy. Not a guess. A measurement.

$ weftrun run checkout-flow --repeat 5
  build once · install once · measure 5×
  ✓ 5 cold starts · ram.rss median 324 MB · ±12 MB
five runs · cold start each
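
As a rough illustration of what "median of 5 · ±σ" means, the sketch below computes a median and a sample standard deviation with Python's standard library. The five readings are made up for the example (chosen to land near the figures in the terminal output above), and `summarize` is a hypothetical helper, not a weftrun API.

```python
import statistics

# Made-up ram.rss readings in MB from five isolated cold starts (illustrative only).
runs_mb = [310.0, 318.0, 324.0, 331.0, 340.0]


def summarize(samples):
    """Return (median, sample standard deviation) for a list of repeated measurements."""
    if len(samples) < 2:
        # The sample standard deviation is undefined for fewer than two runs.
        raise ValueError("need at least two runs to estimate noise")
    return statistics.median(samples), statistics.stdev(samples)


median, sigma = summarize(runs_mb)
print(f"ram.rss median {median:.0f} MB · ±{sigma:.0f} MB")
```

`statistics.stdev` uses the n − 1 denominator, which is the sample standard deviation the paragraph above refers to.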
$ git diff --stat
$

release build · production-optimized · auto-reverted

No source changes.
Production-representative.

weftrun instruments your release build at build time — no edits to your codebase, no SDK to add. You measure the optimized, production-compiled app your users actually run, not a debug build with developer-mode overhead. When the run ends, every injected change is reverted.

Color only when signal beats noise.

A red number is an accusation. Most tools accuse on a flat 5% — crying wolf on a 6% wobble that's pure noise, and staying silent when a leak goes from zero to five megabytes. weftrun colors a delta only when it clears 2σ of measured noise. A change that appeared from nothing gets a ✦ new badge. And if two runs aren't comparable, it refuses to diff them at all.

RAM RSS
324 → 298 MB
▼ −8%
clears 2σ (±15 combined) · median of 5
Render count
1,847 → 2,100
▲ +14%
clears 2σ (±196 combined) · median of 3
JS heap (comparable)
41.2 → 42.0 MB
· +2%
within 2σ noise floor, not colored
RAM trend
0 → 1.8 MB/min ✦ new
appeared
0 → N: structural, surfaced first-class
⚠ Some runs not comparable — device differs (iPhone 15 → iPhone 13). weftrun refuses the diff rather than lying.
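
The rules described above (refuse incomparable runs, badge a metric that appeared from zero, color only past the noise band) can be sketched as one small function. This is a minimal sketch under assumptions, not weftrun's implementation: `Sample`, `verdict`, and the `device` field are hypothetical names, the parenthesised "±N combined" figure in the cards is read here as the 2σ band, and the ±1.5 MB band used for JS heap is invented because the page does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    median: float  # median across the repeated runs
    device: str    # stand-in for whatever makes two runs comparable (assumption)


def verdict(baseline, candidate, two_sigma, lower_is_better=True):
    """Classify a change as refused, new, neutral, regression, or improvement."""
    if baseline.device != candidate.device:
        return "refused"  # runs are not comparable, so no diff is produced
    if baseline.median == 0 and candidate.median != 0:
        return "new"  # 0 -> N has no meaningful percentage
    delta = candidate.median - baseline.median
    if abs(delta) <= two_sigma:
        return "neutral"  # inside the noise band: not colored
    worse = delta > 0 if lower_is_better else delta < 0
    return "regression" if worse else "improvement"


sim = "iPhone 15"
print(verdict(Sample(324, sim), Sample(298, sim), two_sigma=15))      # RAM RSS
print(verdict(Sample(1847, sim), Sample(2100, sim), two_sigma=196))   # render count
print(verdict(Sample(41.2, sim), Sample(42.0, sim), two_sigma=1.5))   # JS heap, band assumed
print(verdict(Sample(0, sim), Sample(1.8, sim), two_sigma=0.5))       # RAM trend, band assumed
print(verdict(Sample(324, sim), Sample(298, "iPhone 13"), two_sigma=15))
```

Checking comparability first means a mismatched device never produces a colored number at all, which matches the refusal behavior described above.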
the gate

Drag it: nothing turns red until it earns red.

This is the chokepoint every verdict passes through. Move the candidate's delta. The badge stays neutral until the change clears twice the measured standard deviation — then, and only then, it's a regression.

measurement classes
verdict: the pass/fail headline metric (ram.rss, render.count).
comparable: cross-version comparable diagnostic.
sim-only: host/sim-bound; same-machine trending only.
proxy: not a real device number (rAF self-loop "FPS").
verdict gate · render.count · baseline 1,847
+0.0% · below 2σ guard
candidate Δ vs baseline · noise floor ±196 (2σ)
slider range: 0 to +450
below 2σ → noise · above 2σ → regression

Measure the next change.

Free

$0
  • 5 runs / month
  • public dashboard
  • 1 app
Start free

Team

$29/mo
  • unlimited runs
  • private CI gating
  • fingerprint-gated baselines
  • priority support
Get started
weftrun run <flow> --repeat 5