01 / read

See if a change
is real.

Measure app performance across code versions. Repeat each flow N times from cold start, read the variance, and color a delta only when it clears the metric's own measured noise floor — not a blind percent guard.

React Native · iOS simulator · Maestro flows · Release build + Hermes
fig. 1 — a run that knows its own variance
RAM RSS
start 142 MB · peak 356 MB
median of 5 · ±12 MB · lower better
Render count
wasted 18% · wasted/cmd 2.4
median of 5 · ±186 · lower better
version 4.2.0 (812) · branch main · commit a1c3f90 · jsEngine hermes · buildType release
acme-app · iPhone 15 (sim) · iOS 17.4 · maestro · window 30 s · 7 steps · snapshot 9f2c4e1b
02 / repeat

Repeatability is measurable.

--repeat 5 runs your flow five times — each an isolated cold start, the app terminated between runs. Build and install once; measure five times. The sample standard deviation across those runs is your noise floor. Not a policy. Not a guess. A measurement.

$ weftrun run checkout-flow --repeat 5
  build once · install once · measure 5×
  ✓ 5 cold starts · ram.rss median 324 MB · ±12 MB
five runs · cold start each
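
As a rough illustration of what "the sample standard deviation across those runs" means, here is a minimal Python sketch. The readings are made-up values, not weftrun output, and `summarize` is a hypothetical helper rather than part of the tool.

```python
import statistics


def summarize(samples):
    """Return (median, sample standard deviation) for one metric across runs."""
    if len(samples) < 2:
        # A single run has no spread, so no noise floor can be estimated.
        raise ValueError("need at least two runs to estimate variance")
    # statistics.stdev uses the n-1 denominator, i.e. the sample standard deviation.
    return statistics.median(samples), statistics.stdev(samples)


# Hypothetical ram.rss readings (MB) from five isolated cold starts.
rss_mb = [310.0, 318.0, 324.0, 331.0, 340.0]
median, sigma = summarize(rss_mb)
print(f"ram.rss median {median:.0f} MB · ±{sigma:.1f} MB")
```

With fewer runs the estimate of the spread itself gets shakier, which is one reason to repeat more than twice.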
$ git diff --stat
$

release build · hermes bytecode · auto-reverted
03 / inject

No source changes.
Production-representative.

The harness injects component markers and a runtime hook into your Release binary at build time, via a NODE_OPTIONS preload — no edits to your codebase. You measure Hermes bytecode and production React, not a debug-mode fiction. When the run ends, every injected change is reverted.
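
One way a preload can reach a build without touching the repository is through the environment of the build process. The sketch below only shows that general idea: `with_preload` and the preload path are hypothetical, and how weftrun actually wires its hook is not described here. The `--require` flag is a standard Node.js option that `NODE_OPTIONS` accepts.

```python
import os


def with_preload(env, preload_path):
    """Return a copy of env whose NODE_OPTIONS also requires preload_path.

    Assumes preload_path contains no spaces. The original mapping is left
    untouched, so nothing needs to be undone once the build process exits.
    """
    patched = dict(env)
    existing = patched.get("NODE_OPTIONS", "").strip()
    flag = f"--require {preload_path}"
    patched["NODE_OPTIONS"] = f"{existing} {flag}".strip()
    return patched


# Hypothetical preload location; pass build_env to the build subprocess only.
build_env = with_preload(os.environ, "/tmp/hook/preload.js")
print(build_env["NODE_OPTIONS"])
```

Because only the child process sees the patched environment, the working tree stays clean and an empty `git diff --stat` is the expected result.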

04 / compare

Color only when signal beats noise.

A red number is an accusation. Most tools accuse on a flat 5% — crying wolf on a 6% wobble that's pure noise, and staying silent when a leak goes from zero to five megabytes. weftrun colors a delta only when it clears 2σ of measured noise. A change that appeared from nothing gets a ✦ new badge. And if two runs aren't comparable, it refuses to diff them at all.
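
The rule in the paragraph above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not weftrun's implementation: the function names are hypothetical, the per-run sigmas are invented, and combining baseline and candidate noise in quadrature is an assumed rule.

```python
import math


def combined_sigma(sigma_baseline, sigma_candidate):
    """Combine two independent noise estimates in quadrature (assumed rule)."""
    return math.hypot(sigma_baseline, sigma_candidate)


def verdict(baseline, candidate, sigma, lower_is_better=True):
    """Classify a delta as 'new', 'neutral', 'regression', or 'improvement'."""
    if baseline == 0 and candidate != 0:
        return "new"  # 0 -> N has no meaningful percent delta, so badge it
    delta = candidate - baseline
    if abs(delta) <= 2 * sigma:
        return "neutral"  # inside the 2-sigma noise floor: stays uncolored
    got_worse = (delta > 0) == lower_is_better
    return "regression" if got_worse else "improvement"


sigma_rss = combined_sigma(6.0, 4.5)       # hypothetical per-run sigmas, in MB
print(verdict(324, 298, sigma_rss))        # delta of -26 MB against 2 * 7.5
print(verdict(1847, 1860, 98.0))           # delta of +13 renders against 2 * 98
print(verdict(0, 1.8, 0.0))                # appeared from nothing
```

A flat percent guard would treat the second case and a genuinely noisy 6% wobble the same way; here the threshold scales with whatever spread was actually measured.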

RAM RSS
324 → 298 MB
▼ −8%
clears 2σ (±15 combined) · median of 5
Render count
1,847 → 2,100
▲ +14%
clears 2σ (±196 combined) · median of 3
JS heap · comparable
41.2 → 42.0 MB
· +2%
within 2σ noise floor — not colored
RAM trend
0 → 1.8 MB/min ✦ new
appeared
0 → N — structural, surfaced first-class
⚠ Some runs not comparable — jsEngine differs (hermes → jsc). weftrun refuses the diff rather than lying.
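
Refusing a diff can be modeled as comparing run fingerprints before any numbers are compared. The sketch below assumes a fingerprint made of four fields taken from the run metadata shown earlier; the class, the field names, and the error wording are hypothetical, not weftrun's actual schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Fingerprint:
    # Hypothetical subset of the metadata a run records.
    js_engine: str
    build_type: str
    device: str
    os_version: str


def mismatches(baseline, candidate):
    """List every fingerprint field whose value differs between two runs."""
    out = []
    for f in fields(Fingerprint):
        old, new = getattr(baseline, f.name), getattr(candidate, f.name)
        if old != new:
            out.append(f"{f.name} differs ({old} → {new})")
    return out


def diff_or_refuse(baseline, candidate):
    """Raise instead of diffing when the two runs are not comparable."""
    problems = mismatches(baseline, candidate)
    if problems:
        raise ValueError("not comparable: " + "; ".join(problems))
    return "comparable"


base = Fingerprint("hermes", "release", "iPhone 15 (sim)", "iOS 17.4")
cand = Fingerprint("jsc", "release", "iPhone 15 (sim)", "iOS 17.4")
try:
    diff_or_refuse(base, cand)
except ValueError as err:
    print(err)
```

Raising rather than returning a partial result keeps a mismatched pair from ever producing a colored number.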
the gate

Drag it: nothing turns red until it earns red.

This is the chokepoint every verdict passes through. Move the candidate's delta. The badge stays neutral until the change clears twice the measured standard deviation — then, and only then, it's a regression.

+0.0% · below 2σ guard
candidate Δ vs baseline 1,847 · noise floor ±196 (2σ)
measurement classes
verdict: the pass/fail headline metric (ram.rss, render.count).
comparable: cross-version comparable diagnostic.
sim-only: host/sim-bound; same-machine trending only (CPU, net latency).
proxy: not a real device number (rAF self-loop "FPS").
05 / locate

Find the screen that's leaking.

A global memory slope is duration-fragile. weftrun diffs the memory retained per screen — Δ-of-Δ — so a screen that leaks 3.4 extra MB reads the same whether the window was 30 s or 120 s. Hover a row to see exactly when in the run it happened.

Per-screen Δ RAM · single-run · duration-robust
Screen        baseline Δ    candidate Δ    Δ-of-Δ
Detail        +8.2          +11.6          +3.4 MB
Feed          +4.1          +4.8           +0.7 MB
Home          +2.3          +2.1           −0.2 MB
Onboarding    n/a           +1.8           ✦ new
Lines aligned at t=0 (window start). Δ-of-Δ keeps color even when run windows differ.
Legend: nav · RAM · CPU (sim-only)
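
The per-screen comparison above reduces to a subtraction per screen plus a special case for screens that exist only in the candidate. A minimal sketch, using the figures from the table and assuming the lone Onboarding value belongs to the candidate run; `delta_of_delta` is a hypothetical name.

```python
def delta_of_delta(baseline, candidate):
    """Per screen: candidate retained MB minus baseline retained MB.

    A screen that only exists in the candidate is reported as 'new'
    instead of being diffed against an implicit zero.
    """
    rows = {}
    for screen, cand_mb in candidate.items():
        if screen not in baseline:
            rows[screen] = "new"
        else:
            rows[screen] = round(cand_mb - baseline[screen], 1)
    return rows


# Retained MB per screen, as shown in the table.
baseline = {"Detail": 8.2, "Feed": 4.1, "Home": 2.3}
candidate = {"Detail": 11.6, "Feed": 4.8, "Home": 2.1, "Onboarding": 1.8}

for screen, value in delta_of_delta(baseline, candidate).items():
    print(screen, value)
```

Because each row is memory retained by one screen visit rather than a slope over time, the result does not depend on how long the capture window ran.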
06 / honesty

Other tools measure faster.

weftrun measures honestly. The difference is four rows:

case                          blind % guard               weftrun · measured
ram.rss 324 → 298 MB          −8% (flagged either way)    ▼ −8%, clears 2σ
render.count 1,847 → 1,860    +0.7% ✓ called clean        · flat, within ±186
leak 0 → 1.8 MB/min           swallowed (0% delta)        ✦ new, surfaced
jsEngine hermes → jsc         silently merged             ⚠ refused, not comparable
  • Variance is measured, not guessed.
  • Partial captures are flagged, not fabricated as zeros.
  • Structural changes (0 → N) surface first-class, not suppressed.
  • Mismatched runs are refused, not merged.

Scope today: React Native + Maestro, iOS simulator. CPU % is sim-only; JS FPS is rAF cadence — both labeled, never sold as device truth.

A measurement you can trust is one that tells you when it can't.

07 / start

Measure the next change.

Free

$0
  • 5 runs / month
  • public dashboard
  • 1 app
Start free

Team

$29/mo
  • unlimited runs
  • private CI gating
  • fingerprint-gated baselines
  • priority support
Get started
weftrun run <flow> --repeat 5