Prove.
Weeks of value measurement now automated and repeatable.
Every variance classification, attestation signature, and dollar figure on this page comes from an actual run of the Aurora L04 Prove automation against the locked baseline captured at L01 signoff. Layer 04 measures realized value against the projection L01 made — before the build started. Every delta is classified across seven categories. The result is a numerical readout against a signed baseline, not a slide deck made the week before.
Source: Keystone-2026Q2 L04 baseline + pilot readout · attested 2026-05-23 · baseline hash
sha256:demo-l04-baseline-…7f3c1b9eMost AI programs cannot tell you what they actually delivered.
The methodology makes the answer structural.
Four things go wrong with AI value measurement after a program ships. Aurora's L04 layer makes each of them impossible to do quietly.
The projection drifts to match the outcome.
The pre-build value claim quietly gets revised every quarter to match what actually happened. Every readout shows green. Aurora locks the baseline at L01 signoff, hash-signs it, and refuses silent revision. Re-baselining requires a documented change order with four-party re-attestation.
Variance gets narrated away.
"It seems to be working" becomes the executive narrative. "Mixed signals" becomes the variance explanation. Aurora classifies every delta against a seven-category rubric with mutual exclusion. Each variance carries its decision-tree path on the record. Free-text rationale cannot override the category.
Positive variance gets reverse-fit into "we crushed it."
A +$2.84M positive variance tempts the team to annualize the pilot into $76M/yr. Aurora's Structural Rule 2 refuses: variance is explained *against the projection*, not by re-shaping the projection. The independent auditor confirms zero reverse-fit narratives before any readout signs.
One person signs off on the value claim.
The CRO signs off on outcomes. CFO never validates the economics. CEO never reviews the variance classification. Aurora requires four-party joint attestation per readout: CFO (economics) + CRO (GTM) + CEO (strategic) + Northbeam (methodology). All four signatures, or no readout ships.
$3.0M net pilot value
at $63K/yr AI cost overhead.
Aurora discloses what the AI loops cost. The fully-loaded overhead is structurally honest because the methodology refuses to hide infrastructure cost behind value-only headlines.
What changed, measured.
Not what felt different.
Pre-Aurora state cited from L01 evidence chain. Post-Aurora state observed in the 2-week pilot window (2026-06-02 → 06-13). Every delta is citable.
AI BVE isn't just dollars.
Intangibles get attestation cadence too.
Six intangible-value items captured at baseline; three pilot-validated, three pending instrumented attestation. Each carries a named authority and cadence — so the value claim doesn't quietly disappear when the spotlight moves on.
Continuous observability.
Scheduled measurement windows.
The engagement now lives under ADW with scheduled quarterly readouts. Every readout produces a fresh scorecard, fresh variance classification, and fresh four-party attestation — or surfaces as an ADW alert if material drift is detected between scheduled windows.
The outcomes above are credible because the mechanics below are structurally enforced.
Baseline lock at engagement start · variance classified faithfully · four-party joint attestation per readout · continuous observability post-deploy.
The projection L04 measures against — captured before the build started.
Per Structural Rule 1: at L01 signoff, every promoted workstream's value_score block is pulled verbatim from the dossier into L04_BASELINE.yaml, the success metric is pulled verbatim from the L02 charter, and the file is hash-signed. The body of the baseline does not change during readouts. Revision requires a documented change order with prior-hash → new-hash transition and four-party re-attestation.
L04_BASELINE · v1 · sha256:demo-l04-baseline-…7f3c1b9e
Three variances. Each one its own category.
Mutual exclusion enforced via decision tree.
Per Structural Rule 6: every variance enters exactly one category. The classification path (workflow → spec → build → model → adoption → assumption → macro) is recorded for each entry. No free-text rationale can contradict the category.
At-risk ARR pool grew during the Q3-Q4 cohort window.
Projection (L01 baseline): $3.8M ARR-at-risk pool, per L01 src_slack05 T_DM_001:4 (Riya 2026-05-29 DM to Devon).
Realized (pilot observation): $4.62M observed exposure in the 2-week pilot cohort. Delta +$820K (+21.6%) above projection.
Why positive direction: A larger pool is a larger addressable opportunity, not a larger problem. Treatment: Re-baseline at 30-day readout via documented change order (opt-01).
The AI driver-attribution + dispatch combination materially exceeded the L01 conservative attribution split.
Projection (pilot pro-rata): Combined WS-002 + WS-003 → ~$100K of attributable retained ARR/margin for the 2-week window.
Realized: $2.94M ARR retained — the pilot captured 77% of the full L01 projected ARR-at-risk addressable in 2 weeks.
⚠ Why this is NOT "L01 sandbagged": The L01 projection deliberately distributed conservatively. The realized number is not the projection rewritten — the projection's conservatism is the explanation for why the realized is higher. Treatment: Conservative pilot extrapolation discipline through quarterly readouts (opt-02 — the IP-defensibility recommendation).
ADR-016 type-level structural enforcement exceeded aspirational spec.
Projection (L02 charter): Behavioral requirement that external customer-facing actions require human confirmation.
Realized (L03 build): Type-level structural enforcement — human_confirmation: HumanConfirmation is a required parameter on every external-action function signature; PermissionError raised on confirmed=False. Forbidden-pattern grep confirms no bypass paths.
Pre-attestation audit confirmed zero reverse-fit narratives detected. Auditor signature auditor-pilot-v1-clean attached to readout body. Every variance entry cites: (a) the projection field, (b) the realized observation, (c) the rubric criterion applied.
Four parties. One readout. One hash.
Per Structural Rule 3: every readout requires four-party attestation. CFO (commercial economics) · CRO (commercial go-to-market) · CEO (strategic) · Northbeam (methodology). All four signatures, or no readout ships. Each signature is hash-locked to the readout body.
The discipline beat — what the attestation refuses to claim.
The four signers are deliberately precise about what is and is not attested. This is the credibility statement.
Three forward actions. Each with owner, window, and blocking status.
Re-baseline at_risk_arr_pool at thirty_day readout (v-01)
At the 30-day readout, query the full live Snowflake cohort and re-baseline the at-risk ARR pool to the observed $4.62M value. Per Structural Rule 1, this requires a documented change order recorded in L04_BASELINE.yaml § baseline.baseline_revisions[] with prior hash, new hash, and four-party joint re-attestation. No silent re-baselining. The success-metric arithmetic at Q4 FY26 close depends on a stable denominator.
Hold pilot extrapolation conservatively (v-02)
The pilot's strong positive variance is exciting but unstable — subsequent cohorts will normalize because the most-at-risk accounts engage first. Annualizing the pilot rate now and then showing normalization at the 90-day readout creates a board-confidence problem you don't want to manage. Conservative extrapolation protects the credibility of every subsequent readout to the board. The disciplined narrative: "Pilot captured 77% of the L01 ARR-at-risk addressable in 2 weeks; 30-day and 90-day readouts will calibrate the steady-state rate."
Production-credential confirmation
Surface Hightouch + Gainsight + Snowflake + Salesforce + Slack production credentials by Wave-1 deploy. Without resolution by 30-day readout, ~47 measurements remain deferred AND Risk R8 materializes. L04's structural integrity rests on honest-deferral discipline (Structural Rule 4). Honesty is sustained only if the credential gap is actively chased. Quarterly readouts on deferred measurements force the question: "Why hasn't instrumentation been built?"
Continuous structural observability.
ADW extends past the L04 readout into continuous post-deploy observation. Every L02 ADR with a structural enforcement clause becomes a continuous check. Every L01 shadow-IT gap becomes a re-emergence check. Alerts are themselves citation-anchored — an alert without an evidence ref is rejected.
7 ADR-compliance checks
Shadow-IT re-emergence + alert routing
Four layers. Four hashes.
One audit-grade chain.
Every layer hand-signed an artifact to the next. Every hash verified at the downstream phase entry, or the downstream layer halted before code landed. The chain that started at L01 with seven workstreams and a synthesized current-state map closes at L04 with a four-party attested readout — and continues running under ADW until the success-metric measurement window.
Four layers. One signed chain.
See the full methodology in motion.
L01 surfaces the real workflow. L02 turns it into a binding spec. L03 ships the system with an independent verifier on every commit. L04 measures realized value against the projection. The master cinematic walks the chain end-to-end.