Demonstration Material · A walkthrough of an actual Aurora run on synthetic data.
Layer 04 · AI Business Value Engineering
See it Work · sample run · synthetic data

Prove.
Weeks of value measurement now automated and repeatable.

Every variance classification, attestation signature, and dollar figure on this page comes from an actual run of the Aurora L04 Prove automation against the locked baseline captured at L01 signoff. Layer 04 measures realized value against the projection L01 made — before the build started. Every delta is classified across seven categories. The result is a numerical readout against a signed baseline, not a slide deck made the week before.

$3.0M
Net pilot-window value
4 / 4
Signers attested
0
Reverse-fit narratives
7 / 7
ADW checks armed
Source: Keystone-2026Q2 L04 baseline + pilot readout · attested 2026-05-23 · baseline hash sha256:demo-l04-baseline-…7f3c1b9e

Most AI programs cannot tell you what they actually delivered.
The methodology makes the answer structural.

Four things go wrong with AI value measurement after a program ships. Aurora's L04 layer makes each of them impossible to do quietly.

The projection drifts to match the outcome.

The pre-build value claim quietly gets revised every quarter to match what actually happened. Every readout shows green. Aurora locks the baseline at L01 signoff, hash-signs it, and refuses silent revision. Re-baselining requires a documented change order with four-party re-attestation.

Variance gets narrated away.

"It seems to be working" becomes the executive narrative. "Mixed signals" becomes the variance explanation. Aurora classifies every delta against a seven-category rubric with mutual exclusion. Each variance carries its decision-tree path on the record. Free-text rationale cannot override the category.

Positive variance gets reverse-fit into "we crushed it."

A +$2.84M positive variance tempts the team to annualize the pilot into $76M/yr. Aurora's Structural Rule 2 refuses: variance is explained *against the projection*, not by re-shaping the projection. The independent auditor confirms zero reverse-fit narratives before any readout signs.

One person signs off on the value claim.

The CRO signs off on outcomes. CFO never validates the economics. CEO never reviews the variance classification. Aurora requires four-party joint attestation per readout: CFO (economics) + CRO (GTM) + CEO (strategic) + Northbeam (methodology). All four signatures, or no readout ships.

$3.0M net pilot value
at $63K/yr AI cost overhead.

Aurora discloses what the AI loops cost. The fully-loaded overhead is structurally honest because the methodology refuses to hide infrastructure cost behind value-only headlines.

$3.0M
Net pilot-window value
$3,002,923
Gross $3,007,769 · less pilot-window overhead pro-rata $4,846
Overhead category | Annual
Tokens | $6,000
Infra | $24,000
Ops | $15,000
Model maintenance | $10,000
Compliance | $8,000
Total annual AI cost overhead | $63,000
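The overhead and net-value figures above can be re-derived with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, and the pilot-window overhead of $4,846 is taken as quoted because the page does not state its pro-rata basis.

```python
# Annual AI cost overhead by category, as listed on the page.
ANNUAL_OVERHEAD_USD = {
    "tokens": 6_000,
    "infra": 24_000,
    "ops": 15_000,
    "model_maintenance": 10_000,
    "compliance": 8_000,
}

GROSS_PILOT_VALUE_USD = 3_007_769  # quoted gross pilot-window value
PILOT_OVERHEAD_USD = 4_846         # quoted pro-rata overhead, taken as given


def net_pilot_value(gross_usd: int, overhead_usd: int) -> int:
    """Net value is gross value minus the overhead charged to the window."""
    return gross_usd - overhead_usd


total_annual_overhead = sum(ANNUAL_OVERHEAD_USD.values())
net_value = net_pilot_value(GROSS_PILOT_VALUE_USD, PILOT_OVERHEAD_USD)

print(total_annual_overhead)  # 63000
print(net_value)              # 3002923
```

Both results match the headline numbers: $63,000 annual overhead and $3,002,923 net pilot-window value.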

What changed, measured.
Not what felt different.

Pre-Aurora state cited from L01 evidence chain. Post-Aurora state observed in the 2-week pilot window (2026-06-02 → 06-13). Every delta is citable.

Dimension | Pre-Aurora (L01-cited) | Post-Aurora (pilot observed) | Delta
Customer-health definitions | 4 parallel definitions (KCHS v2 / Riya Notion / Diego SF / Priya engagement) | 1 canonical, hash-signed | −75%
KCHS null-input rate | 3 of 5 inputs returning nulls for affected population | 0 nulls post-WS-005 repair (12 accounts backfilled) | −100%
False-positive page rate | ~30% (Riya operator estimate) | 12% auto-suppressed by ADR-017 gate before reaching CSM | −83% reaching CSM
Canonical-dashboard viewership | 6 unique / 90 days (4 of those = Riya confirming brokenness) | Replaced by canonical query layer feeding board memo | retired
At-risk ARR addressable | $3.8M (L01-cited) | $4.62M observed in Q3-Q4 pilot cohort (+21.6%) | +$820K
Customer-health decisioning loop | Operator-mediated (Devon manual suppression, Sarah uses Devon's list, Diego runs separate forecast) | End-to-end AI loop: canonical signal → driver attribution → validated dispatch → CSM action → save tracking → board memo | mechanism live
Q1 board memo defensibility | "Diego says 11.4%, Sarah says 14.1%; Lin doesn't know which" (Tom Q12 admission) | Single canonical query produces one number with hash + assumption tree + audit-review sign-off | defensible
Build-partner bus-factor | 1 (Riya sole maintainer of canonical query path) | 2 (Riya + Marcus Patel per ADR-004 with KT milestones) | 1 → 2

AI BVE isn't just dollars.
Intangibles get attestation cadence too.

Six intangible-value items captured at baseline; three pilot-validated, three pending instrumented attestation. Each carries a named authority and cadence — so the value claim doesn't quietly disappear when the spotlight moves on.

ID | Item | Pilot status | Cadence
iv-01 | Sarah↔Diego trust recovery (ADR-003 disclosure protocol) | ✓ pilot-validated | quarterly
iv-02 | Riya bus-factor risk reduction (ADR-004 + Marcus backstop) | ✓ pilot-validated | monthly → Q
iv-03 | Board defensibility moment (June 18 memo with canonical number) | ✓ shipped | one-time → Q
iv-04 | Devon shadow-suppression workflow retired | ⏸ deploy+30d | one-time → Q
iv-05 | CS team operational confidence (driver-informed intervention) | ⏸ Q3 NPS survey | quarterly
iv-06 | Sales↔CS coordination friction reduction | ⏸ Q3 cross-functional pulse | quarterly

Continuous observability.
Scheduled measurement windows.

The engagement now lives under ADW with scheduled quarterly readouts. Every readout produces a fresh scorecard, fresh variance classification, and fresh four-party attestation — or surfaces as an ADW alert if material drift is detected between scheduled windows.

2026-05-23 · 03:00Z
First ADW daily scan · automated · checks all 7 ADR-compliance + 3 shadow-IT re-emergence states
2026-06-14
Wave-1 deploy + production-credential confirmation target · owner Tom + Bill · opt-04 critical-path gate · 47 deferred measurements depend on this
2026-06-18
Board memo presented to Board of Directors · owner Lin + Maya · L01-evidence-anchored canonical churn number
2026-06-21
Devon shadow-suppression retirement attestation window · owner Devon + Sarah · deploy+7d grace · iv-04 close
2026-07-05
Wave-2 L03 deploy target (WS-007 · CS Ops self-serve maturity) · owner Tom + Sarah · single workstream · ~3-4 commits
2026-07-15
thirty_day L04 readout · all 4 parties · MUST close opt-01 (re-baseline change order) + opt-04 (credentials)
2026-07-26
Wave-3 L03 deploy target (WS-006 · process docs + drift-watch) · owner Tom
2026-09-15
ninety_day L04 readout · all 4 parties · steady-state rate calibration window
2027-01-15
Q4 FY26 close · quarterly L04 readout · success-metric measurement window · Lin + Bill joint methodology attestation · success metric measured against engagement-start baseline per AMB-026 (trailing-12-months)

The outcomes above are credible because the mechanics below are structurally enforced.

Baseline lock at engagement start · variance classified faithfully · four-party joint attestation per readout · continuous observability post-deploy.

The projection L04 measures against — captured before the build started.

Per Structural Rule 1: at L01 signoff, every promoted workstream's value_score block is pulled verbatim from the dossier into L04_BASELINE.yaml, the success metric is pulled verbatim from the L02 charter, and the file is hash-signed. The body of the baseline does not change during readouts. Revision requires a documented change order with prior-hash → new-hash transition and four-party re-attestation.

L04_BASELINE · v1 · sha256:demo-l04-baseline-…7f3c1b9e

UPSTREAM · L01 dossier · sha256:e6518d6e… · 7 workstreams promoted
UPSTREAM · L02 charter · sha256:dcb07909… · 17 ADRs · 40 acceptance criteria
METRIC · success_metric · 2pt mid-market churn reduction · Q4 FY26 close
POOL · pre_aurora_baseline.at_risk_arr_pool_usd · $3,800,000
FORCING · June 18, 2026 board memo · Lin Zhao + Bill Tennant joint methodology attestation
ATTESTED · 4 of 4 signers · 2026-05-23T01:30:00Z · baseline body locked
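To make the lock-and-change-order mechanic concrete, here is a minimal sketch of how a hash-locked baseline could behave. It is not Aurora's implementation: the class, function, and field names are hypothetical, and canonical JSON stands in for the YAML file so that the example needs only the standard library.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Assumed role labels, based on the four attesting parties named on the page.
REQUIRED_SIGNERS = frozenset({"CFO", "CRO", "CEO", "Northbeam"})


def body_hash(body: dict) -> str:
    """Hash a canonical JSON serialization so key order cannot change the digest."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class LockedBaseline:
    body: dict
    locked_hash: str = ""
    revisions: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Record the hash of the body at lock time.
        self.locked_hash = body_hash(self.body)

    def verify(self) -> bool:
        """True only if the body still matches the hash recorded at lock time."""
        return body_hash(self.body) == self.locked_hash

    def revise(self, new_body: dict, reason: str, signers: set) -> None:
        """Apply a change order; refuse it unless all four parties re-attest."""
        missing = REQUIRED_SIGNERS - set(signers)
        if missing:
            raise PermissionError(f"re-attestation missing: {sorted(missing)}")
        new_hash = body_hash(new_body)
        self.revisions.append(
            {"prior_hash": self.locked_hash, "new_hash": new_hash, "reason": reason}
        )
        self.body, self.locked_hash = new_body, new_hash
```

In this sketch a silent edit to `body` makes `verify()` return `False`, while `revise()` records the prior-hash to new-hash transition only when every required signer is present.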

Three variances. Each one its own category.
Mutual exclusion enforced via decision tree.

Per Structural Rule 6: every variance enters exactly one category. The classification path (workflow → spec → build → model → adoption → assumption → macro) is recorded for each entry. No free-text rationale can contradict the category.
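One way to read the mutual-exclusion rule is as an ordered walk that stops at the first matching category and records every question asked along the way. The sketch below assumes that first-match reading; the function and type names are hypothetical, and only the category order comes from the text above.

```python
from dataclasses import dataclass

# Category order taken from the classification path quoted above.
CATEGORY_ORDER = (
    "workflow", "spec", "build", "model", "adoption", "assumption", "macro",
)


@dataclass(frozen=True)
class Classification:
    category: str
    path: tuple  # (category, answer) pairs, recorded for the audit trail


def classify(answers: dict) -> Classification:
    """Walk the categories in order and stop at the first 'yes'."""
    path = []
    for category in CATEGORY_ORDER:
        hit = bool(answers.get(category, False))
        path.append((category, "yes" if hit else "no"))
        if hit:
            return Classification(category, tuple(path))
    raise ValueError("no category matched; variance cannot be left unclassified")
```

Because the walk returns at the first hit, a variance can never carry two categories, and the recorded path shows which earlier categories were ruled out.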

v-01 · assumption · positive · medium severity
+$820K addressable

At-risk ARR pool grew during the Q3-Q4 cohort window.

Projection (L01 baseline): $3.8M ARR-at-risk pool, per L01 src_slack05 T_DM_001:4 (Riya 2026-05-29 DM to Devon).
Realized (pilot observation): $4.62M observed exposure in the 2-week pilot cohort. Delta +$820K (+21.6%) above projection.

Decision-tree path: workflow? no · spec? no · build? no · model? no · adoption? no · assumption? YES — L01 baseline carried the assumption "ARR-at-risk pool is materially stable at ~$3.8M." Q3-Q4 cohort observed a larger pool. Per rubric: "an L01 confidence_interval.assumptions[] entry proved false" → classify as assumption.

Why positive direction: A larger pool is a larger addressable opportunity, not a larger problem. Treatment: Re-baseline at 30-day readout via documented change order (opt-01).

v-02 · model · positive · critical magnitude
+$2.84M pilot pro-rata

The AI driver-attribution + dispatch combination materially exceeded the L01 conservative attribution split.

Projection (pilot pro-rata): Combined WS-002 + WS-003 → ~$100K of attributable retained ARR/margin for the 2-week window.
Realized: $2.94M ARR retained — the pilot captured 77% of the full L01 projected ARR-at-risk addressable in 2 weeks.

Decision-tree path: workflow? no · spec? no (L02 acceptance criteria satisfied as written) · build? no (L03 73/73 tests passing) · model? YES — WS-002 attribution + WS-003 dispatch combined at the upper edge of L01's confidence interval, AND L01's conservative per-workstream attribution distributed value cross-workstream in a way that under-attributed the combined AI loop. Adoption considered as enabler — but per mutual-exclusion: classify at the locus where value materialized → model.

⚠ Why this is NOT "L01 sandbagged": The L01 projection deliberately distributed value conservatively. The realized number is not the projection rewritten; the projection's conservatism explains why the realized figure is higher. Treatment: Conservative pilot extrapolation discipline through quarterly readouts (opt-02, the IP-defensibility recommendation).
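The dollar deltas and percentages quoted for v-01 and v-02 follow directly from the projection and realized figures. A short check, using a hypothetical helper and integer dollars to avoid rounding noise:

```python
def variance(projected_usd: int, realized_usd: int) -> tuple:
    """Return (delta in USD, delta as a percentage of the projection)."""
    delta = realized_usd - projected_usd
    return delta, 100 * delta / projected_usd


# v-01: at-risk ARR pool, $3.8M projected vs $4.62M observed.
v01_delta, v01_pct = variance(3_800_000, 4_620_000)

# v-02: ~$100K pilot pro-rata projection vs $2.94M ARR retained.
v02_delta, _ = variance(100_000, 2_940_000)

# Share of the L01 at-risk pool captured by the pilot.
share_of_pool = 100 * 2_940_000 / 3_800_000

print(v01_delta, round(v01_pct, 1))  # 820000 21.6
print(v02_delta)                     # 2840000
print(round(share_of_pool))          # 77
```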

v-03 · build · positive · low severity
qualitative

ADR-016 type-level structural enforcement exceeded aspirational spec.

Projection (L02 charter): Behavioral requirement that external customer-facing actions require human confirmation.
Realized (L03 build): Type-level structural enforcement — human_confirmation: HumanConfirmation is a required parameter on every external-action function signature; PermissionError raised on confirmed=False. Forbidden-pattern grep confirms no bypass paths.

Decision-tree path: workflow? no · spec? no (the spec was correct as written) · build? YES — L03 implemented the spec at a stronger structural level than required. Treatment: noted; positive build variance with no remediation required.
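The type-level enforcement described for ADR-016 can be illustrated with a keyword-only parameter that has no default. This is a sketch under assumptions: `send_customer_email` and the `confirmed_by` field are hypothetical, and only the `human_confirmation: HumanConfirmation` parameter, the `confirmed` flag, and the `PermissionError` come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HumanConfirmation:
    confirmed: bool
    confirmed_by: str  # hypothetical field: who approved the action


def send_customer_email(
    account_id: str, body: str, *, human_confirmation: HumanConfirmation
) -> str:
    """External action: the confirmation argument has no default, so it cannot be omitted."""
    if not human_confirmation.confirmed:
        raise PermissionError("external action requires human confirmation")
    return f"sent to {account_id} (approved by {human_confirmation.confirmed_by})"
```

Calling the function without `human_confirmation` fails with a `TypeError` before any logic runs, and passing `confirmed=False` raises `PermissionError`, so there is no code path that sends without an explicit approval object.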

The pre-attestation audit detected zero reverse-fit narratives. Auditor signature auditor-pilot-v1-clean is attached to the readout body. Every variance entry cites: (a) the projection field, (b) the realized observation, (c) the rubric criterion applied.

Four parties. One readout. One hash.

Per Structural Rule 3: every readout requires four-party attestation. CFO (commercial economics) · CRO (commercial go-to-market) · CEO (strategic) · Northbeam (methodology). All four signatures, or no readout ships. Each signature is hash-locked to the readout body.
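A signature that is hash-locked to the readout body stops being valid as soon as the body changes. The following sketch shows one plausible shape for that gate; the role labels, function names, and the choice of SHA-256 over the raw text are assumptions for illustration.

```python
import hashlib

# Assumed role labels for the four attesting parties.
REQUIRED_ROLES = ("CFO", "CRO", "CEO", "Northbeam")


def readout_hash(readout_body: str) -> str:
    """Digest of the exact readout text that the signers attest to."""
    return "sha256:" + hashlib.sha256(readout_body.encode("utf-8")).hexdigest()


def can_ship(readout_body: str, signatures: dict) -> bool:
    """signatures maps role -> the readout hash that role signed.

    The readout ships only if every required role signed the current body.
    """
    expected = readout_hash(readout_body)
    return all(signatures.get(role) == expected for role in REQUIRED_ROLES)
```

With this shape, three signatures out of four return `False`, and so do four signatures collected against an earlier draft of the body.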

Lin Zhao
CFO · Commercial economics
"I attest to the realized economics. I accept that the $2.94M ARR-retained is the pilot-window observation and is NOT defensibly annualizable yet. I see and acknowledge the 5 deferred_pending_real_systems entries."
Diego Martinez
CRO · Commercial GTM
"Sales↔CS coordination outcomes reflect post-ADR-003 mediation: zero broken-data escalations reached me or Alex in the pilot window. I accept variance v-02 classification as model (AI at upper edge), not adoption."
Maya Chen
CEO · Strategic
"This pilot readout reflects faithful measurement against the L04 immutable baseline. I authorize opt-01 through opt-04 to feed Wave-2 + Wave-3 build decisions and the 30-day readout."
Bill Tennant
Northbeam · Methodology
"Three variances classified deterministically. The aggressive positive v-02 magnitude is explained against the L01 conservative split rather than re-fit. Conservative extrapolation discipline (opt-02) is the methodology integrity statement."

The discipline beat — what the attestation refuses to claim.

The four signers are deliberately precise about what is and is not attested. This is the credibility statement.

NOT: the pilot save rate (63.7%) will sustain across subsequent cohorts
NOT: the $2.94M ARR retained is annualizable
NOT: the 5 deferred measurements have been measured (they're explicitly deferred)
NOT: the 2pt churn-reduction success metric has been hit (measured at Q4 FY26 close)
Attesting: the mechanism is live, the variances are classified faithfully, the trajectory framing for the board is grounded in measurement.

Three forward actions. Each with owner, window, and blocking status.

opt-01 · assumption variance · BLOCKING
owner: Tom + Lin · target: 30-day · BLOCKING

Re-baseline at_risk_arr_pool at thirty_day readout (v-01)

At the 30-day readout, query the full live Snowflake cohort and re-baseline the at-risk ARR pool to the observed $4.62M value. Per Structural Rule 1, this requires a documented change order recorded in L04_BASELINE.yaml § baseline.baseline_revisions[] with prior hash, new hash, and four-party joint re-attestation. No silent re-baselining. The success-metric arithmetic at Q4 FY26 close depends on a stable denominator.

opt-02 · model variance · BOARD CREDIBILITY
owner: Bill + Sarah · target: 30-day → quarterly

Hold pilot extrapolation conservatively (v-02)

The pilot's strong positive variance is exciting but unstable — subsequent cohorts will normalize because the most-at-risk accounts engage first. Annualizing the pilot rate now and then showing normalization at the 90-day readout creates a board-confidence problem you don't want to manage. Conservative extrapolation protects the credibility of every subsequent readout to the board. The disciplined narrative: "Pilot captured 77% of the L01 ARR-at-risk addressable in 2 weeks; 30-day and 90-day readouts will calibrate the steady-state rate."

opt-04 · deferred measurements · BLOCKING
owner: Tom + Bill · target: by 2026-06-14 · SINGLE LARGEST GATE

Production-credential confirmation

Surface Hightouch + Gainsight + Snowflake + Salesforce + Slack production credentials by Wave-1 deploy. Without resolution by 30-day readout, ~47 measurements remain deferred AND Risk R8 materializes. L04's structural integrity rests on honest-deferral discipline (Structural Rule 4). Honesty is sustained only if the credential gap is actively chased. Quarterly readouts on deferred measurements force the question: "Why hasn't instrumentation been built?"

Continuous structural observability.

ADW extends past the L04 readout into continuous post-deploy observation. Every L02 ADR with a structural enforcement clause becomes a continuous check. Every L01 shadow-IT gap becomes a re-emergence check. Alerts are themselves citation-anchored — an alert without an evidence ref is rejected.

7 ADR-compliance checks

✓ PASS · ADR-001 · no parallel-definition references in production
✓ PASS · ADR-003 · disclosure protocol before any external publication
✓ PASS · ADR-006 · canonical YAML hash matches runtime on every refresh
✓ PASS · ADR-008 · no opaque-ML imports (torch/tensorflow/keras/transformers)
✓ PASS · ADR-015 · no bare-LLM calls in narrative synthesizer
✓ PASS · ADR-016 · human-confirmation on every external action
✓ PASS · ADR-017 · no validation-gate bypass paths
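A check such as ADR-008 lends itself to static analysis. The sketch below shows one possible way to detect forbidden imports with Python's standard `ast` module; it is an illustration only, the function name is hypothetical, and the page does not say how Aurora implements the check.

```python
import ast

# Module names taken from the ADR-008 line above.
FORBIDDEN_MODULES = {"torch", "tensorflow", "keras", "transformers"}


def forbidden_imports(source: str) -> set:
    """Return the forbidden top-level modules imported by a Python source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            names = [node.module]
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root in FORBIDDEN_MODULES:
                found.add(root)
    return found
```

Parsing the syntax tree instead of grepping the text avoids false positives from comments and strings, though it would not catch dynamic imports through `importlib`.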

Shadow-IT re-emergence + alert routing

✓ PASS · Riya's Notion read-only + redirect to canonical
✓ PASS · Diego SF renewal model deprecated · folded into canonical
⏸ DEFER · Devon suppression retirement (deploy+30d window)
Alert routing · 4-tier: info · watch · material · critical
ADR violations + shadow-IT re-emergence = always material-tier or higher, regardless of variance threshold. Per Structural Rule 7, the system cannot quietly tolerate a pattern the team explicitly rejected.
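The routing rule above, together with the requirement that every alert carry an evidence reference, can be sketched as a small function. The tier names come from the page; the `Alert` type, the `kind` labels, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

TIERS = ("info", "watch", "material", "critical")

# Assumed labels for the two alert kinds that are never routed below material.
ALWAYS_MATERIAL_KINDS = {"adr_violation", "shadow_it_reemergence"}


@dataclass(frozen=True)
class Alert:
    kind: str
    proposed_tier: str
    evidence_ref: Optional[str]


def route(alert: Alert) -> str:
    """Return the final tier, rejecting alerts that lack an evidence reference."""
    if not alert.evidence_ref:
        raise ValueError("alert rejected: no evidence reference")
    if alert.proposed_tier not in TIERS:
        raise ValueError(f"unknown tier: {alert.proposed_tier}")
    if alert.kind in ALWAYS_MATERIAL_KINDS:
        # Raise the tier to at least material, but never lower a critical alert.
        floor = TIERS.index("material")
        return TIERS[max(TIERS.index(alert.proposed_tier), floor)]
    return alert.proposed_tier
```

An ADR violation proposed at `info` comes back as `material`, one proposed at `critical` stays `critical`, and any alert without an evidence reference is refused outright.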

Four layers. Four hashes.
One audit-grade chain.

Every layer handed a signed artifact to the next. Every hash was verified at the downstream phase entry; on a mismatch, the downstream layer halted before code landed. The chain that started at L01 with seven workstreams and a synthesized current-state map closes at L04 with a four-party attested readout, and it continues running under ADW until the success-metric measurement window.

Four layers. One signed chain.
See the full methodology in motion.

L01 surfaces the real workflow. L02 turns it into a binding spec. L03 ships the system with an independent verifier on every commit. L04 measures realized value against the projection. The master cinematic walks the chain end-to-end.
