Demonstration Material · A walkthrough of a real Aurora run on synthetic data.

Layer 03 · Autonomous SDLC

See it Work · sample run · synthetic data

Build.
Weeks of construction now automated, verified, and repeatable.

Every commit, test count, verifier finding, and pilot number on this page comes from an actual run of the Aurora L03 Autonomous SDLC automation against the locked L02 charter. Layer 03 executes a nine-phase build loop per commit. An AI builder ships production-grade code against the spec. An independent verifier with fresh context audits every commit. Every external customer action requires a named human confirmation. Every line that violates an AI safety decision is refused at commit time. Waves run in sequence under the same locked spec — the same engine, again, for each scoped workstream group.

7 / 7 · Wave-1 commits verified

Tests pass against the spec

3 bugs · Caught by the verifier

$2.94M · ARR retained in pilot

Source: Keystone-2026Q2 Wave 1 · 7 commits · verifies upstream L02 charter sha256:dcb07909…

The Build Problem

AI builders make confident mistakes.
The methodology assumes this, and structures around it.

Four things go wrong when AI builds software. Aurora's L03 layer makes each of them structurally impossible.

The builder ships its own enthusiasm.

Without a separate verifier, the builder's "all tests pass" is the only signal. Self-review by the same agent that wrote the code is the cognitive equivalent of marking your own homework. Aurora assigns Phase 8 to a fresh-context auditor that never saw the build.

The "AI is safe" promise survives one PR.

Without structural enforcement, "we agreed the AI wouldn't do X" becomes "the AI does X and nobody noticed." Aurora loads every AI safety decision as a commit-time grep pattern. Forbidden code never lands.

External customer actions fire silently.

The system sends an email, books a call, or hits an external API based on AI confidence — with no named human on the receipt. Aurora requires a HumanConfirmation parameter on every external action. The build can't ship without it.

Deferred items get rebranded as complete.

"Integration tested against staging" becomes "integration tested." "Mocked" becomes "tested." Aurora's Phase 9 completion report names every deferred row, every assumption, every shortcut — by design, on the record.

Independent Verification

The verifier is the build's structural conscience.

A fresh-context auditor with no memory of the build audits every commit. In the Keystone Wave 1 run, the verifier surfaced three substantive issues the builder's self-review had marked as green. All three were fixed before the completion report was sealed.

Builder · Phase 6 Self-Review

"All tests passing. Ship-ready."

The build agent confirmed every acceptance test passed. The runtime hash matched the canonical definition. The validation gate rejected payloads as designed. The narrative synthesizer produced outputs that traced to data. Self-review marked the commit ready.

Verifier · Phase 8 Fresh-Context Audit

"Three things don't add up."

The auditor, with no knowledge of the build's choices, re-ran independent checks. It surfaced three bugs the builder's self-review missed — each a different class of failure, each invisible from inside the build's own context.

FINDING 1 · YAML hash drift · ✓ FIXED PRE-SEAL

The runtime hash matched. The file on disk didn't.

The canonical definition YAML carried a placeholder hash. The runtime self-match passed because it never loaded the on-disk file. In production this would have shipped silent corruption — the file consumers read would not match the file the system claimed to have written. Fix: regenerate YAML from the canonical builder; add an on-disk-vs-runtime hash test that fails on any future drift.
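The on-disk-vs-runtime check described in Finding 1 could look roughly like the sketch below. It is an illustration under stated assumptions, not Aurora's test: it serializes to JSON instead of YAML to stay inside the standard library, and every name in it (`canonical_bytes`, `has_drifted`, `demo`) is hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def canonical_bytes(definition: dict) -> bytes:
    """Serialize deterministically so equal definitions give equal bytes."""
    return json.dumps(definition, sort_keys=True, separators=(",", ":")).encode("utf-8")


def runtime_hash(definition: dict) -> str:
    """Hash of the definition as the running system holds it in memory."""
    return hashlib.sha256(canonical_bytes(definition)).hexdigest()


def on_disk_hash(path: Path) -> str:
    """Hash of the bytes that downstream consumers would actually read."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def has_drifted(definition: dict, path: Path) -> bool:
    """True when the file on disk no longer matches the runtime definition."""
    return runtime_hash(definition) != on_disk_hash(path)


def demo() -> tuple:
    """Write a faithful file, then overwrite it with a placeholder."""
    definition = {"name": "at_risk", "version": 1}  # made-up example content
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "definition.json"
        path.write_bytes(canonical_bytes(definition))
        before = has_drifted(definition, path)
        path.write_text('{"hash": "PLACEHOLDER"}')  # simulates the placeholder file
        after = has_drifted(definition, path)
    return before, after
```

The point of the design is that the check reads the file back from disk. A check that only hashes the in-memory object compares the runtime against itself, which is the failure the finding describes.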

FINDING 2 · Validation gate substring fragility · ✓ FIXED PRE-SEAL

The auto-reject only worked in tests.

The original validation gate triggered on the literal substring "broken" in a free-text explanation field. The live AI agent never emits that substring. Tests passed only because the test-payload factory had hand-constructed an explanation containing "broken." In production the gate would never have fired. Fix: replaced the substring match with a dependency-injected mapping_health_checker. Four new tests cover the default and injected paths.
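A dependency-injected gate of the kind Finding 2 describes might be shaped like this. Only the name `mapping_health_checker` comes from the finding. The payload layout, the default health rule, and the `"dispatch"` / `"suppress"` return values are assumptions made for the sketch.

```python
from typing import Callable, Mapping

# A checker takes a payload and returns True when its mapping is healthy.
HealthChecker = Callable[[Mapping], bool]


def default_mapping_health_checker(payload: Mapping) -> bool:
    """Assumed default rule: the mapping exists and every field has a value."""
    mapping = payload.get("mapping") or {}
    return bool(mapping) and all(v not in (None, "") for v in mapping.values())


def validation_gate(
    payload: Mapping,
    mapping_health_checker: HealthChecker = default_mapping_health_checker,
) -> str:
    """Decide from structured health data, never from free-text wording."""
    return "dispatch" if mapping_health_checker(payload) else "suppress"


# The free-text explanation no longer influences the decision in either direction.
healthy_but_says_broken = {
    "mapping": {"account_id": "ACCT-734", "owner": "csm-team"},
    "explanation": "previous mapping was broken, now repaired",
}
unhealthy_but_sounds_fine = {
    "mapping": {"account_id": "ACCT-734", "owner": ""},
    "explanation": "all good",
}
```

Injecting the checker lets a test exercise the suppress path with a stub, without hand-crafting payload text that the live agent would never produce.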

FINDING 3 · ADR-014 uncited · ✓ FIXED PRE-SEAL

An AI decision record had no integration evidence.

The ADR Coverage Report showed ADR-014 (production-handoff posture for the canonical definition) had no citing commit. The methodology requires every in-scope ADR to be cited in either a forbidden-pattern enforcement or an integration-proof row. Fix: added a Phase-7 production-handoff posture row to the canonical-definition commit's INTEGRATION_PROOF.md.
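The coverage rule in Finding 3 reduces to a set difference: every in-scope ADR must appear in at least one commit's citations. A minimal sketch follows, assuming citations are available as a mapping from commit to cited ADR ids. The commit names are invented for illustration, and the real ADR Coverage Report format is not shown on this page.

```python
def uncited_adrs(in_scope: set, citations: dict) -> list:
    """Return in-scope ADR ids that no commit cites, sorted for stable output.

    `citations` maps a commit identifier to the set of ADR ids cited by that
    commit's forbidden-pattern enforcement or integration-proof rows.
    """
    cited = set().union(*citations.values())
    return sorted(set(in_scope) - cited)


# Hypothetical data: three ADR ids from the page, two made-up commit names.
in_scope = {"ADR-008", "ADR-014", "ADR-016"}
citations = {
    "driver-attribution": {"ADR-008"},
    "playbook-dispatch": {"ADR-016"},
}
```

With this data `uncited_adrs(in_scope, citations)` reports `ADR-014`. Adding it to any commit's citation set clears the report.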

Three findings is normal, not exceptional. The verifier is structural: it earns its keep by surfacing exactly the failures it's designed to surface. A run with zero verifier findings would itself be a finding worth investigating.

Safe AI · Enforced in Code

Safe AI isn't a value statement
— it's a refusal at commit time.

Below: two representative refusals from the Keystone Wave 1 build. Both came from the build agent's planning surface. Both were caught at Phase 3 before the commit landed. Both forced the agent to retry within the allowed pattern.

Refused at Phase 3 · ADR-008 (interpretability)

    # Build agent's first plan for driver attribution:
    import torch
    import transformers

    # Phase 3 grep against ADR-008 forbidden patterns:
    ✗ pattern matched · COMMIT REFUSED

    # Build agent retried within allowed set:
    from sklearn.tree import DecisionTreeRegressor
    import numpy as np

    # Result: interpretable model
    # Top-3 drivers per account with confidence
    # Decomposable Shapley attribution
    # ✓ COMMITTED
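A commit-time pattern check like the ADR-008 refusal above can be approximated with a small regex scan. This is a sketch under assumptions: the regexes are guesses at what a forbidden-pattern entry might contain, and `FORBIDDEN_PATTERNS`, `Violation`, and `scan_plan` are hypothetical names, not Aurora's.

```python
import re
from dataclasses import dataclass

# Assumed pattern table. The ADR id is from the page, the regexes are not.
FORBIDDEN_PATTERNS = {
    "ADR-008": [
        r"^\s*(import|from)\s+torch\b",
        r"^\s*(import|from)\s+transformers\b",
    ],
}


@dataclass(frozen=True)
class Violation:
    adr: str
    line_no: int
    line: str


def scan_plan(source: str, patterns: dict = FORBIDDEN_PATTERNS) -> list:
    """Return one Violation per source line that matches an ADR's patterns."""
    violations = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for adr, regexes in patterns.items():
            if any(re.search(rx, line) for rx in regexes):
                violations.append(Violation(adr, line_no, line.strip()))
    return violations


first_plan = "import torch\nimport transformers\n"
retry_plan = "from sklearn.tree import DecisionTreeRegressor\nimport numpy as np\n"
```

A commit hook built on this would refuse whenever `scan_plan` returns a non-empty list. Here the first plan yields two violations and the retry yields none.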

Refused at Phase 3 · ADR-016 (human-in-the-loop)

    # Build agent's first plan for dispatch:
    dispatcher.send_email(
        playbook="champion-replacement",
        account_id="ACCT-734"
    )

    # Phase 3 grep against ADR-016:
    ✗ external API call without HumanConfirmation
    ✗ COMMIT REFUSED

    # Build agent retried with required parameter:
    dispatcher.send_email(
        playbook="champion-replacement",
        account_id="ACCT-734",
        human_confirmation=HumanConfirmation(
            by="devon.park",
            at="2026-06-04T15:32:00Z",
            payload_reviewed=True
        )
    )
    # ✓ COMMITTED
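The page describes the ADR-016 rule as a commit-time grep. The sketch below shows a complementary runtime guard, which is an assumption of this example and not something the source confirms. The `HumanConfirmation` field names follow the snippet above. The exception, the receipt shape, and the standalone `send_email` function are hypothetical, and no email is actually sent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HumanConfirmation:
    by: str                 # named human who approved the action
    at: str                 # ISO 8601 timestamp of the approval
    payload_reviewed: bool  # the human saw the payload, not just the request


class MissingHumanConfirmation(Exception):
    """Raised when an external action is attempted without a valid confirmation."""


def send_email(
    playbook: str,
    account_id: str,
    human_confirmation: Optional[HumanConfirmation] = None,
) -> dict:
    """Stand-in for an external action: returns a receipt instead of sending."""
    if human_confirmation is None:
        raise MissingHumanConfirmation("external action requires a named human confirmation")
    if not human_confirmation.by or not human_confirmation.payload_reviewed:
        raise MissingHumanConfirmation("confirmation must name a reviewer who reviewed the payload")
    return {
        "playbook": playbook,
        "account_id": account_id,
        "confirmed_by": human_confirmation.by,
        "confirmed_at": human_confirmation.at,
    }
```

Putting the confirmer on the receipt is what makes the audit trail queryable: each fired action carries the name and time of its approval.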

In the Keystone pilot window: 41 at-risk accounts identified. 36 playbooks dispatched. 22 captured a named human confirmation. 5 were auto-suppressed by the validation gate. Zero external actions fired without one or the other. The audit trail is complete and queryable.

Pilot Outcome — Week 4

The success metric was churn reduction.
The pilot moved toward it.

Two-week pilot window. Live canonical definition. Live driver attribution. Live playbook dispatch with human-in-the-loop. Live save events. The numbers are calibrated to the at-risk cohort surfaced by L01 and are within the confidence interval L01 specified.

41 · At-risk identified · via canonical definition

36 · Playbooks dispatched · 5 auto-suppressed by gate

22 · Human confirmations · captured before fire

Saves confirmed · 63.7% save rate

$2.94M · ARR retained · 2-week pilot window

on-track · 2pt churn reduction by Q4 · trajectory validated

The June 18 board memo was rendered by the C7 narrative pipeline — every paragraph traces to a numerical source, enforced structurally by the ADR-015 forbidden-pattern check. The CEO has one number. The audit trail says where it came from.

How It Works

Nine phases per commit. No skips.

Phase 1 · Charter

Read the locked L02 charter. Verify its hash matches what L02 sealed. Drift detected = halt before code lands.

Phase 2 · Decompose

Break the commit's R-criteria into testable units. Map each one to its evidence shape and its governing AI decision records.

Phase 3 · Red Team

Load every governing AI decision's forbidden patterns. Pre-grep the planned implementation surface. Refuse the plan or proceed.

Phase 4 · Test First

Author the tests that would pass if the implementation were correct. Confirm they fail against empty src. The traceability matrix is built before code.

Phase 5 · Implement

Write the src until tests turn green. No green tests, no commit. No mock laundering, no commit. No commented-out asserts, no commit.

Phase 6 · Self-Review

Builder audits its own commit against the R-criteria, AI decisions, and forbidden patterns. Catches what it can. Knows it can't catch everything.

Phase 7 · Integration Proof

Every system integration row cites the governing AI decision record. Real-system rows that can't run without live credentials are honestly deferred. No fake green.

Phase 8 · Independent Verification

A fresh-context subagent with no memory of the build audits the commit. Surfaces what self-review missed. In Keystone: three findings, all resolved pre-seal.

Phase 9 · Honest Report

Completion report names what got built, what got deferred, what surprises emerged, and why. Renewal-ready as written.
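The "nine phases, no skips" rule can be pictured as an ordered pipeline that refuses to start unless every phase has a check, and halts at the first failure. The phase names below come from this page. The runner itself (`run_commit`, `PhaseFailed`, the boolean check callables) is a hypothetical sketch, not Aurora's engine.

```python
from typing import Callable, Dict, List

PHASES: List[str] = [
    "Charter",
    "Decompose",
    "Red Team",
    "Test First",
    "Implement",
    "Self-Review",
    "Integration Proof",
    "Independent Verification",
    "Honest Report",
]


class PhaseFailed(Exception):
    """Raised when a phase check returns False. Later phases never run."""


def run_commit(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run all nine phases in order and return the phases that completed."""
    missing = [phase for phase in PHASES if phase not in checks]
    if missing:
        # No skips: a commit without a check for every phase cannot start.
        raise ValueError(f"missing phase checks: {missing}")
    completed = []
    for number, phase in enumerate(PHASES, start=1):
        if not checks[phase]():
            raise PhaseFailed(f"Phase {number} ({phase}) failed")
        completed.append(phase)
    return completed
```

In this shape, a Phase 3 refusal like the two shown earlier stops the commit before Phase 5 ever writes code, which matches the ordering the page describes.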

L04 measures realized value
against the projection L01 made.

Four-party attestation. Variance classified across seven categories. Aurora Dependency Watch armed for continuous structural surveillance. See how the renewal conversation gets engineered.

