Working With Agents — Our Spec-First Practice

First-Class Citizens

Beyond code — documents go first-class.

Code was always the one artifact we version, review, and trust as the source of truth. Now these documents earn the same standing:

High-level specthe what & why

API specthe service contract

UX specthe interaction contract

Technical specthe how, per platform

Multi-level spec stack

Top-down, traceable back.

derives

traces back

Intent

The problem to solve — a bug, a request, a refactor

From Jira / Confluence / Redmine

High-level spec

openspec · proposal.md · design.md

What & why — behavior and requirements

API spec

OpenAPI schema

API contract — frontend and backend build to it

UX spec

markdown — screens, layouts, interactions, flows

+ HTML prototype

Interaction contract — screens and behavior

→ lives in references/

Technical specs (in openspec format)

Backend

Web

iOS

Android

—

Code & tests

Derived — and traceable back to every layer

Example · Site Management

What the artifacts actually look like.

One real feature in openspec — proposal, contracts, UX, and per-endpoint specs, all traceable.

High-level spec

openspec · site-management-for-portal/

proposal.md

specs/site-hierarchy/spec.md

specs/site-permissions/spec.md

specs/site-device-filtering/spec.md

specs/floor-plan-display/spec.md

specs/view-module/spec.md

API spec

OpenAPI · api-ssot

Site CRUD endpoints

Device ↔ site assignment

Permission scopes

the contract both ends build to

UX spec

artifacts/

ux-spec.md

prototype.html

filter-combo/

device-picker-dialog/

tree-view-list/

key-screens/ ×13

Technical spec

openspec · per screen / endpoint

spec.md

proposal.md

design.md

tasks.md

↓ code + tests

filter-combo/ · prototype.html

device-picker-dialog/ · prototype.html

Team Workflow · Stage 01

Clarify intent

PO · PM · RD · UX · Architect·no AI

HumanPO presents the problem & contextframing

→

HumanTeam discusses to understand itfacilitation

→

HumanAgree scope: in / outscoping

→

OutputClarified intentagreed scope

project bglore

misc knowledge

constraints

Referencessuperpowers · AI Fluency — Delegation · Anthropic

AI Fluency

DelegationDescriptionDiscernmentDiligence

In practice · Stage 01

Clarify intent

Site Management · in development

used brainstorming · probing the PM — FE · BE · UX, no agent yet

PMasks

“Build a Site module.” A feature request handed to the team — a solution, not yet a problem.

FE · BE · UXprobe

Pushed back on the PM — what are we actually trying to solve?don't build the feature as stated; find the problem behind it

FE · BE · UXbrainstorm

With the real problem clear, brainstormed the direction together across front end, back end, and UX.

Teamagree

Aligned on the problem and a rough scope — what's in, what's out (incl. breaking removals).

Outputclarified intent

The team owns the problem, not just the ticket. No agent has touched it yet — this is the human gate.

The real problem · dug out by probing

P1 Devices aren't tied to a physical site P2 Management complexity grows with device count P3 No inventory basis when expanding / replacing P4 Monitoring vs. management roles blur at scale P5 No site-level base for future features

Agreed scope · in / out

IN unified Site module · site-first workflow multi-level Site › Subsite · single assignment OUT · breaking View: Edit site Device: Create site, Add device

Team Workflow · Stage 02

Write the spec

PM · RD+coding agent

HumanHand off the refined intenthandoff

→

AgentFormalize into openspecopenspec workflow

→

AgentDraft proposal · design · specsextract-behavior

→

HumanValidate & PR reviewspec review

→

OutputHigh-level specproposal.md · design.md · specs/

clarified intent

conventionsproject bg

architecturethe codebase

conventions

ReferencesOpenSpec · AI Fluency — Description · Anthropic

AI Fluency

DelegationDescriptionDiscernmentDiligence

In practice · Stage 02

Write the spec

Site Management · in development

used openspec workflow · adversarial review — the whole team shaped it with the agent, then hunted for gaps

PMinput

A high-level request, not a spec — a Confluence page.“Site / Subsite — High Level Spec for AI-coding” v0.9

Team ⇆ Agentshape together

PM · RD · UX · Architect shape it into openspec format together with the agent — proposal + capabilities + scenarios, gated by openspec validate.

Agentadversarial

Ran an adversarial review — the reviewer must find issues, “looks good” not allowed — hunting the spec for conflicts, missing constraints, undefined edge cases.

Team ⇆ Agentrevise ×5

Every gap became a real change to the specs, before any code:

+ error-handling + Google Maps fix conflicts, add constraints rename Subsite → Area map: center-pin drag device-move warning

Outputsource of truth

A validated openspec change — 8 capabilities, 2 breaking module changes. Code follows this, not the Confluence page.

proposal.md · ## Why

Currently, site management is fragmented across View and Device modules, leading to inconsistent experiences and duplicated functionality. A dedicated Site module will provide a single source of truth for organizational hierarchy…

specs/site-hierarchy/spec.md

### Requirement: Site hierarchy structure The system SHALL support Site > Area (L1–L4) > Device. #### Scenario: Prevent exceeding depth WHEN user adds an Area under a Layer-4 Area THEN system rejects — max depth reached

ReferencesOpenSpec · Adversarial Review · BMAD-Method

Team Workflow · Stage 03

Break down

RD · UX+coding agent

BaseHigh-level spec

RD · AgentDerive the API contractshape & validate / OpenAPI

UX · AgentDraft UX spec + HTML prototypereview the flow / headless UI

OutputContractsAPI spec · UX spec · HTML prototype

API design principlesexisting contracts

design referencescomponent library

ReferencesOpenAPI · API Design Principles

AI Fluency

DelegationDescriptionDiscernmentDiligence

In practice · Stage 03

Break down (1)

Floor Plan · shipped

used hadlc-breakdown (api-designer · openspec-prototype)

Base The floor-plan high-level spec, from Stage 02.

RD ⇆ Agent · API contract

Derive the OpenAPI

floor-plan CRUD + device-position / FOV — the shape both ends build to

⇆

UX ⇆ Agent · prototype + UX

Build the clickable prototype

the screens as something you can click — the tree and the canvas

built in parallel — the prototype surfaces what the API needs; the contract shapes the prototype

↺ and it loops back up — half-way through, the contracts can reveal the high-level spec itself is wrong. Change the spec, then re-derive.

Output API spec · UX spec · clickable prototype — reconciled and reviewable before a line of code.

API contract · NEW · OpenAPI · api-ssot

GET POST /sites/{siteId}/floor-plans GET PATCH DELETE /floor-plans/{id} POST /floor-plans/{id}/upload GET POST /floor-plans/{id}/device-positions the API spec that didn't exist before

UX spec · floor-plan screens

· Upload floor-plan image · Place devices on the plan · Set camera FOV & direction · Folder overview (recursive thumbnails)

In practice · Stage 03

Break down (2)

Site Management · in development

used web-prototype-builder-vue — the prototype kept moving, round after round against the Figma frames

BaseUX spec

The hard screens to prototype — the filter toolbar and the device picker.

UX ⇆ Agentbuild

Scaffolded the interactions — toolbar, filter menus, trigger pipeline, empty and lazy-load states.

UX ⇆ Agentalign to Figma

Round after round against the Figma frames; tokenized to the VORTEX 1.0 design system.

UX ⇆ Agentsync with shipped

Matched the prototype to the shipped behaviour, across all three toolbar variants.

Outputsigned off

20+ revisions before it matched — a working screen the team clicks and signs off, not a static mockup.

filter-combo/ · clickable prototype.html

device-picker-dialog/ · clickable prototype.html

Team Workflow · Stage 04

Build

RD+coding agent

AgentGenerate per-endpoint technical specopenspec

→

HumanReview the technical specapi-and-interface-review

→

AgentImplement with testsTDD

→

HumanCode reviewcode-reviewer

→

OutputWorking codetechnical spec · code + tests

API + UX specthe codebase

API + UX specconventions

the codebaseteststech stack

conventions

Referencesagent-skills — code-reviewer · AI Fluency — Discernment · Anthropic

AI Fluency

DelegationDescriptionDiscernmentDiligence

In practice · Stage 04

Build (1)

Floor Plan · shipped

used openspec proposal → apply · TDD · code-reviewer

Basefrom breakdown

The floor-plan API contract + UX spec, handed down from Stage 03.

Agentopenspec proposal

Ran /openspec:proposal — scaffolded the change: proposal.md · design.md · tasks.md.

RDreview the tech spec

Validated the design & tasks before any code — the gate that catches problems early.

Agentopenspec apply

Ran /openspec:apply — implemented the tasks, test-first, on each surface.

RDreview the code

Five-axis code review — correctness, readability, architecture, security, performance — before merge.

Outputshipped

Merged, live — on every surface. Spec → code, traceable.

the contract · OpenAPI · api-ssot

POST /sites/{siteId}/floor-plans GET /floor-plans/{id} PATCH DELETE /floor-plans/{id} POST /floor-plans/{id}/upload FloorPlan: id · siteId · name · imageUrl

the technical spec · openspec · floor-plan-management

### Requirement: Floor Plan CRUD #### Scenario: Create new floor plan #### Scenario: Invalid floor plan name (pattern) #### Scenario: Delete floor plan

In practice · Stage 04

Build (2)

Floor Plan · shipped

The same OpenAPI contract — but each surface does a different job, all built test-first and reviewed before merge.

Backend

vortex-backend · Go

Owns the data & contract

floor-plan CRUD · device positions · FOV, persisted

floor_plan_controller.go

floor_plan_repository.go

Web

app-vsaas-portal · Vue 3

The editor

drag devices onto the plan, draw each camera's FOV on a canvas

FloorPlanCanvas.vue · renderers/

FloorPlanCanvas.test.js

iOS

ioscharmander · Swift

The viewer

fetch & render the plan, tap a camera marker → live view

FloorPlanManager.swift

FloorPlanManagerTest.swift

Android

AndroidCharmander · Kotlin

The viewer

plan + camera-marker overlay, search across floor plans

FloorPlanViewerScreen.kt · CameraMarkerOverlay.kt

FloorPlanViewerViewModelTest.kt

Team Workflow · Stage 05

Fix in production

RD+coding agent

AgentMonitor site & logsCI runner / monitor

QA · humanVerify & reproducemanual + regression

AgentTriage the issuelog / issue triage

HumanJudge & decide the fixinvestigation

OutputFix shippedvia the spec · post-mortem

logserror tracking

the specstest cases

ReferencesPostmortem culture · Google SRE · AI Fluency — Diligence

AI Fluency

DelegationDescriptionDiscernmentDiligence

In practice · Stage 05

Fix in production (1)

Floor Plan · shipped

used Redmine · systematic-debugging — trace the symptom back to the spec

Liveshipped

Floor Plan running in production on Portal, iOS and Android.

QAreproduced

Moved a device to another Site — the camera count diverged: Portal 2 · iOS 2 · Android 1.Redmine #59610

Teamdiagnose

Not a rendering bug on any one platform — the spec never defined what happens when a device moves to another Site.

Team ⇆ Agentfix the spec

Three symptom tickets closed Won't Fix — define the behavior in the spec, not three local patches, then re-walk the flow.

Outputone definition

One spec change — every platform reads the same rule. Fix the spec, not the symptom.

Redmine #59610 · tracker: Spec

[Spec][Floor Plan] Spec doesn't define device “move to other sites” behaviour move 1 of 2 cameras to another Site → Portal 2 iOS 2 Android 1

one gap → three symptom tickets

#59495 Android — count differs Won't Fix #59533 Android/iOS — Site count Won't Fix #59727 Portal — camera wrong Won't Fix patch each = 3 patches that drift again. fix the spec = one source of truth.

In practice · Stage 05

Fix in production (2)

Floor Plan · shipped

used Redmine · openspec — trace a cluster of bugs back to one component spec

Liveshipped

The Floor Plan left-hand navigation tree, running in production.

QAsymptoms

A run of tree tickets — search broken, wrong device counts, new sites not listed, hierarchy inconsistent.Redmine #59730 · #59725 · #59726 · #59627 …

Teamdiagnose

Not seven separate bugs — one broken tree component: clicking a node expanded instead of selecting; a fixed shallow depth when the data nests far deeper; every feature reinvented its own tree.

Team ⇆ Agentrespec & re-walk

Instead of seven patches — a new openspec change for a generic tree component, then back through breakdown and build from the top.

Outputone component

A shared tree-view component — floorplan consumes it now; device list and views are next. Fix the spec, not the symptom.

TreeView · the rebuilt component · 7 levels

the spec change · openspec · add-tree-view-component

NEW tree-view-component spec click label = select, chevron = expand behavior by node kind, not depth search: auto-expand + highlight up to 7 levels · mixed siblings

the code change · app-vsaas-portal

NEW components/TreeView/TreeView.vue one component, node-kind driven floorplan/…/FloorPlanTreeSidebar.vue now consumes the shared TreeView replaced the per-feature trees

In practice · Beyond the five stages

Beyond the flow (1)

Dev harness · shipped

used superpowers:brainstorming · writing-skills — the skill that test-drives the skills it writes

BE devthe itch

“The front end has no throwaway URL to validate a branch against.” Not a feature — just friction.

FE ⇆ Agentbrainstorm

Used superpowers:brainstorming to design a per-branch preview harness — a design spec, not code first.

FE ⇆ Agentwriting-skills

Turned the design into a reusable skill — a /harness-deploy command + provisioning scripts the whole team runs.

Agentself-verify

writing-skills is TDD for skills — it pressure-tests the new skill with subagents, confirms it triggers and complies, and closes the loopholes before anyone relies on it.

UATspec amended

The target app rejected manual deploys (git-connected Amplify) — retargeted mid-build to oblivionis-preview. The spec moved, like any other.

Outputa new skill

pnpm harness:deploy <slug> → a per-branch HTTPS preview URL. The team taught the harness a new trick.

the design · superpowers:brainstorming

specs/2026-04-21-portal-harness-preview-design.md pnpm harness:deploy <slug> → per-branch HTTPS URL pnpm harness:destroy <slug> → clean up guard rails: only harness- slugs, never protected branches

the skill · writing-skills

/harness-deploy /harness-destroy scripts/provision-harness.sh scripts/deploy-harness.sh · destroy-harness.sh agent contract: ::preview-url::<url> pressure-tested with subagents before ship

In practice · Beyond the five stages

Beyond the flow (2)

extract-behavior · in the skill library

used writing-skills — capture the method once, every agent runs the same rigor

Patternnoticed

Every brownfield domain hits the same wall — “the code is the spec” — and someone re-derives it by hand each time.

RD ⇆ Agentdesign the method

Rather than repeat the work, designed one repeatable method: scope → discover → draft → audit → review → register.

RD ⇆ Agentthe hard part

Baked in a leak audit so specs stay behavioural, not code — MQTT → “streaming”, S3 → “the system stores”, JWT → “token”.

RD ⇆ Agentcapture as a skill

Wrote it up as extract-behavior — the method + OpenSpec format — so any agent runs the same rigor, not just its author.

Outputreusable

A team skill — applied since to Permission (4 scattered sources → one spec) and every new brownfield domain.

the design · extract-behavior/SKILL.md

workflow scope → discover → draft → audit → review rule one capability = one spec.md discover parallel backend + frontend agents audit strip impl detail — keep behaviour

proof · applied to Permission

4 sources: Excel · Casbin · Vue · iOS → openspec/specs/permission/spec.md source: extracted+authored one skill → every brownfield domain after

In the Loop · Role 01

RD — owns the spec, delegates the build.

Before, RD’s job ended at merge — now it never really does.

Research & Development

The shift — RD moved up a loop

Outer loopCI/CD, deploy & ops · days–weeks

Middle loop RD nowSupervise agents — decompose work, calibrate trust, catch plausible-but-wrong output, keep architecture coherent · hours–days

Inner loopWrite, test, debug · minutes–hours

Inner / middle / outer loops · ThoughtWorks — Future of Software Engineering

Skills RD drives

brainstorming openspec api-and-interface-review TDD code-reviewer harness-deploy dogfood smoke-test

Collaboration loops — not hand-off-and-forget

⇆Agent

Delegate the build; review the technical spec and the diff — the agent proposes, RD stays accountable.

⇆PM

Co-shape the high-level spec — feasibility, cost, and risk flow back before scope locks.

⇆UX

Build to the UX spec and prototype; the prototype ⇆ API loop keeps design and contract in sync.

⇆QA

Ship with tests already written; triage production issues together, then patch the spec.

In the Loop · Role 02

PM — owns intent, keeps the spec alive.

Before, PM wrote a doc and moved on — now the spec is a living contract.

Product

The shift

Own the high-level spec — the WHAT, not the HOW.
Turn vague ideas into an agreed, testable spec.
Keep the spec true as reality changes.

Skills PM drives

brainstorming openspec adversarial-review

Collaboration loops — not hand-off-and-forget

⇆Agent

Refine intent into a structured spec with the agent; validate every draft, then iterate.

⇆RD

Hand over intent and scope; get feasibility, cost, and risk back before committing.

⇆UX

Align on user flows early — the UX spec grows from shared intent.

⇆QA

Write acceptance criteria into the spec so QA can verify against it.

In the Loop · Role 03

QA — verifies against the spec, from day one.

Before, QA arrived at the end — now QA is in from the spec.

Quality Assurance

The shift

Verify against the spec — not ad-hoc clicking.
Write test cases from the spec, early.
Watch production with the agent; loop bugs back to the spec.

Skills QA drives

brainstorming generate-test-cases smoke-test

Collaboration loops — not hand-off-and-forget

⇆Agent

The agent monitors site and logs; QA verifies and reproduces — findings converge to triage.

⇆PM

Turn acceptance criteria in the spec into concrete, verifiable cases.

⇆RD

Tests are written before hand-off; triage production issues together.

⇆UX

Verify the build against the UX spec and prototype — not a screenshot.

In the Loop · Role 04

UX — ships a spec others can build and verify.

Before, UX threw a mockup over the wall — now the prototype is a living contract.

Experience Design

The shift

Write the UX spec and build an HTML prototype.
Iterate on headless components — prototype ⇆ API.
Hand off a spec QA can verify against.

Skills UX drives

brainstorming openspec prototype generate-flow-spec

Collaboration loops — not hand-off-and-forget

⇆Agent

Draft the UX spec and generate the prototype with the agent on our headless components.

⇆PM

Translate intent into flows — the UX spec grows from the high-level spec.

⇆RD

The prototype ⇆ API loop keeps design and contract honest as both evolve.

⇆QA

Deliver a verifiable UX spec into references/ so QA can check against it.

Context Engineering

Engineering the agent's context.

Fill the agent's window with the right information — durable knowledge it always needs, plus what's pulled in just-in-time as it works.

Durable knowledge

Project backgroundWhat & why

Tech stackLanguages, frameworks, infra

Conventions & principlesStyle, patterns, architectural rules

Architecture & decisionsBoundaries, ADRs, the why

Constraints & NFRsPerformance, security, compliance

Specs & contractsbehavioral · API spec · technical ↳ slides 3–5

LoreHistory, domain terms, tribal knowledge

Misc knowledgeGotchas, edge cases, references

Pulled in at runtime

The codebaseFiles by path / grep

Tools & integrationsMCP · Jira · Redmine · CI

TestsExpected behavior

MemoryCross-session state

ExamplesFew-shot, reference implementations

Agent skillsReusable procedures the agent invokes

In progress

What We're Building

Closing the loop, end to end.

Detect — the loop runs itself

01 CI runner Triggers the loop on every change

→

02 Agent Picks up and does the work

→

03 Monitor Watches the live site & logs

→

04 Generate issues Files what it finds for triage

Resolve — a human decides every change

05 Monitor issues Jira · Redmine · GitHub

→

06 Triage Cluster, dedupe, prioritize

→

07 Propose change Agent drafts the fix

spec changecode changewontfix

→

08 Review Human approves before it lands

Working With Agents

Mindset: AI Fluency

These run underneath every stage above — the human skills of Anthropic’s AI Fluency framework, in our own words.

01 Delegation Decide what to hand the agent — and what stays human.

In our practiceHumans own intent and the spec; agents draft, derive, and implement.

02 Description Say what you want clearly enough to build from.

In our practiceThe spec is the description — high-level intent, API contract, UX spec.

03 Discernment Judge the output, the process, and the behavior.

In our practiceCode review, api-and-interface review, and human validation gates.

04 Diligence Stay accountable — verify, trace, own the result.

In our practiceTests, spec provenance, and fixing the spec — not the symptom.

The Work

What we keep investing in.

01Build an agent-first environment and process.

02Design the context we feed the agents.

03Design the harness — the loop that runs the work.

04Keep fixing what blocks the flow — the harness of the harness.

05Judge and deliver with the AI Fluency 4D framework.

Questions & discussion

We build with agents — specs lead, we stay in control.

New First-Class Citizens
Multi-Level Artifacts

Beyond code — documents go first-class.

Top-down, traceable back.

What the artifacts actually look like.

Team
Workflow

From intent to implementation.

Clarify intent

Clarify intent

Write the spec

Write the spec

Break down

Break down (1)

Break down (2)

Build

Build (1)

Build (2)

Fix in production

Fix in production (1)

Fix in production (2)

Beyond the flow (1)

Beyond the flow (2)

Roles
in the Loop

RD — owns the spec, delegates the build.

PM — owns intent, keeps the spec alive.

QA — verifies against the spec, from day one.

UX — ships a spec others can build and verify.

Context
& Harness

Engineering the agent's context.

Closing the loop, end to end.

Mindset: AI Fluency

What we keep investing in.

We build with agents — specs lead, we stay in control.

New First-Class CitizensMulti-Level Artifacts

Beyond code — documents go first-class.

Top-down, traceable back.

What the artifacts actually look like.

TeamWorkflow

From intent to implementation.

Clarify intent

Clarify intent

Write the spec

Write the spec

Break down

Break down (1)

Break down (2)

Build

Build (1)

Build (2)

Fix in production

Fix in production (1)

Fix in production (2)

Beyond the flow (1)

Beyond the flow (2)

Rolesin the Loop

RD — owns the spec, delegates the build.

PM — owns intent, keeps the spec alive.

QA — verifies against the spec, from day one.

UX — ships a spec others can build and verify.

Context& Harness

Engineering the agent's context.

Closing the loop, end to end.

Mindset: AI Fluency

What we keep investing in.

New First-Class Citizens
Multi-Level Artifacts

Team
Workflow

Roles
in the Loop

Context
& Harness