Our Practice Engineering · 2026
Working With Agents

We build with agents — specs lead, we stay in control.

First-class specs, a spec-first flow, every role in the loop, and a harness that closes it end to end.
High-level spec → API + UX spec → Technical spec → Code · top-down · traceable
01
PART 01

New First-Class Citizens
Multi-Level Artifacts

The specs are the asset. Everything else is derived from them.
02
First-Class Citizens

Beyond code — documents go first-class.

Code was always the one artifact we version, review, and trust as the source of truth. Now these documents earn the same standing:

High-level spec: the what & why
API spec: the service contract
UX spec: the interaction contract
Technical spec: the how, per platform
03
Multi-level spec stack

Top-down, traceable back.

derives
traces back
00
Intent
The problem to solve — a bug, a request, a refactor
From Jira / Confluence / Redmine
01
High-level spec
openspec · proposal.md · design.md
What & why — behavior and requirements
02
API spec
OpenAPI schema
API contract — frontend and backend build to it
UX spec
markdown — screens, layouts, interactions, flows
+ HTML prototype
Interaction contract — screens and behavior
→ lives in references/
03
Technical specs (in openspec format)
Backend
Web
iOS
Android
Code & tests
Derived — and traceable back to every layer
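A minimal sketch of what "traceable back" can mean in practice, assuming each artifact simply records the one artifact it was derived from. The `Artifact` class, the `trace_back` helper, and the example names are hypothetical illustrations, not part of openspec or any tool named here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    """One node in the spec stack; `derived_from` points one level up."""
    level: int  # 0 = intent, 1 = high-level spec, 2 = API/UX spec, 3 = technical spec, 4 = code (assumed numbering)
    name: str
    derived_from: Optional["Artifact"] = None


def trace_back(artifact: Artifact) -> list[str]:
    """Walk the parent links from an artifact up to the originating intent."""
    chain = []
    node: Optional[Artifact] = artifact
    while node is not None:
        chain.append(node.name)
        node = node.derived_from
    return chain


# Hypothetical chain for one feature, top-down.
intent = Artifact(0, "intent: manage sites in the portal")
high_level = Artifact(1, "high-level spec: proposal.md", intent)
api_spec = Artifact(2, "API spec: OpenAPI schema", high_level)
tech_spec = Artifact(3, "technical spec: backend", api_spec)
code = Artifact(4, "code & tests", tech_spec)

for step in trace_back(code):
    print(step)
```

Storing only the parent link keeps derivation top-down while still letting any piece of code be walked back to the intent that justified it.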
04
Example · Site Management

What the artifacts actually look like.

One real feature in openspec — proposal, contracts, UX, and per-endpoint specs, all traceable.
High-level spec
openspec · site-management-for-portal/
proposal.md
specs/site-hierarchy/spec.md
specs/site-permissions/spec.md
specs/site-device-filtering/spec.md
specs/floor-plan-display/spec.md
specs/view-module/spec.md
API spec
OpenAPI · api-ssot
Site CRUD endpoints
Device ↔ site assignment
Permission scopes
the contract both ends build to
UX spec
artifacts/
ux-spec.md
prototype.html
filter-combo/
device-picker-dialog/
tree-view-list/
key-screens/ ×13
Technical spec
openspec · per screen / endpoint
spec.md
proposal.md
design.md
tasks.md
↓ code + tests
Filter Combo prototype
filter-combo/ · prototype.html
Device Picker prototype
device-picker-dialog/ · prototype.html
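The technical-spec file set listed above (spec.md, proposal.md, design.md, tasks.md) lends itself to a simple completeness check. The sketch below assumes one folder per screen or endpoint; the `missing_files` helper and the folder name `create-site-endpoint` are made up for illustration and are not an openspec command.

```python
import tempfile
from pathlib import Path

# File names taken from the technical-spec list on this slide.
REQUIRED_FILES = ("spec.md", "proposal.md", "design.md", "tasks.md")


def missing_files(spec_dir: Path) -> list[str]:
    """Return the required files that are absent from one technical-spec folder."""
    return [name for name in REQUIRED_FILES if not (spec_dir / name).is_file()]


# Demo on a throwaway folder holding only two of the four files.
with tempfile.TemporaryDirectory() as tmp:
    spec_dir = Path(tmp) / "create-site-endpoint"  # hypothetical folder name
    spec_dir.mkdir()
    (spec_dir / "spec.md").write_text("# spec\n")
    (spec_dir / "proposal.md").write_text("# proposal\n")
    demo_missing = missing_files(spec_dir)

print(demo_missing)
```

A check like this could run before the Build stage so an agent never implements from an incomplete spec folder.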
05
PART 02

Team
Workflow

Most stages pair a technique with the artifact it produces.
06
The Flow

From intent to implementation.

01 Clarify intent: Humans align on the problem
02 Write the spec: Finalize + PR / review
03 Break down: API + UX spec, discuss
04 Build: Per endpoint, implement
05 Fix in production: Fix issues after go-live
Stage 04 fans out to Backend · Web · iOS · Android
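The flow above can be sketched as an ordered list of stages in which only Build fans out per platform. The `plan_work` function and its output format are illustrative assumptions; the stage and platform names come from the slide.

```python
STAGES = ["Clarify intent", "Write the spec", "Break down", "Build", "Fix in production"]
PLATFORMS = ["Backend", "Web", "iOS", "Android"]


def plan_work(feature: str) -> list[str]:
    """Expand a feature into ordered work items; only the Build stage fans out."""
    items = []
    for number, stage in enumerate(STAGES, start=1):
        if stage == "Build":
            # One work item per platform, all derived from the same contracts.
            items.extend(f"{number:02d} {stage} [{p}]: {feature}" for p in PLATFORMS)
        else:
            items.append(f"{number:02d} {stage}: {feature}")
    return items


for item in plan_work("site management"):
    print(item)
```

Keeping stages 01 to 03 single-threaded means every platform builds against the same agreed spec and contracts.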
07
Team Workflow · Stage 01

Clarify intent

PO · PM · RD · UX · Architect · no AI
Human: PO presents the problem & context (framing)
Human: Team discusses to understand it (facilitation)
Human: Agree on scope, in / out (scoping)
Output: Clarified intent (agreed scope)
project background · lore
misc knowledge
constraints
References: superpowers · AI Fluency (Delegation) · Anthropic
AI Fluency
Delegation · Description · Discernment · Diligence
08
Team Workflow · Stage 02

Write the spec

PM · RD + coding agent
Human: Hand off the refined intent (handoff)
Agent: Formalize into openspec (openspec workflow)
Agent: Draft proposal · design · specs (extract-behavior)
Human: Validate & PR review (spec review)
Output: High-level spec (proposal.md · design.md · specs/)
clarified intent
conventions · project background
architecture · the codebase
conventions
References: OpenSpec · AI Fluency (Description) · Anthropic
09
Team Workflow · Stage 03

Break down

RD · UX + coding agent
Base: High-level spec
RD · Agent: Derive the API contract (shape & validate / OpenAPI)
UX · Agent: Draft UX spec + HTML prototype (review the flow / headless UI)
Output: Contracts (API spec · UX spec · HTML prototype)
API design principles · existing contracts
design references · component library
10
Team Workflow · Stage 04

Build

RD + coding agent
Agent: Generate per-endpoint technical spec (openspec)
Human: Review the technical spec (api-and-interface-review)
Agent: Implement with tests (TDD)
Human: Code review (code-reviewer)
Output: Working code (technical spec · code + tests)
API + UX spec · the codebase
API + UX spec · conventions
the codebase · tests · tech stack
conventions
11
Team Workflow · Stage 05

Fix in production

RD · on-call + coding agent
Agent: Monitor site & logs (CI runner / monitor)
QA · human: Verify & reproduce (manual + regression)
Agent: Triage the issue (log / issue triage)
Human: Judge & decide the fix (investigation)
Output: Fix shipped (via the spec · post-mortem)
logs · error tracking
the specs · test cases
References: Postmortem culture · Google SRE · AI Fluency (Diligence)
12
PART 03

Roles
in the Loop

How each role works now — and stays in the loop with the others.
13
In the Loop · Role 01

RD — owns the spec, delegates the build.

Before, RD’s job ended at merge — now it never really does.
RD
Research & Development
The shift — RD moved up a loop
Outer loop: CI/CD, deploy & ops · days–weeks
Middle loop (RD now): Supervise agents (decompose work, calibrate trust, catch plausible-but-wrong output, keep architecture coherent) · hours–days
Inner loop: Write, test, debug · minutes–hours
Inner / middle / outer loops · ThoughtWorks — Future of Software Engineering
Skills RD drives
brainstorming · openspec · api-and-interface-review · TDD · code-reviewer · harness-deploy · dogfood · smoke-test
Collaboration loops — not hand-off-and-forget
Agent
Delegate the build; review the technical spec and the diff — the agent proposes, RD stays accountable.
PM
Co-shape the high-level spec — feasibility, cost, and risk flow back before scope locks.
UX
Build to the UX spec and prototype; the prototype ⇆ API loop keeps design and contract in sync.
QA
Ship with tests already written; triage production issues together, then patch the spec.
14
In the Loop · Role 02

PM — owns intent, keeps the spec alive.

Before, PM wrote a doc and moved on — now the spec is a living contract.
PM
Product
The shift
  • Own the high-level spec — the WHAT, not the HOW.
  • Turn vague ideas into an agreed, testable spec.
  • Keep the spec true as reality changes.
Skills PM drives
brainstorming · openspec · adversarial-review
Collaboration loops — not hand-off-and-forget
Agent
Refine intent into a structured spec with the agent; validate every draft, then iterate.
RD
Hand over intent and scope; get feasibility, cost, and risk back before committing.
UX
Align on user flows early — the UX spec grows from shared intent.
QA
Write acceptance criteria into the spec so QA can verify against it.
15
In the Loop · Role 03

QA — verifies against the spec, from day one.

Before, QA arrived at the end — now QA is in from the spec.
QA
Quality Assurance
The shift
  • Verify against the spec — not ad-hoc clicking.
  • Write test cases from the spec, early.
  • Watch production with the agent; loop bugs back to the spec.
Skills QA drives
brainstorming · generate-test-cases · smoke-test
Collaboration loops — not hand-off-and-forget
Agent
The agent monitors site and logs; QA verifies and reproduces — findings converge to triage.
PM
Turn acceptance criteria in the spec into concrete, verifiable cases.
RD
Tests are written before hand-off; triage production issues together.
UX
Verify the build against the UX spec and prototype — not a screenshot.
16
In the Loop · Role 04

UX — ships a spec others can build and verify.

Before, UX threw a mockup over the wall — now the prototype is a living contract.
UX
Experience Design
The shift
  • Write the UX spec and build an HTML prototype.
  • Iterate on headless components — prototype ⇆ API.
  • Hand off a spec QA can verify against.
Skills UX drives
brainstorming · openspec · prototype · generate-flow-spec
Collaboration loops — not hand-off-and-forget
Agent
Draft the UX spec and generate the prototype with the agent on our headless components.
PM
Translate intent into flows — the UX spec grows from the high-level spec.
RD
The prototype ⇆ API loop keeps design and contract honest as both evolve.
QA
Deliver a verifiable UX spec into references/ so QA can check against it.
17
PART 04

Context
& Harness

What we feed the agents — and the loop we're automating.
18
Context Engineering

Engineering the agent's context.

Fill the agent's window with the right information — durable knowledge it always needs, plus what's pulled in just-in-time as it works.

Durable knowledge
Project background: What & why
Tech stack: Languages, frameworks, infra
Conventions & principles: Style, patterns, architectural rules
Architecture & decisions: Boundaries, ADRs, the why
Constraints & NFRs: Performance, security, compliance
Specs & contracts: behavioral · API spec · technical (see slides 3–5)
Lore: History, domain terms, tribal knowledge
Misc knowledge: Gotchas, edge cases, references
Pulled in at runtime
The codebase: Files by path / grep
Tools & integrations: MCP · Jira · Redmine · CI
Tests: Expected behavior
Memory: Cross-session state
Examples: Few-shot, reference implementations
Agent skills: Reusable procedures the agent invokes
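The split between durable knowledge and runtime context can be sketched as a small assembly function: durable sections always go in, runtime sections are fetched only when the task needs them. Everything here (`build_context`, the section headings, the stub fetcher) is a hypothetical illustration, not the API of any agent framework.

```python
from typing import Callable, Dict, List


def build_context(
    task: str,
    durable: Dict[str, str],
    fetchers: Dict[str, Callable[[str], str]],
    needs: List[str],
) -> str:
    """Assemble a prompt context: durable sections first, then just-in-time ones."""
    # Durable knowledge is always present, in a stable order.
    sections = [f"## {title}\n{body}" for title, body in durable.items()]
    # Runtime context is fetched only for the sources this task asks for.
    for name in needs:
        sections.append(f"## {name}\n{fetchers[name](task)}")
    sections.append(f"## Task\n{task}")
    return "\n\n".join(sections)


# Hypothetical inputs; a real fetcher might grep the repo or call a tracker.
durable = {"Conventions & principles": "Follow the existing module layout."}
fetchers = {"Tests": lambda task: "test_create_site expects HTTP 201"}

context = build_context("Add the create-site endpoint", durable, fetchers, ["Tests"])
print(context)
```

Passing `needs` explicitly keeps the window small: a source that is not requested is never fetched.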
19
In progress
What We're Building

Closing the loop, end to end.

Detect — the loop runs itself
01 CI runner: Triggers the loop on every change
02 Agent: Picks up and does the work
03 Monitor: Watches the live site & logs
04 Generate issues: Files what it finds for triage
Resolve — a human decides every change
05 Monitor issues: Jira · Redmine · GitHub
06 Triage: Cluster, dedupe, prioritize
07 Propose change: Agent drafts the fix
spec change · code change · wontfix
08 Review: Human approves before it lands
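Steps 06 to 08 can be sketched as two small functions: one that clusters, dedupes, and prioritizes issues, and one that refuses to land a change without a named human approver. The `Issue` fields, the severity scale, and the fingerprint strings are assumptions made for illustration; the three outcomes are the ones listed above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Tuple

OUTCOMES = ("spec change", "code change", "wontfix")


@dataclass(frozen=True)
class Issue:
    fingerprint: str  # assumed: a normalized error signature used for clustering
    severity: int     # assumed scale: higher is worse
    tracker: str      # e.g. Jira, Redmine, GitHub


def triage(issues: List[Issue]) -> List[Tuple[str, int, int]]:
    """Cluster by fingerprint, then rank clusters by worst severity and size."""
    clusters = defaultdict(list)
    for issue in issues:
        clusters[issue.fingerprint].append(issue)
    summary = [
        (fingerprint, len(group), max(i.severity for i in group))
        for fingerprint, group in clusters.items()
    ]
    return sorted(summary, key=lambda row: (-row[2], -row[1], row[0]))


def land(outcome: str, approved_by: Optional[str]) -> str:
    """Gate every proposed change on a named human approver."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    if not approved_by:
        raise PermissionError("a human must approve before the change lands")
    return f"{outcome} approved by {approved_by}"


issues = [
    Issue("login-500", 3, "Jira"),
    Issue("slow-floor-plan", 1, "Redmine"),
    Issue("login-500", 2, "GitHub"),
]
ranked = triage(issues)
print(ranked)
```

The approval check lives in `land` rather than in the agent's proposal step, so no code path can skip the human decision.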
20
Working With Agents

Mindset: AI Fluency

These run underneath every stage above — the human skills of Anthropic’s AI Fluency framework, in our own words.
01 Delegation: Decide what to hand the agent, and what stays human.
In our practice: Humans own intent and the spec; agents draft, derive, and implement.
02 Description: Say what you want clearly enough to build from.
In our practice: The spec is the description (high-level intent, API contract, UX spec).
03 Discernment: Judge the output, the process, and the behavior.
In our practice: Code review, api-and-interface review, and human validation gates.
04 Diligence: Stay accountable by verifying, tracing, and owning the result.
In our practice: Tests, spec provenance, and fixing the spec rather than the symptom.
21
The Work

What we keep investing in.

01 Build an agent-first environment and process.
02 Design the context we feed the agents.
03 Design the harness: the loop that runs the work.
04 Keep fixing what blocks the flow: the harness of the harness.
05 Judge and deliver with the AI Fluency 4D framework.
Questions & discussion
22