Initial commit

2026-05-16 13:43:29 +07:00
commit 688a86b611
208 changed files with 44350 additions and 0 deletions
--- a/.claude/skills/incremental-implementation/SKILL.md
+++ b/.claude/skills/incremental-implementation/SKILL.md
@@ -0,0 +1,245 @@
+---
+name: incremental-implementation
+description: Delivers changes incrementally. Use when implementing any feature or change that touches more than one file. Use when you're about to write a large amount of code at once, or when a task feels too big to land in one step.
+---
+
+# Incremental Implementation
+
+## Overview
+
+Build in thin vertical slices — implement one piece, test it, verify it, then expand. Avoid implementing an entire feature in one pass. Each increment should leave the system in a working, testable state. This is the execution discipline that makes large features manageable.
+
+## When to Use
+
+- Implementing any multi-file change
+- Building a new feature from a task breakdown
+- Refactoring existing code
+- Any time you're tempted to write more than ~100 lines before testing
+
+**When NOT to use:** Single-file, single-function changes where the scope is already minimal.
+
+## The Increment Cycle
+
+```
+┌──────────────────────────────────────┐
+│                                      │
+│   Implement ──→ Test ──→ Verify ──┐  │
+│       ▲                           │  │
+│       └───── Commit ◄─────────────┘  │
+│              │                       │
+│              ▼                       │
+│          Next slice                  │
+│                                      │
+└──────────────────────────────────────┘
+```
+
+For each slice:
+
+1. **Implement** the smallest complete piece of functionality
+2. **Test** — run the test suite (or write a test if none exists)
+3. **Verify** — confirm the slice works as expected (tests pass, build succeeds, manual check)
+4. **Commit** -- save your progress with a descriptive message (see `git-workflow-and-versioning` for atomic commit guidance)
+5. **Move to the next slice** — carry forward, don't restart
+
+## Slicing Strategies
+
+### Vertical Slices (Preferred)
+
+Build one complete path through the stack:
+
+```
+Slice 1: Create a task (DB + API + basic UI)
+    → Tests pass, user can create a task via the UI
+
+Slice 2: List tasks (query + API + UI)
+    → Tests pass, user can see their tasks
+
+Slice 3: Edit a task (update + API + UI)
+    → Tests pass, user can modify tasks
+
+Slice 4: Delete a task (delete + API + UI + confirmation)
+    → Tests pass, full CRUD complete
+```
+
+Each slice delivers working end-to-end functionality.
+
+### Contract-First Slicing
+
+When backend and frontend need to develop in parallel:
+
+```
+Slice 0: Define the API contract (types, interfaces, OpenAPI spec)
+Slice 1a: Implement backend against the contract + API tests
+Slice 1b: Implement frontend against mock data matching the contract
+Slice 2: Integrate and test end-to-end
+```
+
+### Risk-First Slicing
+
+Tackle the riskiest or most uncertain piece first:
+
+```
+Slice 1: Prove the WebSocket connection works (highest risk)
+Slice 2: Build real-time task updates on the proven connection
+Slice 3: Add offline support and reconnection
+```
+
+If Slice 1 fails, you discover it before investing in Slices 2 and 3.
+
+## Implementation Rules
+
+### Rule 0: Simplicity First
+
+Before writing any code, ask: "What is the simplest thing that could work?"
+
+After writing code, review it against these checks:
+- Can this be done in fewer lines?
+- Are these abstractions earning their complexity?
+- Would a staff engineer look at this and say "why didn't you just..."?
+- Am I building for hypothetical future requirements, or the current task?
+
+```
+SIMPLICITY CHECK:
+✗ Generic EventBus with middleware pipeline for one notification
+✓ Simple function call
+
+✗ Abstract factory pattern for two similar components
+✓ Two straightforward components with shared utilities
+
+✗ Config-driven form builder for three forms
+✓ Three form components
+```
+
+Three similar lines of code is better than a premature abstraction. Implement the naive, obviously-correct version first. Optimize only after correctness is proven with tests.
+
+### Rule 0.5: Scope Discipline
+
+Touch only what the task requires.
+
+Do NOT:
+- "Clean up" code adjacent to your change
+- Refactor imports in files you're not modifying
+- Remove comments you don't fully understand
+- Add features not in the spec because they "seem useful"
+- Modernize syntax in files you're only reading
+
+If you notice something worth improving outside your task scope, note it — don't fix it:
+
+```
+NOTICED BUT NOT TOUCHING:
+- src/utils/format.ts has an unused import (unrelated to this task)
+- The auth middleware could use better error messages (separate task)
+→ Want me to create tasks for these?
+```
+
+### Rule 1: One Thing at a Time
+
+Each increment changes one logical thing. Don't mix concerns:
+
+**Bad:** One commit that adds a new component, refactors an existing one, and updates the build config.
+
+**Good:** Three separate commits — one for each change.
+
+### Rule 2: Keep It Compilable
+
+After each increment, the project must build and existing tests must pass. Don't leave the codebase in a broken state between slices.
+
+### Rule 3: Feature Flags for Incomplete Features
+
+If a feature isn't ready for users but you need to merge increments:
+
+```typescript
+// Feature flag for work-in-progress
+const ENABLE_TASK_SHARING = process.env.FEATURE_TASK_SHARING === 'true';
+
+if (ENABLE_TASK_SHARING) {
+  // New sharing UI
+}
+```
+
+This lets you merge small increments to the main branch without exposing incomplete work.
+
+### Rule 4: Safe Defaults
+
+New code should default to safe, conservative behavior:
+
+```typescript
+// Safe: disabled by default, opt-in
+export function createTask(data: TaskInput, options?: { notify?: boolean }) {
+  const shouldNotify = options?.notify ?? false;
+  // ...
+}
+```
+
+### Rule 5: Rollback-Friendly
+
+Each increment should be independently revertable:
+
+- Additive changes (new files, new functions) are easy to revert
+- Modifications to existing code should be minimal and focused
+- Database migrations should have corresponding rollback migrations
+- Avoid deleting something in one commit and replacing it in the same commit — separate them
+
+## Working with Agents
+
+When directing an agent to implement incrementally:
+
+```
+"Let's implement Task 3 from the plan.
+
+Start with just the database schema change and the API endpoint.
+Don't touch the UI yet — we'll do that in the next increment.
+
+After implementing, run `npm test` and `npm run build` to verify
+nothing is broken."
+```
+
+Be explicit about what's in scope and what's NOT in scope for each increment.
+
+## Increment Checklist
+
+After each increment, verify:
+
+- [ ] The change does one thing and does it completely
+- [ ] All existing tests still pass (`npm test`)
+- [ ] The build succeeds (`npm run build`)
+- [ ] Type checking passes (`npx tsc --noEmit`)
+- [ ] Linting passes (`npm run lint`)
+- [ ] The new functionality works as expected
+- [ ] The change is committed with a descriptive message
+
+**Note:** Run each verification command after a change that could affect it. After a successful run, don't repeat the same command unless the code has changed since — re-running on unchanged code adds no information.
+
+## Common Rationalizations
+
+| Rationalization | Reality |
+|---|---|
+| "I'll test it all at the end" | Bugs compound. A bug in Slice 1 makes Slices 2-5 wrong. Test each slice. |
+| "It's faster to do it all at once" | It *feels* faster until something breaks and you can't find which of 500 changed lines caused it. |
+| "These changes are too small to commit separately" | Small commits are free. Large commits hide bugs and make rollbacks painful. |
+| "I'll add the feature flag later" | If the feature isn't complete, it shouldn't be user-visible. Add the flag now. |
+| "This refactor is small enough to include" | Refactors mixed with features make both harder to review and debug. Separate them. |
+| "Let me run the build command again just to be sure" | After a successful run, repeating the same command adds nothing unless the code has changed since. Run it again after subsequent edits, not as reassurance. |
+
+## Red Flags
+
+- More than 100 lines of code written without running tests
+- Multiple unrelated changes in a single increment
+- "Let me just quickly add this too" scope expansion
+- Skipping the test/verify step to move faster
+- Build or tests broken between increments
+- Large uncommitted changes accumulating
+- Building abstractions before the third use case demands it
+- Touching files outside the task scope "while I'm here"
+- Creating new utility files for one-time operations
+- Running the same build/test command twice in a row without any intervening code change
+
+## Verification
+
+After completing all increments for a task:
+
+- [ ] Each increment was individually tested and committed
+- [ ] The full test suite passes
+- [ ] The build is clean
+- [ ] The feature works end-to-end as specified
+- [ ] No uncommitted changes remain
--- a/.claude/skills/incremental-implementation/references/accessibility-checklist.md
+++ b/.claude/skills/incremental-implementation/references/accessibility-checklist.md
@@ -0,0 +1,160 @@
+# Accessibility Checklist
+
+Quick reference for WCAG 2.1 AA compliance. Use alongside the `frontend-ui-engineering` skill.
+
+## Table of Contents
+
+- [Essential Checks](#essential-checks)
+- [Common HTML Patterns](#common-html-patterns)
+- [Testing Tools](#testing-tools)
+- [Quick Reference: ARIA Live Regions](#quick-reference-aria-live-regions)
+- [Common Anti-Patterns](#common-anti-patterns)
+
+## Essential Checks
+
+### Keyboard Navigation
+- [ ] All interactive elements focusable via Tab key
+- [ ] Focus order follows visual/logical order
+- [ ] Focus is visible (outline/ring on focused elements)
+- [ ] Custom widgets have keyboard support (Enter to activate, Escape to close)
+- [ ] No keyboard traps (user can always Tab away from a component)
+- [ ] Skip-to-content link at top of page - visible (at least) on keyboard focus
+- [ ] Modals trap focus while open, return focus on close
+
+### Screen Readers
+- [ ] All images have `alt` text (or `alt=""` for decorative images)
+- [ ] All form inputs have associated labels (`<label>` or `aria-label`)
+- [ ] Buttons and links have descriptive text (not "Click here")
+- [ ] Icon-only buttons have `aria-label`
+- [ ] Page has one `<h1>` and headings don't skip levels
+- [ ] Dynamic content changes announced (`aria-live` regions)
+- [ ] Tables have `<th>` headers with scope
+
+### Visual
+- [ ] Text contrast ≥ 4.5:1 (normal text) or ≥ 3:1 (large text, 18px+)
+- [ ] UI components contrast ≥ 3:1 against background
+- [ ] Color is not the only way to convey information
+- [ ] Text resizable to 200% without breaking layout
+- [ ] No content that flashes more than 3 times per second
+
+### Forms
+- [ ] Every input has a visible label
+- [ ] Required fields indicated (not by color alone)
+- [ ] Error messages specific and associated with the field
+- [ ] Error state visible by more than color (icon, text, border)
+- [ ] Form submission errors summarized and focusable
+- [ ] Known fields use autocomplete (for example `type="email" autocomplete="email"`)
+
+### Content
+- [ ] Language declared (`<html lang="en">`)
+- [ ] Page has a descriptive `<title>`
+- [ ] Links distinguish from surrounding text (not by color alone)
+- [ ] Touch targets ≥ 44x44px on mobile
+- [ ] Meaningful empty states (not blank screens)
+
+## Common HTML Patterns
+
+### Buttons vs. Links
+
+```html
+<!-- Use <button> for actions -->
+<button onClick={handleDelete}>Delete Task</button>
+
+<!-- Use <a> for navigation -->
+<a href="/tasks/123">View Task</a>
+
+<!-- NEVER use div/span as buttons -->
+<div onClick={handleDelete}>Delete</div>  <!-- BAD -->
+```
+
+### Form Labels
+
+```html
+<!-- Explicit label association -->
+<label htmlFor="email">Email address</label>
+<input id="email" type="email" required />
+
+<!-- Implicit wrapping -->
+<label>
+  Email address
+  <input type="email" required />
+</label>
+
+<!-- Hidden label (visible label preferred) -->
+<input type="search" aria-label="Search tasks" />
+```
+
+### ARIA Roles
+
+```html
+<!-- Navigation -->
+<nav aria-label="Main navigation">...</nav>
+<nav aria-label="Footer links">...</nav>
+
+<!-- Status messages -->
+<div role="status" aria-live="polite">Task saved</div>
+
+<!-- Alert messages -->
+<div role="alert">Error: Title is required</div>
+
+<!-- Modal dialogs -->
+<dialog aria-modal="true" aria-labelledby="dialog-title">
+  <h2 id="dialog-title">Confirm Delete</h2>
+  ...
+</dialog>
+
+<!-- Loading states -->
+<div aria-busy="true" aria-label="Loading tasks">
+  <Spinner />
+</div>
+```
+
+### Accessible Lists
+
+```html
+<ul role="list" aria-label="Tasks">
+  <li>
+    <input type="checkbox" id="task-1" aria-label="Complete: Buy groceries" />
+    <label htmlFor="task-1">Buy groceries</label>
+  </li>
+</ul>
+```
+
+## Testing Tools
+
+```bash
+# Automated audit
+npx axe-core          # Programmatic accessibility testing
+npx pa11y             # CLI accessibility checker
+
+# In browser
+# Chrome DevTools → Lighthouse → Accessibility
+# Chrome DevTools → Elements → Accessibility tree
+
+# Screen reader testing
+# macOS: VoiceOver (Cmd + F5)
+# Windows: NVDA (free) or JAWS
+# Linux: Orca
+```
+
+## Quick Reference: ARIA Live Regions
+
+| Value | Behavior | Use For |
+|-------|----------|---------|
+| `aria-live="polite"` | Announced at next pause | Status updates, saved confirmations |
+| `aria-live="assertive"` | Announced immediately | Errors, time-sensitive alerts |
+| `role="status"` | Same as `polite` | Status messages |
+| `role="alert"` | Same as `assertive` | Error messages |
+
+## Common Anti-Patterns
+
+| Anti-Pattern | Problem | Fix |
+|---|---|---|
+| `div` as button | Not focusable, no keyboard support | Use `<button>` |
+| Missing `alt` text | Images invisible to screen readers | Add descriptive `alt` |
+| Color-only states | Invisible to color-blind users | Add icons, text, or patterns |
+| Autoplaying media | Disorienting, can't be stopped | Add controls, don't autoplay |
+| Custom dropdown with no ARIA | Unusable by keyboard/screen reader | Use native `<select>` or proper ARIA listbox |
+| Removing focus outlines | Users can't see where they are | Style outlines, don't remove them |
+| Empty links/buttons | "Link" announced with no description | Add text or `aria-label` |
+| `tabindex > 0` | Breaks natural tab order | Use `tabindex="0"` or `-1` only |
--- a/.claude/skills/incremental-implementation/references/orchestration-patterns.md
+++ b/.claude/skills/incremental-implementation/references/orchestration-patterns.md
@@ -0,0 +1,370 @@
+# Orchestration Patterns
+
+Reference catalog of agent orchestration patterns this repo endorses, plus anti-patterns to avoid. Read this before adding a new slash command that coordinates multiple personas, or before introducing a new persona that "wraps" existing ones.
+
+The governing rule: **the user (or a slash command) is the orchestrator. Personas do not invoke other personas.** Skills are mandatory hops inside a persona's workflow.
+
+---
+
+## Endorsed patterns
+
+### 1. Direct invocation (no orchestration)
+
+Single persona, single perspective, single artifact. The default and the cheapest option.
+
+```
+user → code-reviewer → report → user
+```
+
+**Use when:** the work is one perspective on one artifact and you can describe it in one sentence.
+
+**Examples:**
+- "Review this PR" → `code-reviewer`
+- "Find security issues in `auth.ts`" → `security-auditor`
+- "What tests are missing for the checkout flow?" → `test-engineer`
+
+**Cost:** one round trip. The baseline you should always compare orchestrated patterns against.
+
+---
+
+### 2. Single-persona slash command
+
+A slash command that wraps one persona with the project's skills. Saves the user from re-explaining the workflow every time.
+
+```
+/review → code-reviewer (with code-review-and-quality skill) → report
+```
+
+**Use when:** the same single-persona invocation happens repeatedly with the same setup.
+
+**Examples in this repo:** `/review`, `/test`, `/code-simplify`.
+
+**Cost:** same as direct invocation. The slash command is just a saved prompt.
+
+**Anti-signal:** if the slash command's body is mostly "decide which persona to call," delete it and let the user call the persona directly.
+
+---
+
+### 3. Parallel fan-out with merge
+
+Multiple personas operate on the same input concurrently, each producing an independent report. A merge step (in the main agent's context) synthesizes them into a single decision.
+
+```
+                    ┌─→ code-reviewer    ─┐
+/ship → fan out  ───┼─→ security-auditor ─┤→ merge → go/no-go + rollback
+                    └─→ test-engineer    ─┘
+```
+
+**Use when:**
+- The sub-tasks are genuinely independent (no shared mutable state, no ordering dependency)
+- Each sub-agent benefits from its own context window
+- The merge step is small enough to stay in the main context
+- Wall-clock latency matters
+
+**Examples in this repo:** `/ship`.
+
+**Cost:** N parallel sub-agent contexts + one merge turn. Higher than direct invocation, but faster wall-clock and produces better reports because each sub-agent stays focused on its single perspective.
+
+**Validation checklist before adopting this pattern:**
+- [ ] Can I run all sub-agents at the same time without ordering issues?
+- [ ] Does each persona produce a different *kind* of finding, not just the same finding from a different angle?
+- [ ] Will the merge step fit in the main agent's remaining context?
+- [ ] Is the user's wait time long enough that parallelism is actually noticeable?
+
+If any answer is "no," fall back to direct invocation or a single-persona command.
+
+---
+
+### 4. Sequential pipeline as user-driven slash commands
+
+The user runs slash commands in a defined order, carrying context (or commit history) between them. There is no orchestrator agent — the user IS the orchestrator.
+
+```
+user runs:  /spec  →  /plan  →  /build  →  /test  →  /review  →  /ship
+```
+
+**Use when:** the workflow has dependencies (each step needs the previous step's output) and human judgment between steps adds value.
+
+**Examples in this repo:** the entire DEFINE → PLAN → BUILD → VERIFY → REVIEW → SHIP lifecycle.
+
+**Cost:** one sub-agent context per step. Free for the orchestration layer because there is no orchestrator agent.
+
+**Why not automate it:** an LLM "lifecycle orchestrator" would (a) lose nuance between steps because it has to summarize for hand-off, (b) skip the human checkpoints that catch wrong-direction work early, and (c) double the token cost via paraphrasing turns.
+
+---
+
+### 5. Research isolation (context preservation)
+
+When a task requires reading large amounts of material that shouldn't pollute the main context, spawn a research sub-agent that returns only a digest.
+
+```
+main agent → research sub-agent (reads 50 files) → digest → main agent continues
+```
+
+**Use when:**
+- The main session needs to stay focused on a downstream task
+- The investigation result is much smaller than the input it consumes
+- The decision quality benefits from the main agent having room to think after
+
+**Examples:** "Find every call site of this deprecated API across the monorepo," "Summarize what these 30 ADRs say about caching."
+
+**Cost:** one isolated sub-agent context. Worth it any time the alternative is loading hundreds of files into the main context.
+
+**On Claude Code, use the built-in `Explore` subagent** rather than defining a custom research persona. `Explore` runs on Haiku, is denied write/edit tools, and is purpose-built for this pattern. Define a custom research subagent only when `Explore` doesn't fit (e.g. you need a domain-specific system prompt the model wouldn't infer).
+
+---
+
+## Claude Code compatibility
+
+This catalog is harness-agnostic, but most readers will run it on Claude Code. Here's how each pattern maps onto Claude Code's primitives — and where the platform enforces our rules for us.
+
+### Where personas live
+
+Plugin subagents go in `agents/` at the plugin root. This repo is a plugin (`.claude-plugin/plugin.json`), so `agents/code-reviewer.md`, `agents/security-auditor.md`, and `agents/test-engineer.md` are auto-discovered when the plugin is enabled. No path configuration needed.
+
+### Subagents vs. Agent Teams
+
+Claude Code has two parallelism primitives. Pattern 3 (parallel fan-out with merge) maps to **subagents**. If you need teammates that talk to each other, use **Agent Teams** instead.
+
+| | Subagents | Agent Teams |
+|--|-----------|-------------|
+| Coordination | Main agent fans out, sub-agents only report back | Teammates message each other, share a task list |
+| Context | Own context window per subagent | Own context window per teammate |
+| When to use | Independent tasks producing reports | Collaborative work needing discussion |
+| Status | Stable | Experimental — requires `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` |
+| Cost | Lower | Higher — each teammate is a separate Claude instance |
+
+**The personas in this repo work in both modes.** When spawned as subagents (e.g. by `/ship`), they report findings to the main session. When spawned as teammates (`Spawn a teammate using the security-auditor agent type…`), they can challenge each other's findings directly. The persona definition is the same; only the spawning context changes.
+
+One subtlety: the `skills` and `mcpServers` frontmatter fields in a persona are honored when it runs as a subagent but **ignored when it runs as a teammate** — teammates load skills and MCP servers from your project and user settings, the same as a regular session. If a persona depends on a specific skill or MCP server being loaded, configure it at the session level so it's available in both modes.
+
+### Platform-enforced rules
+
+Two rules in this catalog aren't just convention — Claude Code enforces them:
+
+- **"Subagents cannot spawn other subagents"** (verbatim from the docs). Anti-pattern B (persona-calls-persona) and Anti-pattern D (deep persona trees) cannot exist on Claude Code by construction.
+- **"No nested teams"** — teammates cannot spawn their own teams. Same anti-patterns blocked at the team level.
+
+This means you can adopt the patterns in this catalog without worrying about contributors accidentally building the anti-patterns. They'll just fail to load.
+
+### Built-in subagents to know about
+
+Before defining a custom subagent, check whether one of these covers the role:
+
+| Built-in | Purpose |
+|----------|---------|
+| `Explore` | Read-only codebase search and analysis. Use this for Pattern 5 (research isolation). |
+| `Plan` | Read-only research during plan mode. |
+| `general-purpose` | Multi-step tasks needing both exploration and modification. |
+
+Don't redefine these. Layer your specialist personas (code-reviewer, security-auditor, test-engineer) on top of them.
+
+### Frontmatter restrictions for plugin agents
+
+Plugin subagents do **not** support the `hooks`, `mcpServers`, or `permissionMode` frontmatter fields — these are silently ignored. If a future persona needs any of those, the user must copy the file into `.claude/agents/` or `~/.claude/agents/` instead.
+
+The fields that DO work in plugin agents are: `name`, `description`, `tools`, `disallowedTools`, `model`, `maxTurns`, `skills`, `memory`, `background`, `effort`, `isolation`, `color`, `initialPrompt`. Use `model` per-persona if you want to optimize cost (e.g. Haiku for `test-engineer` coverage scans, Sonnet for `code-reviewer`, Opus for `security-auditor`).
+
+### Spawning multiple subagents in parallel
+
+In Claude Code, parallel fan-out (Pattern 3) requires issuing **multiple Agent tool calls in a single assistant turn**. Sequential turns serialize execution. `/ship` calls this out explicitly. Any new orchestrator command should do the same.
+
+---
+
+## Worked example: Agent Teams for competing-hypothesis debugging
+
+This example shows when to reach for **Agent Teams** instead of `/ship`'s subagent fan-out. The two patterns look similar from a distance — both spawn the same three personas — but the value comes from a different place.
+
+### The scenario
+
+> *Checkout occasionally hangs for ~30 seconds before completing. It happens roughly once every 50 sessions. No errors in logs. Started after last week's release.*
+
+Plausible root causes (mutually exclusive, all fit the symptoms):
+
+1. A race condition in the new payment-confirmation flow
+2. An auth check that occasionally falls through to a slow synchronous network call
+3. A missing index on a query that scales with cart size
+4. A flaky third-party API where the SDK retries silently before timing out
+
+A single agent will pick the first plausible theory and stop investigating. A `/ship`-style subagent fan-out would have each persona report independently — but their reports never meet, so nothing rules out the wrong theories.
+
+This is exactly the case the Agent Teams docs describe: *"With multiple independent investigators actively trying to disprove each other, the theory that survives is much more likely to be the actual root cause."*
+
+### Why this is *not* a `/ship` job
+
+| | `/ship` (subagents) | Agent Teams |
+|--|--------------------|-------------|
+| Sub-agents see | The same diff, different lenses | A shared task list, each other's messages |
+| Output | Three independent reports → one merge | Adversarial debate → consensus root cause |
+| Right when | You want a verdict on a known artifact | You want to *find* the artifact among hypotheses |
+
+`/ship` is a verdict; Agent Teams is an investigation.
+
+### Setup (one-time, per-environment)
+
+Agent Teams is experimental. In `~/.claude/settings.json`:
+
+```json
+{
+  "env": {
+    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
+  }
+}
+```
+
+Requires Claude Code v2.1.32 or later. The personas in this repo are picked up automatically — no team-config files to author by hand.
+
+### The trigger prompt
+
+Type into the lead session, in natural language:
+
+```
+Users report checkout hangs for ~30 seconds intermittently after last
+week's release. No errors in logs.
+
+Create an agent team to debug this with competing hypotheses. Spawn
+three teammates using the existing agent types:
+
+  - code-reviewer  — investigate race conditions and blocking calls
+                     in the checkout code path
+  - security-auditor — investigate auth checks, session handling,
+                       and any synchronous network calls added recently
+  - test-engineer  — propose tests that would distinguish between the
+                     hypotheses and check coverage gaps in checkout
+
+Have them message each other directly to challenge each other's
+theories. Update findings as consensus emerges. Only converge when
+two teammates agree they can disprove the others'.
+```
+
+The lead spawns three teammates referencing the existing persona names. The persona body is **appended** to each teammate's system prompt as additional instructions (on top of the team-coordination instructions the lead installs); the trigger prompt above becomes their task.
+
+### What happens
+
+1. Each teammate runs in its own context window, exploring the codebase from its own lens.
+2. Teammates use `message` to send findings to each other directly. The lead doesn't have to relay.
+3. The shared task list shows who's investigating what — visible at any time with `Ctrl+T` (in-process mode) or in a tmux pane (split mode).
+4. When `code-reviewer` finds a `Promise.all` that should be sequential, it messages `security-auditor` to confirm the auth call isn't part of the race. `security-auditor` checks and replies — either confirming the race is the real issue or producing counter-evidence.
+5. `test-engineer` proposes a focused integration test for whichever theory is winning, which the team uses to verify before declaring consensus.
+6. The lead synthesizes the converged finding and presents it to you.
+
+You can interrupt at any teammate by cycling with `Shift+Down` and typing — useful for redirecting an investigator who's gone down a wrong path.
+
+### When to clean up
+
+When the investigation lands on a root cause, tell the lead:
+
+```
+Clean up the team
+```
+
+Always cleanup through the lead, not a teammate (per the docs: teammates lack full team context for cleanup).
+
+### Cost expectation
+
+Three Sonnet teammates running for ~10–15 minutes of investigation costs noticeably more than the same three personas spawned as subagents by `/ship`. The justification is *quality of conclusion* — for production debugging where the wrong fix is expensive, the extra tokens are a bargain. For a routine PR review, stick with `/ship`.
+
+### Anti-pattern in this scenario
+
+Do **not** rebuild this as a `/debug` slash command that fans out subagents. Subagents can't message each other — you'd lose the adversarial debate that makes the pattern work. If a workflow keeps coming up, document the trigger prompt above as a snippet rather than wrapping it in a slash command that misuses subagents.
+
+### When *not* to use Agent Teams
+
+- Production-bound verdict on a known diff → use `/ship` (subagents).
+- One specialist perspective on one artifact → direct persona invocation.
+- Sequential lifecycle (spec → plan → build) → user-driven slash commands (Pattern 4).
+- Read-heavy research with a small digest → built-in `Explore` subagent.
+
+Reach for Agent Teams only when teammates **need** to challenge each other to produce the right answer.
+
+---
+
+## Anti-patterns
+
+### A. Router persona ("meta-orchestrator")
+
+A persona whose job is to decide which other persona to call.
+
+```
+/work → router-persona → "this needs a review" → code-reviewer → router (paraphrases) → user
+```
+
+**Why it fails:**
+- Pure routing layer with no domain value
+- Adds two paraphrasing hops → information loss + roughly 2× token cost
+- The user already knew they wanted a review; they could have called `/review` directly
+- Replicates the work that slash commands and intent mapping in `AGENTS.md` already do
+
+**What to do instead:** add or refine slash commands. Document intent → command mapping in `AGENTS.md`.
+
+---
+
+### B. Persona that calls another persona
+
+A `code-reviewer` that internally invokes `security-auditor` when it sees auth code.
+
+**Why it fails:**
+- Personas were designed to produce a single perspective; chaining them defeats that
+- The summary the calling persona passes loses context the called persona needs
+- Failure modes multiply (which persona's output format wins? whose rules apply?)
+- Hides cost from the user
+
+**What to do instead:** have the calling persona *recommend* a follow-up audit in its report. The user or a slash command runs the second pass.
+
+---
+
+### C. Sequential orchestrator that paraphrases
+
+An agent that calls `/spec`, then `/plan`, then `/build`, etc. on the user's behalf.
+
+**Why it fails:**
+- Loses the human checkpoints that catch wrong-direction work
+- Each hand-off summarizes context — accumulated drift over a long pipeline
+- Doubles token cost: orchestrator turn + sub-agent turn for every step
+- Removes user agency at exactly the points where judgment matters most
+
+**What to do instead:** keep the user as the orchestrator. Document the recommended sequence in `README.md` and let users invoke it.
+
+---
+
+### D. Deep persona trees
+
+`/ship` calls a `pre-ship-coordinator` that calls a `quality-coordinator` that calls `code-reviewer`.
+
+**Why it fails:**
+- Each layer adds latency and tokens with no decision value
+- Debugging becomes a multi-level investigation
+- The leaf personas lose context to multiple summarization steps
+
+**What to do instead:** keep the orchestration depth at most 1 (slash command → personas). The merge happens in the main agent.
+
+---
+
+## Decision flow
+
+When considering a new orchestrated workflow, walk this flow:
+
+```
+Is the work one perspective on one artifact?
+├── Yes → Direct invocation. Stop.
+└── No  → Will the same composition repeat?
+         ├── No  → Direct invocation, ad hoc. Stop.
+         └── Yes → Are sub-tasks independent?
+                  ├── No  → Sequential slash commands run by user (Pattern 4).
+                  └── Yes → Parallel fan-out with merge (Pattern 3).
+                           Validate against the checklist above.
+                           If any check fails → fall back to single-persona command (Pattern 2).
+```
+
+---
+
+## When to add a new pattern to this catalog
+
+Add a new entry only after:
+
+1. You've used the pattern at least twice in real work
+2. You can name a concrete artifact in this repo that demonstrates it
+3. You can explain why an existing pattern wouldn't have worked
+4. You can describe its anti-pattern shadow (what people will mistakenly build instead)
+
+Premature catalog entries become aspirational documentation that no one follows.
--- a/.claude/skills/incremental-implementation/references/performance-checklist.md
+++ b/.claude/skills/incremental-implementation/references/performance-checklist.md
@@ -0,0 +1,153 @@
+# Performance Checklist
+
+Quick reference checklist for web application performance. Use alongside the `performance-optimization` skill.
+
+## Table of Contents
+
+- [Core Web Vitals Targets](#core-web-vitals-targets)
+- [TTFB Diagnosis](#ttfb-diagnosis)
+- [Frontend Checklist](#frontend-checklist)
+- [Backend Checklist](#backend-checklist)
+- [Measurement Commands](#measurement-commands)
+- [Common Anti-Patterns](#common-anti-patterns)
+
+## Core Web Vitals Targets
+
+| Metric | Good | Needs Work | Poor |
+|--------|------|------------|------|
+| LCP (Largest Contentful Paint) | ≤ 2.5s | ≤ 4.0s | > 4.0s |
+| INP (Interaction to Next Paint) | ≤ 200ms | ≤ 500ms | > 500ms |
+| CLS (Cumulative Layout Shift) | ≤ 0.1 | ≤ 0.25 | > 0.25 |
+
+## TTFB Diagnosis
+
+When TTFB is slow (> 800ms), check each component in DevTools Network waterfall:
+
+- [ ] **DNS resolution** slow → add `<link rel="dns-prefetch">` or `<link rel="preconnect">` for known origins
+- [ ] **TCP/TLS handshake** slow → enable HTTP/2, consider edge deployment, verify keep-alive
+- [ ] **Server processing** slow → profile backend, check slow queries, add caching
+
+## Frontend Checklist
+
+### Images
+- [ ] Images use modern formats (WebP, AVIF)
+- [ ] Images are responsively sized (`srcset` and `sizes`)
+- [ ] Images and `<source>` elements have explicit `width` and `height` (prevents CLS in art direction)
+- [ ] Below-the-fold images use `loading="lazy"` and `decoding="async"`
+- [ ] Hero/LCP images use `fetchpriority="high"` and no lazy loading
+
+### JavaScript
+- [ ] Bundle size under 200KB gzipped (initial load)
+- [ ] Code splitting with dynamic `import()` for routes and heavy features
+- [ ] Tree shaking enabled (verify dependency ships ESM and marks `sideEffects: false`)
+- [ ] No blocking JavaScript in `<head>` (use `defer` or `async`)
+- [ ] Heavy computation offloaded to Web Workers (if applicable)
+- [ ] `React.memo()` on expensive components that re-render with same props
+- [ ] `useMemo()` / `useCallback()` only where profiling shows benefit
+- [ ] Long tasks (> 50ms) broken up to keep the main thread available — main lever for INP
+- [ ] `yieldToMain` pattern used inside long-running loops so input events can run between chunks
+- [ ] Modern scheduling APIs used where available: `scheduler.yield()` (preferred), `scheduler.postTask()` with priorities, `isInputPending()` to yield only when needed
+- [ ] `requestIdleCallback` for deferrable, non-urgent work (analytics flush, prefetch, warmup)
+- [ ] Non-critical work deferred out of event handlers (e.g. analytics, logging) so the response to the interaction is not delayed
+- [ ] Third-party scripts loaded with `async` / `defer`, audited for size, and fronted by a facade when heavy (chat widgets, embeds)
+
+### CSS
+- [ ] Critical CSS inlined or preloaded
+- [ ] No render-blocking CSS for non-critical styles
+- [ ] No CSS-in-JS runtime cost in production (use extraction)
+
+### Fonts
+- [ ] Limited to 2–3 font families, 2–3 weights each (every additional weight is another request)
+- [ ] WOFF2 format only (smallest, universal support — skip WOFF/TTF/EOT)
+- [ ] Self-hosted when possible (third-party font CDNs add DNS + TCP + TLS round-trips)
+- [ ] LCP-critical fonts preloaded: `<link rel="preload" as="font" type="font/woff2" crossorigin>`
+- [ ] `font-display: swap` (or `optional` for non-critical) to avoid FOIT blocking render
+- [ ] Subsetted via `unicode-range` to ship only the glyphs each page needs
+- [ ] Variable fonts considered when multiple weights/styles are required (one file replaces many)
+- [ ] Fallback font metrics adjusted with `size-adjust`, `ascent-override`, `descent-override` to reduce CLS on font swap
+- [ ] System font stack considered before any custom font
+
+### Network
+- [ ] Static assets cached with long `max-age` + content hashing
+- [ ] API responses cached where appropriate (`Cache-Control`)
+- [ ] HTTP/2 or HTTP/3 enabled
+- [ ] Resources preconnected (`<link rel="preconnect">`) for known origins
+- [ ] `fetchpriority` used on critical non-image resources (e.g., key `<link rel="preload">`, above-the-fold `<script>`) — not only on `<img>`
+- [ ] No unnecessary redirects
+
+### Rendering
+- [ ] No layout thrashing (forced synchronous layouts)
+- [ ] Animations use `transform` and `opacity` (GPU-accelerated)
+- [ ] Long lists use virtualization (e.g., `react-window`)
+- [ ] No unnecessary full-page re-renders
+- [ ] Off-screen sections use `content-visibility: auto` with `contain-intrinsic-size` to skip layout/paint of non-visible areas
+- [ ] No `unload` event handlers and no `Cache-Control: no-store` on HTML responses — preserves back/forward cache (bfcache) eligibility
+
+## Backend Checklist
+
+### Database
+- [ ] No N+1 query patterns (use eager loading / joins)
+- [ ] Queries have appropriate indexes
+- [ ] List endpoints paginated (never `SELECT * FROM table`)
+- [ ] Connection pooling configured
+- [ ] Slow query logging enabled
+
+### API
+- [ ] Response times < 200ms (p95)
+- [ ] No synchronous heavy computation in request handlers
+- [ ] Bulk operations instead of loops of individual calls
+- [ ] Response compression (gzip/brotli)
+- [ ] Appropriate caching (in-memory, Redis, CDN)
+
+### Infrastructure
+- [ ] CDN for static assets
+- [ ] Server located close to users (or edge deployment)
+- [ ] Horizontal scaling configured (if needed)
+- [ ] Health check endpoint for load balancer
+
+## Measurement Commands
+
+### INP field data and DevTools workflow
+
+1. **Field data first** — check [CrUX Vis](https://developer.chrome.com/docs/crux/vis) or your RUM tool for real-user INP before optimising
+2. **Identify slow interactions** — open DevTools → Performance panel → record while interacting; look for long tasks triggered by clicks/keystrokes
+3. **Test on mid-range Android** — INP issues often only surface on slower hardware; use a real device or DevTools CPU throttling (4×–6× slowdown)
+
+```bash
+# Lighthouse CLI
+npx lighthouse https://localhost:3000 --output json --output-path ./report.json
+
+# Bundle analysis
+npx webpack-bundle-analyzer stats.json
+# or for Vite:
+npx vite-bundle-visualizer
+
+# Check bundle size
+npx bundlesize
+
+# Web Vitals in code
+import { onLCP, onINP, onCLS } from 'web-vitals';
+onLCP(console.log);
+onINP(console.log);
+onCLS(console.log);
+
+# INP with interaction-level detail (attribution build)
+import { onINP } from 'web-vitals/attribution';
+onINP(({ value, attribution }) => {
+  const { interactionTarget, inputDelay, processingDuration, presentationDelay } = attribution;
+  console.log({ value, interactionTarget, inputDelay, processingDuration, presentationDelay });
+});
+```
+
+## Common Anti-Patterns
+
+| Anti-Pattern | Impact | Fix |
+|---|---|---|
+| N+1 queries | Linear DB load growth | Use joins, includes, or batch loading |
+| Unbounded queries | Memory exhaustion, timeouts | Always paginate, add LIMIT |
+| Missing indexes | Slow reads as data grows | Add indexes for filtered/sorted columns |
+| Layout thrashing | Jank, dropped frames | Batch DOM reads, then batch writes |
+| Unoptimized images | Slow LCP, wasted bandwidth | Use WebP, responsive sizes, lazy load |
+| Large bundles | Slow Time to Interactive | Code split, tree shake, audit deps |
+| Blocking main thread | Poor INP, unresponsive UI | Chunk long tasks with `scheduler.yield()` / `yieldToMain`, offload to Web Workers |
+| Memory leaks | Growing memory, eventual crash | Clean up listeners, intervals, refs |
--- a/.claude/skills/incremental-implementation/references/security-checklist.md
+++ b/.claude/skills/incremental-implementation/references/security-checklist.md
@@ -0,0 +1,134 @@
+# Security Checklist
+
+Quick reference for web application security. Use alongside the `security-and-hardening` skill.
+
+## Table of Contents
+
+- [Pre-Commit Checks](#pre-commit-checks)
+- [Authentication](#authentication)
+- [Authorization](#authorization)
+- [Input Validation](#input-validation)
+- [Security Headers](#security-headers)
+- [CORS Configuration](#cors-configuration)
+- [Data Protection](#data-protection)
+- [Dependency Security](#dependency-security)
+- [Error Handling](#error-handling)
+- [OWASP Top 10 Quick Reference](#owasp-top-10-quick-reference)
+
+## Pre-Commit Checks
+
+- [ ] No secrets in code (`git diff --cached | grep -i "password\|secret\|api_key\|token"`)
+- [ ] `.gitignore` covers: `.env`, `.env.local`, `*.pem`, `*.key`
+- [ ] `.env.example` uses placeholder values (not real secrets)
+
+## Authentication
+
+- [ ] Passwords hashed with bcrypt (≥12 rounds), scrypt, or argon2
+- [ ] Session cookies: `httpOnly`, `secure`, `sameSite: 'lax'`
+- [ ] Session expiration configured (reasonable max-age)
+- [ ] Rate limiting on login endpoint (≤10 attempts per 15 minutes)
+- [ ] Password reset tokens: time-limited (≤1 hour), single-use
+- [ ] Account lockout after repeated failures (optional, with notification)
+- [ ] MFA supported for sensitive operations (optional but recommended)
+
+## Authorization
+
+- [ ] Every protected endpoint checks authentication
+- [ ] Every resource access checks ownership/role (prevents IDOR)
+- [ ] Admin endpoints require admin role verification
+- [ ] API keys scoped to minimum necessary permissions
+- [ ] JWT tokens validated (signature, expiration, issuer)
+
+## Input Validation
+
+- [ ] All user input validated at system boundaries (API routes, form handlers)
+- [ ] Validation uses allowlists (not denylists)
+- [ ] String lengths constrained (min/max)
+- [ ] Numeric ranges validated
+- [ ] Email, URL, and date formats validated with proper libraries
+- [ ] File uploads: type restricted, size limited, content verified
+- [ ] SQL queries parameterized (no string concatenation)
+- [ ] HTML output encoded (use framework auto-escaping)
+- [ ] URLs validated before redirect (prevent open redirect)
+
+## Security Headers
+
+```
+Content-Security-Policy: default-src 'self'; script-src 'self'
+Strict-Transport-Security: max-age=31536000; includeSubDomains
+X-Content-Type-Options: nosniff
+X-Frame-Options: DENY
+X-XSS-Protection: 0  (disabled, rely on CSP)
+Referrer-Policy: strict-origin-when-cross-origin
+Permissions-Policy: camera=(), microphone=(), geolocation=()
+```
+
+## CORS Configuration
+
+```typescript
+// Restrictive (recommended)
+cors({
+  origin: ['https://yourdomain.com', 'https://app.yourdomain.com'],
+  credentials: true,
+  methods: ['GET', 'POST', 'PUT', 'PATCH', 'DELETE'],
+  allowedHeaders: ['Content-Type', 'Authorization'],
+})
+
+// NEVER use in production:
+cors({ origin: '*' })  // Allows any origin
+```
+
+## Data Protection
+
+- [ ] Sensitive fields excluded from API responses (`passwordHash`, `resetToken`, etc.)
+- [ ] Sensitive data not logged (passwords, tokens, full CC numbers)
+- [ ] PII encrypted at rest (if required by regulation)
+- [ ] HTTPS for all external communication
+- [ ] Database backups encrypted
+
+## Dependency Security
+
+```bash
+# Audit dependencies
+npm audit
+
+# Fix automatically where possible
+npm audit fix
+
+# Check for critical vulnerabilities
+npm audit --audit-level=critical
+
+# Keep dependencies updated
+npx npm-check-updates
+```
+
+## Error Handling
+
+```typescript
+// Production: generic error, no internals
+res.status(500).json({
+  error: { code: 'INTERNAL_ERROR', message: 'Something went wrong' }
+});
+
+// NEVER in production:
+res.status(500).json({
+  error: err.message,
+  stack: err.stack,         // Exposes internals
+  query: err.sql,           // Exposes database details
+});
+```
+
+## OWASP Top 10 Quick Reference
+
+| # | Vulnerability | Prevention |
+|---|---|---|
+| 1 | Broken Access Control | Auth checks on every endpoint, ownership verification |
+| 2 | Cryptographic Failures | HTTPS, strong hashing, no secrets in code |
+| 3 | Injection | Parameterized queries, input validation |
+| 4 | Insecure Design | Threat modeling, spec-driven development |
+| 5 | Security Misconfiguration | Security headers, minimal permissions, audit deps |
+| 6 | Vulnerable Components | `npm audit`, keep deps updated, minimal deps |
+| 7 | Auth Failures | Strong passwords, rate limiting, session management |
+| 8 | Data Integrity Failures | Verify updates/dependencies, signed artifacts |
+| 9 | Logging Failures | Log security events, don't log secrets |
+| 10 | SSRF | Validate/allowlist URLs, restrict outbound requests |
--- a/.claude/skills/incremental-implementation/references/testing-patterns.md
+++ b/.claude/skills/incremental-implementation/references/testing-patterns.md
@@ -0,0 +1,236 @@
+# Testing Patterns Reference
+
+Quick reference for common testing patterns across the stack. Use alongside the `test-driven-development` skill.
+
+## Table of Contents
+
+- [Test Structure (Arrange-Act-Assert)](#test-structure-arrange-act-assert)
+- [Test Naming Conventions](#test-naming-conventions)
+- [Common Assertions](#common-assertions)
+- [Mocking Patterns](#mocking-patterns)
+- [React/Component Testing](#reactcomponent-testing)
+- [API / Integration Testing](#api--integration-testing)
+- [E2E Testing (Playwright)](#e2e-testing-playwright)
+- [Test Anti-Patterns](#test-anti-patterns)
+
+## Test Structure (Arrange-Act-Assert)
+
+```typescript
+it('describes expected behavior', () => {
+  // Arrange: Set up test data and preconditions
+  const input = { title: 'Test Task', priority: 'high' };
+
+  // Act: Perform the action being tested
+  const result = createTask(input);
+
+  // Assert: Verify the outcome
+  expect(result.title).toBe('Test Task');
+  expect(result.priority).toBe('high');
+  expect(result.status).toBe('pending');
+});
+```
+
+## Test Naming Conventions
+
+```typescript
+// Pattern: [unit] [expected behavior] [condition]
+describe('TaskService.createTask', () => {
+  it('creates a task with default pending status', () => {});
+  it('throws ValidationError when title is empty', () => {});
+  it('trims whitespace from title', () => {});
+  it('generates a unique ID for each task', () => {});
+});
+```
+
+## Common Assertions
+
+```typescript
+// Equality
+expect(result).toBe(expected);           // Strict equality (===)
+expect(result).toEqual(expected);        // Deep equality (objects/arrays)
+expect(result).toStrictEqual(expected);  // Deep equality + type matching
+
+// Truthiness
+expect(result).toBeTruthy();
+expect(result).toBeFalsy();
+expect(result).toBeNull();
+expect(result).toBeDefined();
+expect(result).toBeUndefined();
+
+// Numbers
+expect(result).toBeGreaterThan(5);
+expect(result).toBeLessThanOrEqual(10);
+expect(result).toBeCloseTo(0.3, 5);      // Floating point
+
+// Strings
+expect(result).toMatch(/pattern/);
+expect(result).toContain('substring');
+
+// Arrays / Objects
+expect(array).toContain(item);
+expect(array).toHaveLength(3);
+expect(object).toHaveProperty('key', 'value');
+
+// Errors
+expect(() => fn()).toThrow();
+expect(() => fn()).toThrow(ValidationError);
+expect(() => fn()).toThrow('specific message');
+
+// Async
+await expect(asyncFn()).resolves.toBe(value);
+await expect(asyncFn()).rejects.toThrow(Error);
+```
+
+## Mocking Patterns
+
+### Mock Functions
+
+```typescript
+const mockFn = jest.fn();
+mockFn.mockReturnValue(42);
+mockFn.mockResolvedValue({ data: 'test' });
+mockFn.mockImplementation((x) => x * 2);
+
+expect(mockFn).toHaveBeenCalled();
+expect(mockFn).toHaveBeenCalledWith('arg1', 'arg2');
+expect(mockFn).toHaveBeenCalledTimes(3);
+```
+
+### Mock Modules
+
+```typescript
+// Mock an entire module
+jest.mock('./database', () => ({
+  query: jest.fn().mockResolvedValue([{ id: 1, title: 'Test' }]),
+}));
+
+// Mock specific exports
+jest.mock('./utils', () => ({
+  ...jest.requireActual('./utils'),
+  generateId: jest.fn().mockReturnValue('test-id'),
+}));
+```
+
+### Mock at Boundaries Only
+
+```
+Mock these:                    Don't mock these:
+├── Database calls             ├── Internal utility functions
+├── HTTP requests              ├── Business logic
+├── File system operations     ├── Data transformations
+├── External API calls         ├── Validation functions
+└── Time/Date (when needed)    └── Pure functions
+```
+
+## React/Component Testing
+
+```tsx
+import { render, screen, fireEvent, waitFor } from '@testing-library/react';
+
+describe('TaskForm', () => {
+  it('submits the form with entered data', async () => {
+    const onSubmit = jest.fn();
+    render(<TaskForm onSubmit={onSubmit} />);
+
+    // Find elements by accessible role/label (not test IDs)
+    await screen.findByRole('textbox', { name: /title/i });
+    fireEvent.change(screen.getByRole('textbox', { name: /title/i }), {
+      target: { value: 'New Task' },
+    });
+    fireEvent.click(screen.getByRole('button', { name: /create/i }));
+
+    await waitFor(() => {
+      expect(onSubmit).toHaveBeenCalledWith({ title: 'New Task' });
+    });
+  });
+
+  it('shows validation error for empty title', async () => {
+    render(<TaskForm onSubmit={jest.fn()} />);
+
+    fireEvent.click(screen.getByRole('button', { name: /create/i }));
+
+    expect(await screen.findByText(/title is required/i)).toBeInTheDocument();
+  });
+});
+```
+
+## API / Integration Testing
+
+```typescript
+import request from 'supertest';
+import { app } from '../src/app';
+
+describe('POST /api/tasks', () => {
+  it('creates a task and returns 201', async () => {
+    const response = await request(app)
+      .post('/api/tasks')
+      .send({ title: 'Test Task' })
+      .set('Authorization', `Bearer ${testToken}`)
+      .expect(201);
+
+    expect(response.body).toMatchObject({
+      id: expect.any(String),
+      title: 'Test Task',
+      status: 'pending',
+    });
+  });
+
+  it('returns 422 for invalid input', async () => {
+    const response = await request(app)
+      .post('/api/tasks')
+      .send({ title: '' })
+      .set('Authorization', `Bearer ${testToken}`)
+      .expect(422);
+
+    expect(response.body.error.code).toBe('VALIDATION_ERROR');
+  });
+
+  it('returns 401 without authentication', async () => {
+    await request(app)
+      .post('/api/tasks')
+      .send({ title: 'Test' })
+      .expect(401);
+  });
+});
+```
+
+## E2E Testing (Playwright)
+
+```typescript
+import { test, expect } from '@playwright/test';
+
+test('user can create and complete a task', async ({ page }) => {
+  // Navigate and authenticate
+  await page.goto('/');
+  await page.fill('[name="email"]', 'test@example.com');
+  await page.fill('[name="password"]', 'testpass123');
+  await page.click('button:has-text("Log in")');
+
+  // Create a task
+  await page.click('button:has-text("New Task")');
+  await page.fill('[name="title"]', 'Buy groceries');
+  await page.click('button:has-text("Create")');
+
+  // Verify task appears
+  await expect(page.locator('text=Buy groceries')).toBeVisible();
+
+  // Complete the task
+  await page.click('[aria-label="Complete Buy groceries"]');
+  await expect(page.locator('text=Buy groceries')).toHaveCSS(
+    'text-decoration-line', 'line-through'
+  );
+});
+```
+
+## Test Anti-Patterns
+
+| Anti-Pattern | Problem | Better Approach |
+|---|---|---|
+| Testing implementation details | Breaks on refactor | Test inputs/outputs |
+| Snapshot everything | No one reviews snapshot diffs | Assert specific values |
+| Shared mutable state | Tests pollute each other | Setup/teardown per test |
+| Testing third-party code | Wastes time, not your bug | Mock the boundary |
+| Skipping tests to pass CI | Hides real bugs | Fix or delete the test |
+| Using `test.skip` permanently | Dead code | Remove or fix it |
+| Overly broad assertions | Doesn't catch regressions | Be specific |
+| No async error handling | Swallowed errors, false passes | Always `await` async tests |