Initial commit

2026-05-16 13:43:29 +07:00
commit 688a86b611
208 changed files with 44350 additions and 0 deletions
--- a/.claude/skills/shipping-and-launch/SKILL.md
+++ b/.claude/skills/shipping-and-launch/SKILL.md
@@ -0,0 +1,309 @@
+---
+name: shipping-and-launch
+description: Prepares production launches. Use when preparing to deploy to production. Use when you need a pre-launch checklist, when setting up monitoring, when planning a staged rollout, or when you need a rollback strategy.
+---
+
+# Shipping and Launch
+
+## Overview
+
+Ship with confidence. The goal is not just to deploy — it's to deploy safely, with monitoring in place, a rollback plan ready, and a clear understanding of what success looks like. Every launch should be reversible, observable, and incremental.
+
+## When to Use
+
+- Deploying a feature to production for the first time
+- Releasing a significant change to users
+- Migrating data or infrastructure
+- Opening a beta or early access program
+- Any deployment that carries risk (all of them)
+
+## The Pre-Launch Checklist
+
+### Code Quality
+
+- [ ] All tests pass (unit, integration, e2e)
+- [ ] Build succeeds with no warnings
+- [ ] Lint and type checking pass
+- [ ] Code reviewed and approved
+- [ ] No TODO comments that should be resolved before launch
+- [ ] No `console.log` debugging statements in production code
+- [ ] Error handling covers expected failure modes
+
+### Security
+
+- [ ] No secrets in code or version control
+- [ ] `npm audit` shows no critical or high vulnerabilities
+- [ ] Input validation on all user-facing endpoints
+- [ ] Authentication and authorization checks in place
+- [ ] Security headers configured (CSP, HSTS, etc.)
+- [ ] Rate limiting on authentication endpoints
+- [ ] CORS configured to specific origins (not wildcard)
+
+### Performance
+
+- [ ] Core Web Vitals within "Good" thresholds
+- [ ] No N+1 queries in critical paths
+- [ ] Images optimized (compression, responsive sizes, lazy loading)
+- [ ] Bundle size within budget
+- [ ] Database queries have appropriate indexes
+- [ ] Caching configured for static assets and repeated queries
+
+### Accessibility
+
+- [ ] Keyboard navigation works for all interactive elements
+- [ ] Screen reader can convey page content and structure
+- [ ] Color contrast meets WCAG 2.1 AA (4.5:1 for text)
+- [ ] Focus management correct for modals and dynamic content
+- [ ] Error messages are descriptive and associated with form fields
+- [ ] No accessibility warnings in axe-core or Lighthouse
+
+### Infrastructure
+
+- [ ] Environment variables set in production
+- [ ] Database migrations applied (or ready to apply)
+- [ ] DNS and SSL configured
+- [ ] CDN configured for static assets
+- [ ] Logging and error reporting configured
+- [ ] Health check endpoint exists and responds
+
+### Documentation
+
+- [ ] README updated with any new setup requirements
+- [ ] API documentation current
+- [ ] ADRs written for any architectural decisions
+- [ ] Changelog updated
+- [ ] User-facing documentation updated (if applicable)
+
+## Feature Flag Strategy
+
+Ship behind feature flags to decouple deployment from release:
+
+```typescript
+// Feature flag check
+const flags = await getFeatureFlags(userId);
+
+if (flags.taskSharing) {
+  // New feature: task sharing
+  return <TaskSharingPanel task={task} />;
+}
+
+// Default: existing behavior
+return null;
+```
+
+**Feature flag lifecycle:**
+
+```
+1. DEPLOY with flag OFF     → Code is in production but inactive
+2. ENABLE for team/beta     → Internal testing in production environment
+3. GRADUAL ROLLOUT          → 5% → 25% → 50% → 100% of users
+4. MONITOR at each stage    → Watch error rates, performance, user feedback
+5. CLEAN UP                 → Remove flag and dead code path after full rollout
+```
+
+**Rules:**
+- Every feature flag has an owner and an expiration date
+- Clean up flags within 2 weeks of full rollout
+- Don't nest feature flags (creates exponential combinations)
+- Test both flag states (on and off) in CI
+
+## Staged Rollout
+
+### The Rollout Sequence
+
+```
+1. DEPLOY to staging
+   └── Full test suite in staging environment
+   └── Manual smoke test of critical flows
+
+2. DEPLOY to production (feature flag OFF)
+   └── Verify deployment succeeded (health check)
+   └── Check error monitoring (no new errors)
+
+3. ENABLE for team (flag ON for internal users)
+   └── Team uses the feature in production
+   └── 24-hour monitoring window
+
+4. CANARY rollout (flag ON for 5% of users)
+   └── Monitor error rates, latency, user behavior
+   └── Compare metrics: canary vs. baseline
+   └── 24-48 hour monitoring window
+   └── Advance only if all thresholds pass (see table below)
+
+5. GRADUAL increase (25% -> 50% -> 100%)
+   └── Same monitoring at each step
+   └── Ability to roll back to previous percentage at any point
+
+6. FULL rollout (flag ON for all users)
+   └── Monitor for 1 week
+   └── Clean up feature flag
+```
+
+### Rollout Decision Thresholds
+
+Use these thresholds to decide whether to advance, hold, or roll back at each stage:
+
+| Metric | Advance (green) | Hold and investigate (yellow) | Roll back (red) |
+|--------|-----------------|-------------------------------|-----------------|
+| Error rate | Within 10% of baseline | 10-100% above baseline | >2x baseline |
+| P95 latency | Within 20% of baseline | 20-50% above baseline | >50% above baseline |
+| Client JS errors | No new error types | New errors at <0.1% of sessions | New errors at >0.1% of sessions |
+| Business metrics | Neutral or positive | Decline <5% (may be noise) | Decline >5% |
+
+### When to Roll Back
+
+Roll back immediately if:
+- Error rate increases by more than 2x baseline
+- P95 latency increases by more than 50%
+- User-reported issues spike
+- Data integrity issues detected
+- Security vulnerability discovered
+
+## Monitoring and Observability
+
+### What to Monitor
+
+```
+Application metrics:
+├── Error rate (total and by endpoint)
+├── Response time (p50, p95, p99)
+├── Request volume
+├── Active users
+└── Key business metrics (conversion, engagement)
+
+Infrastructure metrics:
+├── CPU and memory utilization
+├── Database connection pool usage
+├── Disk space
+├── Network latency
+└── Queue depth (if applicable)
+
+Client metrics:
+├── Core Web Vitals (LCP, INP, CLS)
+├── JavaScript errors
+├── API error rates from client perspective
+└── Page load time
+```
+
+### Error Reporting
+
+```typescript
+// Set up error boundary with reporting
+class ErrorBoundary extends React.Component {
+  componentDidCatch(error: Error, info: React.ErrorInfo) {
+    // Report to error tracking service
+    reportError(error, {
+      componentStack: info.componentStack,
+      userId: getCurrentUser()?.id,
+      page: window.location.pathname,
+    });
+  }
+
+  render() {
+    if (this.state.hasError) {
+      return <ErrorFallback onRetry={() => this.setState({ hasError: false })} />;
+    }
+    return this.props.children;
+  }
+}
+
+// Server-side error reporting
+app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
+  reportError(err, {
+    method: req.method,
+    url: req.url,
+    userId: req.user?.id,
+  });
+
+  // Don't expose internals to users
+  res.status(500).json({
+    error: { code: 'INTERNAL_ERROR', message: 'Something went wrong' },
+  });
+});
+```
+
+### Post-Launch Verification
+
+In the first hour after launch:
+
+```
+1. Check health endpoint returns 200
+2. Check error monitoring dashboard (no new error types)
+3. Check latency dashboard (no regression)
+4. Test the critical user flow manually
+5. Verify logs are flowing and readable
+6. Confirm rollback mechanism works (dry run if possible)
+```
+
+## Rollback Strategy
+
+Every deployment needs a rollback plan before it happens:
+
+```markdown
+## Rollback Plan for [Feature/Release]
+
+### Trigger Conditions
+- Error rate > 2x baseline
+- P95 latency > [X]ms
+- User reports of [specific issue]
+
+### Rollback Steps
+1. Disable feature flag (if applicable)
+   OR
+1. Deploy previous version: `git revert <commit> && git push`
+2. Verify rollback: health check, error monitoring
+3. Communicate: notify team of rollback
+
+### Database Considerations
+- Migration [X] has a rollback: `npx prisma migrate rollback`
+- Data inserted by new feature: [preserved / cleaned up]
+
+### Time to Rollback
+- Feature flag: < 1 minute
+- Redeploy previous version: < 5 minutes
+- Database rollback: < 15 minutes
+```
+## See Also
+
+- For security pre-launch checks, see `references/security-checklist.md`
+- For performance pre-launch checklist, see `references/performance-checklist.md`
+- For accessibility verification before launch, see `references/accessibility-checklist.md`
+
+## Common Rationalizations
+
+| Rationalization | Reality |
+|---|---|
+| "It works in staging, it'll work in production" | Production has different data, traffic patterns, and edge cases. Monitor after deploy. |
+| "We don't need feature flags for this" | Every feature benefits from a kill switch. Even "simple" changes can break things. |
+| "Monitoring is overhead" | Not having monitoring means you discover problems from user complaints instead of dashboards. |
+| "We'll add monitoring later" | Add it before launch. You can't debug what you can't see. |
+| "Rolling back is admitting failure" | Rolling back is responsible engineering. Shipping a broken feature is the failure. |
+
+## Red Flags
+
+- Deploying without a rollback plan
+- No monitoring or error reporting in production
+- Big-bang releases (everything at once, no staging)
+- Feature flags with no expiration or owner
+- No one monitoring the deploy for the first hour
+- Production environment configuration done by memory, not code
+- "It's Friday afternoon, let's ship it"
+
+## Verification
+
+Before deploying:
+
+- [ ] Pre-launch checklist completed (all sections green)
+- [ ] Feature flag configured (if applicable)
+- [ ] Rollback plan documented
+- [ ] Monitoring dashboards set up
+- [ ] Team notified of deployment
+
+After deploying:
+
+- [ ] Health check returns 200
+- [ ] Error rate is normal
+- [ ] Latency is normal
+- [ ] Critical user flow works
+- [ ] Logs are flowing
+- [ ] Rollback tested or verified ready
--- a/.claude/skills/shipping-and-launch/references/accessibility-checklist.md
+++ b/.claude/skills/shipping-and-launch/references/accessibility-checklist.md
@@ -0,0 +1,160 @@
+# Accessibility Checklist
+
+Quick reference for WCAG 2.1 AA compliance. Use alongside the `frontend-ui-engineering` skill.
+
+## Table of Contents
+
+- [Essential Checks](#essential-checks)
+- [Common HTML Patterns](#common-html-patterns)
+- [Testing Tools](#testing-tools)
+- [Quick Reference: ARIA Live Regions](#quick-reference-aria-live-regions)
+- [Common Anti-Patterns](#common-anti-patterns)
+
+## Essential Checks
+
+### Keyboard Navigation
+- [ ] All interactive elements focusable via Tab key
+- [ ] Focus order follows visual/logical order
+- [ ] Focus is visible (outline/ring on focused elements)
+- [ ] Custom widgets have keyboard support (Enter to activate, Escape to close)
+- [ ] No keyboard traps (user can always Tab away from a component)
+- [ ] Skip-to-content link at top of page - visible (at least) on keyboard focus
+- [ ] Modals trap focus while open, return focus on close
+
+### Screen Readers
+- [ ] All images have `alt` text (or `alt=""` for decorative images)
+- [ ] All form inputs have associated labels (`<label>` or `aria-label`)
+- [ ] Buttons and links have descriptive text (not "Click here")
+- [ ] Icon-only buttons have `aria-label`
+- [ ] Page has one `<h1>` and headings don't skip levels
+- [ ] Dynamic content changes announced (`aria-live` regions)
+- [ ] Tables have `<th>` headers with scope
+
+### Visual
+- [ ] Text contrast ≥ 4.5:1 (normal text) or ≥ 3:1 (large text, 18px+)
+- [ ] UI components contrast ≥ 3:1 against background
+- [ ] Color is not the only way to convey information
+- [ ] Text resizable to 200% without breaking layout
+- [ ] No content that flashes more than 3 times per second
+
+### Forms
+- [ ] Every input has a visible label
+- [ ] Required fields indicated (not by color alone)
+- [ ] Error messages specific and associated with the field
+- [ ] Error state visible by more than color (icon, text, border)
+- [ ] Form submission errors summarized and focusable
+- [ ] Known fields use autocomplete (for example `type="email" autocomplete="email"`)
+
+### Content
+- [ ] Language declared (`<html lang="en">`)
+- [ ] Page has a descriptive `<title>`
+- [ ] Links distinguish from surrounding text (not by color alone)
+- [ ] Touch targets ≥ 44x44px on mobile
+- [ ] Meaningful empty states (not blank screens)
+
+## Common HTML Patterns
+
+### Buttons vs. Links
+
+```html
+<!-- Use <button> for actions -->
+<button onClick={handleDelete}>Delete Task</button>
+
+<!-- Use <a> for navigation -->
+<a href="/tasks/123">View Task</a>
+
+<!-- NEVER use div/span as buttons -->
+<div onClick={handleDelete}>Delete</div>  <!-- BAD -->
+```
+
+### Form Labels
+
+```html
+<!-- Explicit label association -->
+<label htmlFor="email">Email address</label>
+<input id="email" type="email" required />
+
+<!-- Implicit wrapping -->
+<label>
+  Email address
+  <input type="email" required />
+</label>
+
+<!-- Hidden label (visible label preferred) -->
+<input type="search" aria-label="Search tasks" />
+```
+
+### ARIA Roles
+
+```html
+<!-- Navigation -->
+<nav aria-label="Main navigation">...</nav>
+<nav aria-label="Footer links">...</nav>
+
+<!-- Status messages -->
+<div role="status" aria-live="polite">Task saved</div>
+
+<!-- Alert messages -->
+<div role="alert">Error: Title is required</div>
+
+<!-- Modal dialogs -->
+<dialog aria-modal="true" aria-labelledby="dialog-title">
+  <h2 id="dialog-title">Confirm Delete</h2>
+  ...
+</dialog>
+
+<!-- Loading states -->
+<div aria-busy="true" aria-label="Loading tasks">
+  <Spinner />
+</div>
+```
+
+### Accessible Lists
+
+```html
+<ul role="list" aria-label="Tasks">
+  <li>
+    <input type="checkbox" id="task-1" aria-label="Complete: Buy groceries" />
+    <label htmlFor="task-1">Buy groceries</label>
+  </li>
+</ul>
+```
+
+## Testing Tools
+
+```bash
+# Automated audit
+npx axe-core          # Programmatic accessibility testing
+npx pa11y             # CLI accessibility checker
+
+# In browser
+# Chrome DevTools → Lighthouse → Accessibility
+# Chrome DevTools → Elements → Accessibility tree
+
+# Screen reader testing
+# macOS: VoiceOver (Cmd + F5)
+# Windows: NVDA (free) or JAWS
+# Linux: Orca
+```
+
+## Quick Reference: ARIA Live Regions
+
+| Value | Behavior | Use For |
+|-------|----------|---------|
+| `aria-live="polite"` | Announced at next pause | Status updates, saved confirmations |
+| `aria-live="assertive"` | Announced immediately | Errors, time-sensitive alerts |
+| `role="status"` | Same as `polite` | Status messages |
+| `role="alert"` | Same as `assertive` | Error messages |
+
+## Common Anti-Patterns
+
+| Anti-Pattern | Problem | Fix |
+|---|---|---|
+| `div` as button | Not focusable, no keyboard support | Use `<button>` |
+| Missing `alt` text | Images invisible to screen readers | Add descriptive `alt` |
+| Color-only states | Invisible to color-blind users | Add icons, text, or patterns |
+| Autoplaying media | Disorienting, can't be stopped | Add controls, don't autoplay |
+| Custom dropdown with no ARIA | Unusable by keyboard/screen reader | Use native `<select>` or proper ARIA listbox |
+| Removing focus outlines | Users can't see where they are | Style outlines, don't remove them |
+| Empty links/buttons | "Link" announced with no description | Add text or `aria-label` |
+| `tabindex > 0` | Breaks natural tab order | Use `tabindex="0"` or `-1` only |
--- a/.claude/skills/shipping-and-launch/references/orchestration-patterns.md
+++ b/.claude/skills/shipping-and-launch/references/orchestration-patterns.md
@@ -0,0 +1,370 @@
+# Orchestration Patterns
+
+Reference catalog of agent orchestration patterns this repo endorses, plus anti-patterns to avoid. Read this before adding a new slash command that coordinates multiple personas, or before introducing a new persona that "wraps" existing ones.
+
+The governing rule: **the user (or a slash command) is the orchestrator. Personas do not invoke other personas.** Skills are mandatory hops inside a persona's workflow.
+
+---
+
+## Endorsed patterns
+
+### 1. Direct invocation (no orchestration)
+
+Single persona, single perspective, single artifact. The default and the cheapest option.
+
+```
+user → code-reviewer → report → user
+```
+
+**Use when:** the work is one perspective on one artifact and you can describe it in one sentence.
+
+**Examples:**
+- "Review this PR" → `code-reviewer`
+- "Find security issues in `auth.ts`" → `security-auditor`
+- "What tests are missing for the checkout flow?" → `test-engineer`
+
+**Cost:** one round trip. The baseline you should always compare orchestrated patterns against.
+
+---
+
+### 2. Single-persona slash command
+
+A slash command that wraps one persona with the project's skills. Saves the user from re-explaining the workflow every time.
+
+```
+/review → code-reviewer (with code-review-and-quality skill) → report
+```
+
+**Use when:** the same single-persona invocation happens repeatedly with the same setup.
+
+**Examples in this repo:** `/review`, `/test`, `/code-simplify`.
+
+**Cost:** same as direct invocation. The slash command is just a saved prompt.
+
+**Anti-signal:** if the slash command's body is mostly "decide which persona to call," delete it and let the user call the persona directly.
+
+---
+
+### 3. Parallel fan-out with merge
+
+Multiple personas operate on the same input concurrently, each producing an independent report. A merge step (in the main agent's context) synthesizes them into a single decision.
+
+```
+                    ┌─→ code-reviewer    ─┐
+/ship → fan out  ───┼─→ security-auditor ─┤→ merge → go/no-go + rollback
+                    └─→ test-engineer    ─┘
+```
+
+**Use when:**
+- The sub-tasks are genuinely independent (no shared mutable state, no ordering dependency)
+- Each sub-agent benefits from its own context window
+- The merge step is small enough to stay in the main context
+- Wall-clock latency matters
+
+**Examples in this repo:** `/ship`.
+
+**Cost:** N parallel sub-agent contexts + one merge turn. Higher than direct invocation, but faster wall-clock and produces better reports because each sub-agent stays focused on its single perspective.
+
+**Validation checklist before adopting this pattern:**
+- [ ] Can I run all sub-agents at the same time without ordering issues?
+- [ ] Does each persona produce a different *kind* of finding, not just the same finding from a different angle?
+- [ ] Will the merge step fit in the main agent's remaining context?
+- [ ] Is the user's wait time long enough that parallelism is actually noticeable?
+
+If any answer is "no," fall back to direct invocation or a single-persona command.
+
+---
+
+### 4. Sequential pipeline as user-driven slash commands
+
+The user runs slash commands in a defined order, carrying context (or commit history) between them. There is no orchestrator agent — the user IS the orchestrator.
+
+```
+user runs:  /spec  →  /plan  →  /build  →  /test  →  /review  →  /ship
+```
+
+**Use when:** the workflow has dependencies (each step needs the previous step's output) and human judgment between steps adds value.
+
+**Examples in this repo:** the entire DEFINE → PLAN → BUILD → VERIFY → REVIEW → SHIP lifecycle.
+
+**Cost:** one sub-agent context per step. Free for the orchestration layer because there is no orchestrator agent.
+
+**Why not automate it:** an LLM "lifecycle orchestrator" would (a) lose nuance between steps because it has to summarize for hand-off, (b) skip the human checkpoints that catch wrong-direction work early, and (c) double the token cost via paraphrasing turns.
+
+---
+
+### 5. Research isolation (context preservation)
+
+When a task requires reading large amounts of material that shouldn't pollute the main context, spawn a research sub-agent that returns only a digest.
+
+```
+main agent → research sub-agent (reads 50 files) → digest → main agent continues
+```
+
+**Use when:**
+- The main session needs to stay focused on a downstream task
+- The investigation result is much smaller than the input it consumes
+- The decision quality benefits from the main agent having room to think after
+
+**Examples:** "Find every call site of this deprecated API across the monorepo," "Summarize what these 30 ADRs say about caching."
+
+**Cost:** one isolated sub-agent context. Worth it any time the alternative is loading hundreds of files into the main context.
+
+**On Claude Code, use the built-in `Explore` subagent** rather than defining a custom research persona. `Explore` runs on Haiku, is denied write/edit tools, and is purpose-built for this pattern. Define a custom research subagent only when `Explore` doesn't fit (e.g. you need a domain-specific system prompt the model wouldn't infer).
+
+---
+
+## Claude Code compatibility
+
+This catalog is harness-agnostic, but most readers will run it on Claude Code. Here's how each pattern maps onto Claude Code's primitives — and where the platform enforces our rules for us.
+
+### Where personas live
+
+Plugin subagents go in `agents/` at the plugin root. This repo is a plugin (`.claude-plugin/plugin.json`), so `agents/code-reviewer.md`, `agents/security-auditor.md`, and `agents/test-engineer.md` are auto-discovered when the plugin is enabled. No path configuration needed.
+
+### Subagents vs. Agent Teams
+
+Claude Code has two parallelism primitives. Pattern 3 (parallel fan-out with merge) maps to **subagents**. If you need teammates that talk to each other, use **Agent Teams** instead.
+
+| | Subagents | Agent Teams |
+|--|-----------|-------------|
+| Coordination | Main agent fans out, sub-agents only report back | Teammates message each other, share a task list |
+| Context | Own context window per subagent | Own context window per teammate |
+| When to use | Independent tasks producing reports | Collaborative work needing discussion |
+| Status | Stable | Experimental — requires `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` |
+| Cost | Lower | Higher — each teammate is a separate Claude instance |
+
+**The personas in this repo work in both modes.** When spawned as subagents (e.g. by `/ship`), they report findings to the main session. When spawned as teammates (`Spawn a teammate using the security-auditor agent type…`), they can challenge each other's findings directly. The persona definition is the same; only the spawning context changes.
+
+One subtlety: the `skills` and `mcpServers` frontmatter fields in a persona are honored when it runs as a subagent but **ignored when it runs as a teammate** — teammates load skills and MCP servers from your project and user settings, the same as a regular session. If a persona depends on a specific skill or MCP server being loaded, configure it at the session level so it's available in both modes.
+
+### Platform-enforced rules
+
+Two rules in this catalog aren't just convention — Claude Code enforces them:
+
+- **"Subagents cannot spawn other subagents"** (verbatim from the docs). Anti-pattern B (persona-calls-persona) and Anti-pattern D (deep persona trees) cannot exist on Claude Code by construction.
+- **"No nested teams"** — teammates cannot spawn their own teams. Same anti-patterns blocked at the team level.
+
+This means you can adopt the patterns in this catalog without worrying about contributors accidentally building the anti-patterns. They'll just fail to load.
+
+### Built-in subagents to know about
+
+Before defining a custom subagent, check whether one of these covers the role:
+
+| Built-in | Purpose |
+|----------|---------|
+| `Explore` | Read-only codebase search and analysis. Use this for Pattern 5 (research isolation). |
+| `Plan` | Read-only research during plan mode. |
+| `general-purpose` | Multi-step tasks needing both exploration and modification. |
+
+Don't redefine these. Layer your specialist personas (code-reviewer, security-auditor, test-engineer) on top of them.
+
+### Frontmatter restrictions for plugin agents
+
+Plugin subagents do **not** support the `hooks`, `mcpServers`, or `permissionMode` frontmatter fields — these are silently ignored. If a future persona needs any of those, the user must copy the file into `.claude/agents/` or `~/.claude/agents/` instead.
+
+The fields that DO work in plugin agents are: `name`, `description`, `tools`, `disallowedTools`, `model`, `maxTurns`, `skills`, `memory`, `background`, `effort`, `isolation`, `color`, `initialPrompt`. Use `model` per-persona if you want to optimize cost (e.g. Haiku for `test-engineer` coverage scans, Sonnet for `code-reviewer`, Opus for `security-auditor`).
+
+### Spawning multiple subagents in parallel
+
+In Claude Code, parallel fan-out (Pattern 3) requires issuing **multiple Agent tool calls in a single assistant turn**. Sequential turns serialize execution. `/ship` calls this out explicitly. Any new orchestrator command should do the same.
+
+---
+
+## Worked example: Agent Teams for competing-hypothesis debugging
+
+This example shows when to reach for **Agent Teams** instead of `/ship`'s subagent fan-out. The two patterns look similar from a distance — both spawn the same three personas — but the value comes from a different place.
+
+### The scenario
+
+> *Checkout occasionally hangs for ~30 seconds before completing. It happens roughly once every 50 sessions. No errors in logs. Started after last week's release.*
+
+Plausible root causes (mutually exclusive, all fit the symptoms):
+
+1. A race condition in the new payment-confirmation flow
+2. An auth check that occasionally falls through to a slow synchronous network call
+3. A missing index on a query that scales with cart size
+4. A flaky third-party API where the SDK retries silently before timing out
+
+A single agent will pick the first plausible theory and stop investigating. A `/ship`-style subagent fan-out would have each persona report independently — but their reports never meet, so nothing rules out the wrong theories.
+
+This is exactly the case the Agent Teams docs describe: *"With multiple independent investigators actively trying to disprove each other, the theory that survives is much more likely to be the actual root cause."*
+
+### Why this is *not* a `/ship` job
+
+| | `/ship` (subagents) | Agent Teams |
+|--|--------------------|-------------|
+| Sub-agents see | The same diff, different lenses | A shared task list, each other's messages |
+| Output | Three independent reports → one merge | Adversarial debate → consensus root cause |
+| Right when | You want a verdict on a known artifact | You want to *find* the artifact among hypotheses |
+
+`/ship` is a verdict; Agent Teams is an investigation.
+
+### Setup (one-time, per-environment)
+
+Agent Teams is experimental. In `~/.claude/settings.json`:
+
+```json
+{
+  "env": {
+    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
+  }
+}
+```
+
+Requires Claude Code v2.1.32 or later. The personas in this repo are picked up automatically — no team-config files to author by hand.
+
+### The trigger prompt
+
+Type into the lead session, in natural language:
+
+```
+Users report checkout hangs for ~30 seconds intermittently after last
+week's release. No errors in logs.
+
+Create an agent team to debug this with competing hypotheses. Spawn
+three teammates using the existing agent types:
+
+  - code-reviewer  — investigate race conditions and blocking calls
+                     in the checkout code path
+  - security-auditor — investigate auth checks, session handling,
+                       and any synchronous network calls added recently
+  - test-engineer  — propose tests that would distinguish between the
+                     hypotheses and check coverage gaps in checkout
+
+Have them message each other directly to challenge each other's
+theories. Update findings as consensus emerges. Only converge when
+two teammates agree they can disprove the others'.
+```
+
+The lead spawns three teammates referencing the existing persona names. The persona body is **appended** to each teammate's system prompt as additional instructions (on top of the team-coordination instructions the lead installs); the trigger prompt above becomes their task.
+
+### What happens
+
+1. Each teammate runs in its own context window, exploring the codebase from its own lens.
+2. Teammates use `message` to send findings to each other directly. The lead doesn't have to relay.
+3. The shared task list shows who's investigating what — visible at any time with `Ctrl+T` (in-process mode) or in a tmux pane (split mode).
+4. When `code-reviewer` finds a `Promise.all` that should be sequential, it messages `security-auditor` to confirm the auth call isn't part of the race. `security-auditor` checks and replies — either confirming the race is the real issue or producing counter-evidence.
+5. `test-engineer` proposes a focused integration test for whichever theory is winning, which the team uses to verify before declaring consensus.
+6. The lead synthesizes the converged finding and presents it to you.
+
+You can interrupt at any teammate by cycling with `Shift+Down` and typing — useful for redirecting an investigator who's gone down a wrong path.
+
+### When to clean up
+
+When the investigation lands on a root cause, tell the lead:
+
+```
+Clean up the team
+```
+
+Always cleanup through the lead, not a teammate (per the docs: teammates lack full team context for cleanup).
+
+### Cost expectation
+
+Three Sonnet teammates running for ~10–15 minutes of investigation costs noticeably more than the same three personas spawned as subagents by `/ship`. The justification is *quality of conclusion* — for production debugging where the wrong fix is expensive, the extra tokens are a bargain. For a routine PR review, stick with `/ship`.
+
+### Anti-pattern in this scenario
+
+Do **not** rebuild this as a `/debug` slash command that fans out subagents. Subagents can't message each other — you'd lose the adversarial debate that makes the pattern work. If a workflow keeps coming up, document the trigger prompt above as a snippet rather than wrapping it in a slash command that misuses subagents.
+
+### When *not* to use Agent Teams
+
+- Production-bound verdict on a known diff → use `/ship` (subagents).
+- One specialist perspective on one artifact → direct persona invocation.
+- Sequential lifecycle (spec → plan → build) → user-driven slash commands (Pattern 4).
+- Read-heavy research with a small digest → built-in `Explore` subagent.
+
+Reach for Agent Teams only when teammates **need** to challenge each other to produce the right answer.
+
+---
+
+## Anti-patterns
+
+### A. Router persona ("meta-orchestrator")
+
+A persona whose job is to decide which other persona to call.
+
+```
+/work → router-persona → "this needs a review" → code-reviewer → router (paraphrases) → user
+```
+
+**Why it fails:**
+- Pure routing layer with no domain value
+- Adds two paraphrasing hops → information loss + roughly 2× token cost
+- The user already knew they wanted a review; they could have called `/review` directly
+- Replicates the work that slash commands and intent mapping in `AGENTS.md` already do
+
+**What to do instead:** add or refine slash commands. Document intent → command mapping in `AGENTS.md`.
+
+---
+
+### B. Persona that calls another persona
+
+A `code-reviewer` that internally invokes `security-auditor` when it sees auth code.
+
+**Why it fails:**
+- Personas were designed to produce a single perspective; chaining them defeats that
+- The summary the calling persona passes loses context the called persona needs
+- Failure modes multiply (which persona's output format wins? whose rules apply?)
+- Hides cost from the user
+
+**What to do instead:** have the calling persona *recommend* a follow-up audit in its report. The user or a slash command runs the second pass.
+
+---
+
+### C. Sequential orchestrator that paraphrases
+
+An agent that calls `/spec`, then `/plan`, then `/build`, etc. on the user's behalf.
+
+**Why it fails:**
+- Loses the human checkpoints that catch wrong-direction work
+- Each hand-off summarizes context — accumulated drift over a long pipeline
+- Doubles token cost: orchestrator turn + sub-agent turn for every step
+- Removes user agency at exactly the points where judgment matters most
+
+**What to do instead:** keep the user as the orchestrator. Document the recommended sequence in `README.md` and let users invoke it.
+
+---
+
+### D. Deep persona trees
+
+`/ship` calls a `pre-ship-coordinator` that calls a `quality-coordinator` that calls `code-reviewer`.
+
+**Why it fails:**
+- Each layer adds latency and tokens with no decision value
+- Debugging becomes a multi-level investigation
+- The leaf personas lose context to multiple summarization steps
+
+**What to do instead:** keep the orchestration depth at most 1 (slash command → personas). The merge happens in the main agent.
+
+---
+
+## Decision flow
+
+When considering a new orchestrated workflow, walk this flow:
+
+```
+Is the work one perspective on one artifact?
+├── Yes → Direct invocation. Stop.
+└── No  → Will the same composition repeat?
+         ├── No  → Direct invocation, ad hoc. Stop.
+         └── Yes → Are sub-tasks independent?
+                  ├── No  → Sequential slash commands run by user (Pattern 4).
+                  └── Yes → Parallel fan-out with merge (Pattern 3).
+                           Validate against the checklist above.
+                           If any check fails → fall back to single-persona command (Pattern 2).
+```
+
+---
+
+## When to add a new pattern to this catalog
+
+Add a new entry only after:
+
+1. You've used the pattern at least twice in real work
+2. You can name a concrete artifact in this repo that demonstrates it
+3. You can explain why an existing pattern wouldn't have worked
+4. You can describe its anti-pattern shadow (what people will mistakenly build instead)
+
+Premature catalog entries become aspirational documentation that no one follows.
--- a/.claude/skills/shipping-and-launch/references/performance-checklist.md
+++ b/.claude/skills/shipping-and-launch/references/performance-checklist.md
@@ -0,0 +1,153 @@
+# Performance Checklist
+
+Quick reference checklist for web application performance. Use alongside the `performance-optimization` skill.
+
+## Table of Contents
+
+- [Core Web Vitals Targets](#core-web-vitals-targets)
+- [TTFB Diagnosis](#ttfb-diagnosis)
+- [Frontend Checklist](#frontend-checklist)
+- [Backend Checklist](#backend-checklist)
+- [Measurement Commands](#measurement-commands)
+- [Common Anti-Patterns](#common-anti-patterns)
+
+## Core Web Vitals Targets
+
+| Metric | Good | Needs Work | Poor |
+|--------|------|------------|------|
+| LCP (Largest Contentful Paint) | ≤ 2.5s | ≤ 4.0s | > 4.0s |
+| INP (Interaction to Next Paint) | ≤ 200ms | ≤ 500ms | > 500ms |
+| CLS (Cumulative Layout Shift) | ≤ 0.1 | ≤ 0.25 | > 0.25 |
+
+## TTFB Diagnosis
+
+When TTFB is slow (> 800ms), check each component in DevTools Network waterfall:
+
+- [ ] **DNS resolution** slow → add `<link rel="dns-prefetch">` or `<link rel="preconnect">` for known origins
+- [ ] **TCP/TLS handshake** slow → enable HTTP/2, consider edge deployment, verify keep-alive
+- [ ] **Server processing** slow → profile backend, check slow queries, add caching
+
+## Frontend Checklist
+
+### Images
+- [ ] Images use modern formats (WebP, AVIF)
+- [ ] Images are responsively sized (`srcset` and `sizes`)
+- [ ] Images and `<source>` elements have explicit `width` and `height` (prevents CLS in art direction)
+- [ ] Below-the-fold images use `loading="lazy"` and `decoding="async"`
+- [ ] Hero/LCP images use `fetchpriority="high"` and no lazy loading
+
+### JavaScript
+- [ ] Bundle size under 200KB gzipped (initial load)
+- [ ] Code splitting with dynamic `import()` for routes and heavy features
+- [ ] Tree shaking enabled (verify dependency ships ESM and marks `sideEffects: false`)
+- [ ] No blocking JavaScript in `<head>` (use `defer` or `async`)
+- [ ] Heavy computation offloaded to Web Workers (if applicable)
+- [ ] `React.memo()` on expensive components that re-render with same props
+- [ ] `useMemo()` / `useCallback()` only where profiling shows benefit
+- [ ] Long tasks (> 50ms) broken up to keep the main thread available — main lever for INP
+- [ ] `yieldToMain` pattern used inside long-running loops so input events can run between chunks
+- [ ] Modern scheduling APIs used where available: `scheduler.yield()` (preferred), `scheduler.postTask()` with priorities, `isInputPending()` to yield only when needed
+- [ ] `requestIdleCallback` for deferrable, non-urgent work (analytics flush, prefetch, warmup)
+- [ ] Non-critical work deferred out of event handlers (e.g. analytics, logging) so the response to the interaction is not delayed
+- [ ] Third-party scripts loaded with `async` / `defer`, audited for size, and fronted by a facade when heavy (chat widgets, embeds)
+
+### CSS
+- [ ] Critical CSS inlined or preloaded
+- [ ] No render-blocking CSS for non-critical styles
+- [ ] No CSS-in-JS runtime cost in production (use extraction)
+
+### Fonts
+- [ ] Limited to 2–3 font families, 2–3 weights each (every additional weight is another request)
+- [ ] WOFF2 format only (smallest, universal support — skip WOFF/TTF/EOT)
+- [ ] Self-hosted when possible (third-party font CDNs add DNS + TCP + TLS round-trips)
+- [ ] LCP-critical fonts preloaded: `<link rel="preload" as="font" type="font/woff2" crossorigin>`
+- [ ] `font-display: swap` (or `optional` for non-critical) to avoid FOIT blocking render
+- [ ] Subsetted via `unicode-range` to ship only the glyphs each page needs
+- [ ] Variable fonts considered when multiple weights/styles are required (one file replaces many)
+- [ ] Fallback font metrics adjusted with `size-adjust`, `ascent-override`, `descent-override` to reduce CLS on font swap
+- [ ] System font stack considered before any custom font
+
+### Network
+- [ ] Static assets cached with long `max-age` + content hashing
+- [ ] API responses cached where appropriate (`Cache-Control`)
+- [ ] HTTP/2 or HTTP/3 enabled
+- [ ] Resources preconnected (`<link rel="preconnect">`) for known origins
+- [ ] `fetchpriority` used on critical non-image resources (e.g., key `<link rel="preload">`, above-the-fold `<script>`) — not only on `<img>`
+- [ ] No unnecessary redirects
+
+### Rendering
+- [ ] No layout thrashing (forced synchronous layouts)
+- [ ] Animations use `transform` and `opacity` (GPU-accelerated)
+- [ ] Long lists use virtualization (e.g., `react-window`)
+- [ ] No unnecessary full-page re-renders
+- [ ] Off-screen sections use `content-visibility: auto` with `contain-intrinsic-size` to skip layout/paint of non-visible areas
+- [ ] No `unload` event handlers and no `Cache-Control: no-store` on HTML responses — preserves back/forward cache (bfcache) eligibility
+
+## Backend Checklist
+
+### Database
+- [ ] No N+1 query patterns (use eager loading / joins)
+- [ ] Queries have appropriate indexes
+- [ ] List endpoints paginated (never `SELECT * FROM table`)
+- [ ] Connection pooling configured
+- [ ] Slow query logging enabled
+
+### API
+- [ ] Response times < 200ms (p95)
+- [ ] No synchronous heavy computation in request handlers
+- [ ] Bulk operations instead of loops of individual calls
+- [ ] Response compression (gzip/brotli)
+- [ ] Appropriate caching (in-memory, Redis, CDN)
+
+### Infrastructure
+- [ ] CDN for static assets
+- [ ] Server located close to users (or edge deployment)
+- [ ] Horizontal scaling configured (if needed)
+- [ ] Health check endpoint for load balancer
+
+## Measurement Commands
+
+### INP field data and DevTools workflow
+
+1. **Field data first** — check [CrUX Vis](https://developer.chrome.com/docs/crux/vis) or your RUM tool for real-user INP before optimising
+2. **Identify slow interactions** — open DevTools → Performance panel → record while interacting; look for long tasks triggered by clicks/keystrokes
+3. **Test on mid-range Android** — INP issues often only surface on slower hardware; use a real device or DevTools CPU throttling (4×–6× slowdown)
+
+```bash
+# Lighthouse CLI
+npx lighthouse https://localhost:3000 --output json --output-path ./report.json
+
+# Bundle analysis
+npx webpack-bundle-analyzer stats.json
+# or for Vite:
+npx vite-bundle-visualizer
+
+# Check bundle size
+npx bundlesize
+
+# Web Vitals in code
+import { onLCP, onINP, onCLS } from 'web-vitals';
+onLCP(console.log);
+onINP(console.log);
+onCLS(console.log);
+
+# INP with interaction-level detail (attribution build)
+import { onINP } from 'web-vitals/attribution';
+onINP(({ value, attribution }) => {
+  const { interactionTarget, inputDelay, processingDuration, presentationDelay } = attribution;
+  console.log({ value, interactionTarget, inputDelay, processingDuration, presentationDelay });
+});
+```
+
+## Common Anti-Patterns
+
+| Anti-Pattern | Impact | Fix |
+|---|---|---|
+| N+1 queries | Linear DB load growth | Use joins, includes, or batch loading |
+| Unbounded queries | Memory exhaustion, timeouts | Always paginate, add LIMIT |
+| Missing indexes | Slow reads as data grows | Add indexes for filtered/sorted columns |
+| Layout thrashing | Jank, dropped frames | Batch DOM reads, then batch writes |
+| Unoptimized images | Slow LCP, wasted bandwidth | Use WebP, responsive sizes, lazy load |
+| Large bundles | Slow Time to Interactive | Code split, tree shake, audit deps |
+| Blocking main thread | Poor INP, unresponsive UI | Chunk long tasks with `scheduler.yield()` / `yieldToMain`, offload to Web Workers |
+| Memory leaks | Growing memory, eventual crash | Clean up listeners, intervals, refs |
--- a/.claude/skills/shipping-and-launch/references/security-checklist.md
+++ b/.claude/skills/shipping-and-launch/references/security-checklist.md
@@ -0,0 +1,134 @@
+# Security Checklist
+
+Quick reference for web application security. Use alongside the `security-and-hardening` skill.
+
+## Table of Contents
+
+- [Pre-Commit Checks](#pre-commit-checks)
+- [Authentication](#authentication)
+- [Authorization](#authorization)
+- [Input Validation](#input-validation)
+- [Security Headers](#security-headers)
+- [CORS Configuration](#cors-configuration)
+- [Data Protection](#data-protection)
+- [Dependency Security](#dependency-security)
+- [Error Handling](#error-handling)
+- [OWASP Top 10 Quick Reference](#owasp-top-10-quick-reference)
+
+## Pre-Commit Checks
+
+- [ ] No secrets in code (`git diff --cached | grep -i "password\|secret\|api_key\|token"`)
+- [ ] `.gitignore` covers: `.env`, `.env.local`, `*.pem`, `*.key`
+- [ ] `.env.example` uses placeholder values (not real secrets)
+
+## Authentication
+
+- [ ] Passwords hashed with bcrypt (≥12 rounds), scrypt, or argon2
+- [ ] Session cookies: `httpOnly`, `secure`, `sameSite: 'lax'`
+- [ ] Session expiration configured (reasonable max-age)
+- [ ] Rate limiting on login endpoint (≤10 attempts per 15 minutes)
+- [ ] Password reset tokens: time-limited (≤1 hour), single-use
+- [ ] Account lockout after repeated failures (optional, with notification)
+- [ ] MFA supported for sensitive operations (optional but recommended)
+
+## Authorization
+
+- [ ] Every protected endpoint checks authentication
+- [ ] Every resource access checks ownership/role (prevents IDOR)
+- [ ] Admin endpoints require admin role verification
+- [ ] API keys scoped to minimum necessary permissions
+- [ ] JWT tokens validated (signature, expiration, issuer)
+
+## Input Validation
+
+- [ ] All user input validated at system boundaries (API routes, form handlers)
+- [ ] Validation uses allowlists (not denylists)
+- [ ] String lengths constrained (min/max)
+- [ ] Numeric ranges validated
+- [ ] Email, URL, and date formats validated with proper libraries
+- [ ] File uploads: type restricted, size limited, content verified
+- [ ] SQL queries parameterized (no string concatenation)
+- [ ] HTML output encoded (use framework auto-escaping)
+- [ ] URLs validated before redirect (prevent open redirect)
+
+## Security Headers
+
+```
+Content-Security-Policy: default-src 'self'; script-src 'self'
+Strict-Transport-Security: max-age=31536000; includeSubDomains
+X-Content-Type-Options: nosniff
+X-Frame-Options: DENY
+X-XSS-Protection: 0  (disabled, rely on CSP)
+Referrer-Policy: strict-origin-when-cross-origin
+Permissions-Policy: camera=(), microphone=(), geolocation=()
+```
+
+## CORS Configuration
+
+```typescript
+// Restrictive (recommended)
+cors({
+  origin: ['https://yourdomain.com', 'https://app.yourdomain.com'],
+  credentials: true,
+  methods: ['GET', 'POST', 'PUT', 'PATCH', 'DELETE'],
+  allowedHeaders: ['Content-Type', 'Authorization'],
+})
+
+// NEVER use in production:
+cors({ origin: '*' })  // Allows any origin
+```
+
+## Data Protection
+
+- [ ] Sensitive fields excluded from API responses (`passwordHash`, `resetToken`, etc.)
+- [ ] Sensitive data not logged (passwords, tokens, full CC numbers)
+- [ ] PII encrypted at rest (if required by regulation)
+- [ ] HTTPS for all external communication
+- [ ] Database backups encrypted
+
+## Dependency Security
+
+```bash
+# Audit dependencies
+npm audit
+
+# Fix automatically where possible
+npm audit fix
+
+# Check for critical vulnerabilities
+npm audit --audit-level=critical
+
+# Keep dependencies updated
+npx npm-check-updates
+```
+
+## Error Handling
+
+```typescript
+// Production: generic error, no internals
+res.status(500).json({
+  error: { code: 'INTERNAL_ERROR', message: 'Something went wrong' }
+});
+
+// NEVER in production:
+res.status(500).json({
+  error: err.message,
+  stack: err.stack,         // Exposes internals
+  query: err.sql,           // Exposes database details
+});
+```
+
+## OWASP Top 10 Quick Reference
+
+| # | Vulnerability | Prevention |
+|---|---|---|
+| 1 | Broken Access Control | Auth checks on every endpoint, ownership verification |
+| 2 | Cryptographic Failures | HTTPS, strong hashing, no secrets in code |
+| 3 | Injection | Parameterized queries, input validation |
+| 4 | Insecure Design | Threat modeling, spec-driven development |
+| 5 | Security Misconfiguration | Security headers, minimal permissions, audit deps |
+| 6 | Vulnerable Components | `npm audit`, keep deps updated, minimal deps |
+| 7 | Auth Failures | Strong passwords, rate limiting, session management |
+| 8 | Data Integrity Failures | Verify updates/dependencies, signed artifacts |
+| 9 | Logging Failures | Log security events, don't log secrets |
+| 10 | SSRF | Validate/allowlist URLs, restrict outbound requests |
--- a/.claude/skills/shipping-and-launch/references/testing-patterns.md
+++ b/.claude/skills/shipping-and-launch/references/testing-patterns.md
@@ -0,0 +1,236 @@
+# Testing Patterns Reference
+
+Quick reference for common testing patterns across the stack. Use alongside the `test-driven-development` skill.
+
+## Table of Contents
+
+- [Test Structure (Arrange-Act-Assert)](#test-structure-arrange-act-assert)
+- [Test Naming Conventions](#test-naming-conventions)
+- [Common Assertions](#common-assertions)
+- [Mocking Patterns](#mocking-patterns)
+- [React/Component Testing](#reactcomponent-testing)
+- [API / Integration Testing](#api--integration-testing)
+- [E2E Testing (Playwright)](#e2e-testing-playwright)
+- [Test Anti-Patterns](#test-anti-patterns)
+
+## Test Structure (Arrange-Act-Assert)
+
+```typescript
+it('describes expected behavior', () => {
+  // Arrange: Set up test data and preconditions
+  const input = { title: 'Test Task', priority: 'high' };
+
+  // Act: Perform the action being tested
+  const result = createTask(input);
+
+  // Assert: Verify the outcome
+  expect(result.title).toBe('Test Task');
+  expect(result.priority).toBe('high');
+  expect(result.status).toBe('pending');
+});
+```
+
+## Test Naming Conventions
+
+```typescript
+// Pattern: [unit] [expected behavior] [condition]
+describe('TaskService.createTask', () => {
+  it('creates a task with default pending status', () => {});
+  it('throws ValidationError when title is empty', () => {});
+  it('trims whitespace from title', () => {});
+  it('generates a unique ID for each task', () => {});
+});
+```
+
+## Common Assertions
+
+```typescript
+// Equality
+expect(result).toBe(expected);           // Strict equality (===)
+expect(result).toEqual(expected);        // Deep equality (objects/arrays)
+expect(result).toStrictEqual(expected);  // Deep equality + type matching
+
+// Truthiness
+expect(result).toBeTruthy();
+expect(result).toBeFalsy();
+expect(result).toBeNull();
+expect(result).toBeDefined();
+expect(result).toBeUndefined();
+
+// Numbers
+expect(result).toBeGreaterThan(5);
+expect(result).toBeLessThanOrEqual(10);
+expect(result).toBeCloseTo(0.3, 5);      // Floating point
+
+// Strings
+expect(result).toMatch(/pattern/);
+expect(result).toContain('substring');
+
+// Arrays / Objects
+expect(array).toContain(item);
+expect(array).toHaveLength(3);
+expect(object).toHaveProperty('key', 'value');
+
+// Errors
+expect(() => fn()).toThrow();
+expect(() => fn()).toThrow(ValidationError);
+expect(() => fn()).toThrow('specific message');
+
+// Async
+await expect(asyncFn()).resolves.toBe(value);
+await expect(asyncFn()).rejects.toThrow(Error);
+```
+
+## Mocking Patterns
+
+### Mock Functions
+
+```typescript
+const mockFn = jest.fn();
+mockFn.mockReturnValue(42);
+mockFn.mockResolvedValue({ data: 'test' });
+mockFn.mockImplementation((x) => x * 2);
+
+expect(mockFn).toHaveBeenCalled();
+expect(mockFn).toHaveBeenCalledWith('arg1', 'arg2');
+expect(mockFn).toHaveBeenCalledTimes(3);
+```
+
+### Mock Modules
+
+```typescript
+// Mock an entire module
+jest.mock('./database', () => ({
+  query: jest.fn().mockResolvedValue([{ id: 1, title: 'Test' }]),
+}));
+
+// Mock specific exports
+jest.mock('./utils', () => ({
+  ...jest.requireActual('./utils'),
+  generateId: jest.fn().mockReturnValue('test-id'),
+}));
+```
+
+### Mock at Boundaries Only
+
+```
+Mock these:                    Don't mock these:
+├── Database calls             ├── Internal utility functions
+├── HTTP requests              ├── Business logic
+├── File system operations     ├── Data transformations
+├── External API calls         ├── Validation functions
+└── Time/Date (when needed)    └── Pure functions
+```
+
+## React/Component Testing
+
+```tsx
+import { render, screen, fireEvent, waitFor } from '@testing-library/react';
+
+describe('TaskForm', () => {
+  it('submits the form with entered data', async () => {
+    const onSubmit = jest.fn();
+    render(<TaskForm onSubmit={onSubmit} />);
+
+    // Find elements by accessible role/label (not test IDs)
+    await screen.findByRole('textbox', { name: /title/i });
+    fireEvent.change(screen.getByRole('textbox', { name: /title/i }), {
+      target: { value: 'New Task' },
+    });
+    fireEvent.click(screen.getByRole('button', { name: /create/i }));
+
+    await waitFor(() => {
+      expect(onSubmit).toHaveBeenCalledWith({ title: 'New Task' });
+    });
+  });
+
+  it('shows validation error for empty title', async () => {
+    render(<TaskForm onSubmit={jest.fn()} />);
+
+    fireEvent.click(screen.getByRole('button', { name: /create/i }));
+
+    expect(await screen.findByText(/title is required/i)).toBeInTheDocument();
+  });
+});
+```
+
+## API / Integration Testing
+
+```typescript
+import request from 'supertest';
+import { app } from '../src/app';
+
+describe('POST /api/tasks', () => {
+  it('creates a task and returns 201', async () => {
+    const response = await request(app)
+      .post('/api/tasks')
+      .send({ title: 'Test Task' })
+      .set('Authorization', `Bearer ${testToken}`)
+      .expect(201);
+
+    expect(response.body).toMatchObject({
+      id: expect.any(String),
+      title: 'Test Task',
+      status: 'pending',
+    });
+  });
+
+  it('returns 422 for invalid input', async () => {
+    const response = await request(app)
+      .post('/api/tasks')
+      .send({ title: '' })
+      .set('Authorization', `Bearer ${testToken}`)
+      .expect(422);
+
+    expect(response.body.error.code).toBe('VALIDATION_ERROR');
+  });
+
+  it('returns 401 without authentication', async () => {
+    await request(app)
+      .post('/api/tasks')
+      .send({ title: 'Test' })
+      .expect(401);
+  });
+});
+```
+
+## E2E Testing (Playwright)
+
+```typescript
+import { test, expect } from '@playwright/test';
+
+test('user can create and complete a task', async ({ page }) => {
+  // Navigate and authenticate
+  await page.goto('/');
+  await page.fill('[name="email"]', 'test@example.com');
+  await page.fill('[name="password"]', 'testpass123');
+  await page.click('button:has-text("Log in")');
+
+  // Create a task
+  await page.click('button:has-text("New Task")');
+  await page.fill('[name="title"]', 'Buy groceries');
+  await page.click('button:has-text("Create")');
+
+  // Verify task appears
+  await expect(page.locator('text=Buy groceries')).toBeVisible();
+
+  // Complete the task
+  await page.click('[aria-label="Complete Buy groceries"]');
+  await expect(page.locator('text=Buy groceries')).toHaveCSS(
+    'text-decoration-line', 'line-through'
+  );
+});
+```
+
+## Test Anti-Patterns
+
+| Anti-Pattern | Problem | Better Approach |
+|---|---|---|
+| Testing implementation details | Breaks on refactor | Test inputs/outputs |
+| Snapshot everything | No one reviews snapshot diffs | Assert specific values |
+| Shared mutable state | Tests pollute each other | Setup/teardown per test |
+| Testing third-party code | Wastes time, not your bug | Mock the boundary |
+| Skipping tests to pass CI | Hides real bugs | Fix or delete the test |
+| Using `test.skip` permanently | Dead code | Remove or fix it |
+| Overly broad assertions | Doesn't catch regressions | Be specific |
+| No async error handling | Swallowed errors, false passes | Always `await` async tests |