Initial commit
This commit is contained in:
243
.claude/skills/doubt-driven-development/SKILL.md
Normal file
243
.claude/skills/doubt-driven-development/SKILL.md
Normal file
@@ -0,0 +1,243 @@
|
||||
---
|
||||
name: doubt-driven-development
|
||||
description: Subjects every non-trivial decision to a fresh-context adversarial review before it stands. Use when correctness matters more than speed, when working in unfamiliar code, when stakes are high (production, security-sensitive logic, irreversible operations), or any time a confident output would be cheaper to verify now than to debug later.
|
||||
---
|
||||
|
||||
# Doubt-Driven Development
|
||||
|
||||
## Overview
|
||||
|
||||
A confident answer is not a correct one. Long sessions accumulate context that quietly turns assumptions into "facts" without anyone noticing. Doubt-driven development is the discipline of materializing a fresh-context reviewer — biased to **disprove**, not approve — before any non-trivial output stands.
|
||||
|
||||
This is not `/review`. `/review` is a verdict on a finished artifact. This is an in-flight posture: non-trivial decisions get cross-examined while course-correction is still cheap.
|
||||
|
||||
## When to Use
|
||||
|
||||
A decision is **non-trivial** when at least one of these is true:
|
||||
|
||||
- It introduces or modifies branching logic
|
||||
- It crosses a module or service boundary
|
||||
- It asserts a property the type system or compiler cannot verify (thread safety, idempotence, ordering, invariants)
|
||||
- Its correctness depends on context the future reader cannot see
|
||||
- Its blast radius is irreversible (production deploy, data migration, public API change)
|
||||
|
||||
Apply the skill when:
|
||||
|
||||
- About to make an architectural decision under uncertainty
|
||||
- About to commit non-trivial code
|
||||
- About to claim a non-obvious fact ("this is safe", "this scales", "this matches the spec")
|
||||
- Working in code you don't fully understand
|
||||
|
||||
**When NOT to use:**
|
||||
|
||||
- Mechanical operations (renaming, formatting, file moves)
|
||||
- Following a clear, unambiguous user instruction
|
||||
- Reading or summarizing existing code
|
||||
- One-line changes with obvious correctness
|
||||
- Pure tooling operations (running tests, listing files)
|
||||
- The user has explicitly asked for speed over verification
|
||||
|
||||
If you doubt every keystroke, you ship nothing. The skill applies only to non-trivial decisions as defined above.
|
||||
|
||||
## Loading Constraints
|
||||
|
||||
This skill is designed for the **main-session orchestrator**, where Step 3 (DOUBT, detailed below) can spawn a fresh-context reviewer.
|
||||
|
||||
- **Do NOT add this skill to a persona's `skills:` frontmatter.** A persona that follows Step 3 would spawn another persona — the orchestration anti-pattern explicitly forbidden by `references/orchestration-patterns.md` ("personas do not invoke other personas").
|
||||
- **If you find yourself applying this skill from inside a subagent context** (where Claude Code prevents nested subagent spawn): the preferred path is to surface to the user that doubt-driven cannot run nested and let the main session handle it. As a last resort only, a degraded self-questioning fallback exists — rewrite ARTIFACT + CONTRACT as a fresh self-prompt with a hard mental separator from your prior reasoning, and walk Steps 1–5. This is **not fresh-context review** (you carry your own context with you), so flag the result as degraded and prefer escalation whenever the user is reachable.
|
||||
|
||||
## The Process
|
||||
|
||||
Copy this checklist when applying the skill:
|
||||
|
||||
```
|
||||
Doubt cycle:
|
||||
- [ ] Step 1: CLAIM — wrote the claim + why-it-matters
|
||||
- [ ] Step 2: EXTRACT — isolated artifact + contract, stripped reasoning
|
||||
- [ ] Step 3: DOUBT — invoked fresh-context reviewer with adversarial prompt
|
||||
- [ ] Step 4: RECONCILE — classified every finding against the artifact text
|
||||
- [ ] Step 5: STOP — met stop condition (trivial findings, 3 cycles, or user override)
|
||||
```
|
||||
|
||||
### Step 1: CLAIM — Surface what stands
|
||||
|
||||
Name the decision in two or three lines:
|
||||
|
||||
```
|
||||
CLAIM: "The new caching layer is thread-safe under the
|
||||
read-heavy workload described in the spec."
|
||||
WHY THIS MATTERS: a race here corrupts user data and is
|
||||
hard to detect in QA.
|
||||
```
|
||||
|
||||
If you can't write the claim that compactly, you have a vibe, not a decision. Surface it before scrutinizing it.
|
||||
|
||||
### Step 2: EXTRACT — Smallest reviewable unit
|
||||
|
||||
A fresh-context reviewer needs the **artifact** and the **contract**, not the journey.
|
||||
|
||||
- Code: the diff or the function — not the whole file
|
||||
- Decision: the proposal in 3–5 sentences plus the constraints it has to satisfy
|
||||
- Assertion: the claim plus the evidence that supposedly supports it (kept distinct from the Step 1 CLAIM block, which is the orchestrator's hypothesis under scrutiny)
|
||||
|
||||
Strip your reasoning. If you hand over conclusions, you'll get back validation of your conclusions. The unit must be small enough that a reviewer can hold it in mind in one read — if it's a 500-line PR, decompose first.
|
||||
|
||||
### Step 3: DOUBT — Invoke the fresh-context reviewer
|
||||
|
||||
The reviewer's prompt **must be adversarial**. Framing decides the answer.
|
||||
|
||||
```
|
||||
Adversarial review. Find what is wrong with this artifact.
|
||||
Assume the author is overconfident. Look for:
|
||||
- Unstated assumptions
|
||||
- Edge cases not handled
|
||||
- Hidden coupling or shared state
|
||||
- Ways the contract could be violated
|
||||
- Existing conventions this might break
|
||||
- Failure modes under unexpected input
|
||||
|
||||
Do NOT validate. Do NOT summarize. Find issues, or state
|
||||
explicitly that you cannot find any after thorough examination.
|
||||
|
||||
ARTIFACT: <paste artifact>
|
||||
CONTRACT: <paste contract>
|
||||
```
|
||||
|
||||
**Pass ARTIFACT + CONTRACT only. Do NOT pass the CLAIM.** Handing the reviewer your conclusion biases it toward agreement. The reviewer must independently determine whether the artifact satisfies the contract.
|
||||
|
||||
In Claude Code, the role-based reviewers in `agents/` start with isolated context by design and are usable here — see `agents/` for the roster and per-domain match.
|
||||
|
||||
**The adversarial prompt above takes precedence over the persona's default response shape.** Personas like `code-reviewer` are written to produce balanced verdicts with both strengths and weaknesses; doubt-driven needs issues-only output. Paste the adversarial prompt verbatim into the invocation so it overrides the persona's default. If a persona's response shape can't be overridden cleanly, fall back to a generic subagent with the adversarial prompt.
|
||||
|
||||
#### Cross-model escalation
|
||||
|
||||
A single-model reviewer shares blind spots with the original author — a colder, different-architecture model catches them. Doubt-driven is already opt-in for non-trivial decisions, so within that scope offering cross-model is part of the skill's value, not optional friction.
|
||||
|
||||
**Interactive sessions: always offer. Never silently skip.**
|
||||
|
||||
**Step 1: Ask the user**
|
||||
|
||||
After the single-model review in Step 3 above, but before RECONCILE, pause and ask:
|
||||
|
||||
> *"Single-model review complete. Want a cross-model second opinion? Options: Gemini CLI, Codex CLI, manual external review (you paste it elsewhere), or skip."*
|
||||
|
||||
This question is mandatory in every interactive doubt cycle — even on artifacts that feel low-stakes. The user — not the agent — decides whether the cost is worth it. The agent's job is to surface the choice.
|
||||
|
||||
**Step 2: If the user picks a CLI — verify, then invoke**
|
||||
|
||||
1. Check the tool is in PATH (`which gemini`, `which codex`).
|
||||
2. Test it works (`gemini --version` or equivalent) before passing the full prompt — a stale or broken binary may pass `which` but fail on real input.
|
||||
3. Confirm the exact invocation with the user, including required flags, auth, and env vars (e.g., API keys). Implementations vary; never assume.
|
||||
4. Pass ARTIFACT + CONTRACT + the adversarial prompt **only**. No session context, no CLAIM.
|
||||
5. Mind shell escaping. If the artifact contains quotes, `$(...)`, or backticks, prefer stdin (`echo … | gemini`) or a heredoc over inline `-p "…"`. When in doubt, ask the user to confirm the invocation before running it.
|
||||
6. Take the output into Step 4 (RECONCILE).
|
||||
|
||||
**Never interpolate the artifact into a shell-quoted argument.** Code, markdown, and review prompts routinely contain backticks, `$(...)`, and quote characters that will either truncate the prompt or execute embedded shell. Write the full prompt to a file and pipe it through stdin.
|
||||
|
||||
Example shapes (verify flags against your installed tool — syntax differs across implementations and versions):
|
||||
|
||||
```bash
|
||||
# Write the adversarial prompt + ARTIFACT + CONTRACT to a temp file first.
|
||||
# Then pipe via stdin so shell metacharacters in the artifact stay inert.
|
||||
|
||||
# Codex (read-only sandbox keeps the CLI from writing to your workspace):
|
||||
codex exec --sandbox read-only -C <repo-path> - < /tmp/doubt-prompt.md
|
||||
|
||||
# Gemini ('--approval-mode plan' is read-only; '-p ""' triggers non-interactive
|
||||
# mode and the prompt is read from stdin):
|
||||
gemini --approval-mode plan -p "" < /tmp/doubt-prompt.md
|
||||
```
|
||||
|
||||
A read-only sandbox is the load-bearing detail: a doubt artifact may itself contain instructions (intentional or accidental prompt injection) that the cross-model CLI would otherwise execute against your workspace.
|
||||
|
||||
**Step 3: If the CLI is unavailable or fails**
|
||||
|
||||
Surface the failure explicitly. Offer: run it manually, try a different tool, or skip. Do not silently fall back to single-model — the user should know cross-model didn't happen.
|
||||
|
||||
**Step 4: If the user skips**
|
||||
|
||||
Acknowledge the skip in the output (*"Proceeding with single-model findings only"*) and continue to RECONCILE. Skipping is fine; silent skipping is not.
|
||||
|
||||
**Non-interactive contexts** (CI, `/loop`, autonomous-loop, scheduled runs):
|
||||
|
||||
- Cross-model is **skipped**, and the skip must be **announced** in the output: *"Cross-model skipped: non-interactive context."*
|
||||
- **Never invoke an external CLI without explicit user authorization** — this is a load-bearing safety property.
|
||||
|
||||
Cross-model adds cost, latency, and tool fragility. The agent surfaces the choice every cycle; the user decides whether this artifact warrants it.
|
||||
|
||||
### Step 4: RECONCILE — Fold findings back
|
||||
|
||||
The reviewer's output is data, not verdict. **You are still the orchestrator.** Re-read the artifact text against each finding before classifying — rubber-stamping the reviewer is the same failure mode as ignoring it.
|
||||
|
||||
For each finding, classify in this **precedence order** (first matching class wins):
|
||||
|
||||
1. **Contract misread** — reviewer flagged something specifically because the CONTRACT you provided was unclear or incomplete. Fix the contract first, re-classify on the next cycle.
|
||||
2. **Valid + actionable** — real issue requiring a change to the artifact. Change it, re-loop.
|
||||
3. **Valid trade-off** — issue is real but cost of fixing exceeds cost of accepting. Document the trade-off explicitly so the user sees it.
|
||||
4. **Noise** — reviewer flagged something that's actually correct under context the reviewer didn't have. Note it, move on, and ask: would adding that context to the contract have prevented the false flag?
|
||||
|
||||
A fresh reviewer can be wrong because it lacks context. Don't defer just because it's "fresh."
|
||||
|
||||
### Step 5: STOP — Bounded loop, not recursion
|
||||
|
||||
Stop when:
|
||||
|
||||
- Next iteration returns only trivial or already-considered findings, **or**
|
||||
- 3 cycles completed (escalate to user, don't grind a fourth alone), **or**
|
||||
- User explicitly says "ship it"
|
||||
|
||||
If after 3 cycles the reviewer still surfaces substantive issues, the artifact may not be ready. Surface this to the user — three unresolved cycles is information about the artifact, not a reason to keep looping.
|
||||
|
||||
If 3 cycles is "obviously insufficient" because the artifact is large: the artifact is too big — return to Step 2 and decompose. Do not lift the bound.
|
||||
|
||||
## Common Rationalizations
|
||||
|
||||
| Rationalization | Reality |
|
||||
|---|---|
|
||||
| "I'm confident, skip the doubt step" | Confidence correlates poorly with correctness on novel problems. Moments of certainty are exactly when blind spots hide. |
|
||||
| "Spawning a reviewer is expensive" | Debugging a wrong commit in production is more expensive. The check is bounded; the bug isn't. |
|
||||
| "The reviewer will just nitpick" | Only if unscoped. Constrain the prompt to "issues that would make this fail under the contract." |
|
||||
| "I'll do doubt at the end with `/review`" | `/review` is a final gate. Doubt-driven catches wrong directions early when course-correction is cheap. By PR time it's too late. |
|
||||
| "If I doubt every step I'll never ship" | The skill applies to non-trivial decisions, not every keystroke. Re-read "When NOT to Use." |
|
||||
| "Two opinions are always better than one" | Not when the second has less context and produces noise. Reconcile, don't defer. |
|
||||
| "The reviewer disagreed so I was wrong" | The reviewer lacks your context — disagreement is information, not verdict. Re-read the artifact, classify, then decide. |
|
||||
| "Cross-model is always better" | Cross-model catches blind spots a single model shares with itself, but it adds cost and tool fragility. Offer it every interactive doubt cycle — the user decides whether the artifact warrants it. The agent's job is to surface the choice, not to gate it. |
|
||||
| "User said yes once, so I can keep invoking the CLI" | Each invocation is its own authorization. The artifact, the prompt, and the flags change between calls — re-confirm the exact command with the user before every run. |
|
||||
|
||||
## Red Flags
|
||||
|
||||
- Spawning a fresh-context reviewer for a one-line rename or formatting change
|
||||
- Treating reviewer output as authoritative without re-reading the artifact text
|
||||
- Looping >3 cycles without escalating to the user
|
||||
- Prompting the reviewer with "is this good?" instead of "find issues"
|
||||
- Skipping doubt under time pressure on a high-stakes decision
|
||||
- Re-spawning fresh-context on an unchanged artifact (you'll get the same findings; you're stalling)
|
||||
- **Doubt theater (checkable signal)**: across 2 or more cycles where the reviewer surfaced substantive findings, zero findings were classified as actionable. You are validating, not doubting. Stop and escalate.
|
||||
- Doubting only after committing — that's `/review`, not doubt-driven development
|
||||
- Hardcoding an external CLI invocation without confirming with the user that the tool exists, is configured, and accepts that exact syntax
|
||||
- **Silently skipping cross-model in an interactive doubt cycle.** Even when not recommending it, the offer must be visible. Skipping is fine; silent skipping is not.
|
||||
- Falling back silently when an external CLI errors or is missing — surface the failure and let the user redirect
|
||||
- Stripping the contract from the reviewer's input
|
||||
- Passing the CLAIM to the reviewer (biases toward agreement)
|
||||
|
||||
## Interaction with Other Skills
|
||||
|
||||
- **`code-review-and-quality` / `/review`**: complementary. `/review` is post-hoc PR verdict; doubt-driven is in-flight per-decision. Use both.
|
||||
- **`source-driven-development`**: SDD verifies *facts about frameworks* against official docs. Doubt-driven verifies *your reasoning about the artifact*. SDD checks the API exists; doubt-driven checks you used it correctly under the contract.
|
||||
- **`test-driven-development`**: TDD's RED step is doubt made concrete — a failing test is a disproof attempt. When TDD applies, that failing test *is* the doubt step for behavioral claims.
|
||||
- **`debugging-and-error-recovery`**: when the reviewer surfaces a real failure mode, drop into the debugging skill to localize and fix.
|
||||
- **Repo orchestration rules** (`references/orchestration-patterns.md`): this skill orchestrates from the main session. A persona calling another persona is anti-pattern B — see Loading Constraints above.
|
||||
|
||||
## Verification
|
||||
|
||||
After applying doubt-driven development:
|
||||
|
||||
- [ ] Every non-trivial decision (per the definition above) was named explicitly as a CLAIM before standing
|
||||
- [ ] At least one fresh-context review per non-trivial artifact (a failing test produced by TDD's RED step satisfies this for behavioral claims, per Interaction with Other Skills)
|
||||
- [ ] The reviewer received ARTIFACT + CONTRACT — NOT the CLAIM, NOT your reasoning
|
||||
- [ ] The reviewer's prompt was adversarial ("find issues"), not validating ("is it good")
|
||||
- [ ] Findings were classified against the artifact text (not rubber-stamped) using the precedence: contract misread / actionable / trade-off / noise
|
||||
- [ ] A stop condition was met (trivial findings, 3 cycles, or user override)
|
||||
- [ ] In interactive mode, cross-model was **explicitly offered** to the user (regardless of artifact stakes) and the response was acknowledged in the output
|
||||
- [ ] In non-interactive mode, cross-model was skipped and the skip was announced
|
||||
- [ ] Any external CLI invocation was preceded by a PATH check, a working-binary test, syntax confirmation with the user, and explicit authorization to run
|
||||
@@ -0,0 +1,160 @@
|
||||
# Accessibility Checklist
|
||||
|
||||
Quick reference for WCAG 2.1 AA compliance. Use alongside the `frontend-ui-engineering` skill.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Essential Checks](#essential-checks)
|
||||
- [Common HTML Patterns](#common-html-patterns)
|
||||
- [Testing Tools](#testing-tools)
|
||||
- [Quick Reference: ARIA Live Regions](#quick-reference-aria-live-regions)
|
||||
- [Common Anti-Patterns](#common-anti-patterns)
|
||||
|
||||
## Essential Checks
|
||||
|
||||
### Keyboard Navigation
|
||||
- [ ] All interactive elements focusable via Tab key
|
||||
- [ ] Focus order follows visual/logical order
|
||||
- [ ] Focus is visible (outline/ring on focused elements)
|
||||
- [ ] Custom widgets have keyboard support (Enter to activate, Escape to close)
|
||||
- [ ] No keyboard traps (user can always Tab away from a component)
|
||||
- [ ] Skip-to-content link at top of page - visible (at least) on keyboard focus
|
||||
- [ ] Modals trap focus while open, return focus on close
|
||||
|
||||
### Screen Readers
|
||||
- [ ] All images have `alt` text (or `alt=""` for decorative images)
|
||||
- [ ] All form inputs have associated labels (`<label>` or `aria-label`)
|
||||
- [ ] Buttons and links have descriptive text (not "Click here")
|
||||
- [ ] Icon-only buttons have `aria-label`
|
||||
- [ ] Page has one `<h1>` and headings don't skip levels
|
||||
- [ ] Dynamic content changes announced (`aria-live` regions)
|
||||
- [ ] Tables have `<th>` headers with scope
|
||||
|
||||
### Visual
|
||||
- [ ] Text contrast ≥ 4.5:1 (normal text) or ≥ 3:1 (large text, 18px+)
|
||||
- [ ] UI components contrast ≥ 3:1 against background
|
||||
- [ ] Color is not the only way to convey information
|
||||
- [ ] Text resizable to 200% without breaking layout
|
||||
- [ ] No content that flashes more than 3 times per second
|
||||
|
||||
### Forms
|
||||
- [ ] Every input has a visible label
|
||||
- [ ] Required fields indicated (not by color alone)
|
||||
- [ ] Error messages specific and associated with the field
|
||||
- [ ] Error state visible by more than color (icon, text, border)
|
||||
- [ ] Form submission errors summarized and focusable
|
||||
- [ ] Known fields use autocomplete (for example `type="email" autocomplete="email"`)
|
||||
|
||||
### Content
|
||||
- [ ] Language declared (`<html lang="en">`)
|
||||
- [ ] Page has a descriptive `<title>`
|
||||
- [ ] Links distinguish from surrounding text (not by color alone)
|
||||
- [ ] Touch targets ≥ 44x44px on mobile
|
||||
- [ ] Meaningful empty states (not blank screens)
|
||||
|
||||
## Common HTML Patterns
|
||||
|
||||
### Buttons vs. Links
|
||||
|
||||
```html
|
||||
<!-- Use <button> for actions -->
|
||||
<button onClick={handleDelete}>Delete Task</button>
|
||||
|
||||
<!-- Use <a> for navigation -->
|
||||
<a href="/tasks/123">View Task</a>
|
||||
|
||||
<!-- NEVER use div/span as buttons -->
|
||||
<div onClick={handleDelete}>Delete</div> <!-- BAD -->
|
||||
```
|
||||
|
||||
### Form Labels
|
||||
|
||||
```html
|
||||
<!-- Explicit label association -->
|
||||
<label htmlFor="email">Email address</label>
|
||||
<input id="email" type="email" required />
|
||||
|
||||
<!-- Implicit wrapping -->
|
||||
<label>
|
||||
Email address
|
||||
<input type="email" required />
|
||||
</label>
|
||||
|
||||
<!-- Hidden label (visible label preferred) -->
|
||||
<input type="search" aria-label="Search tasks" />
|
||||
```
|
||||
|
||||
### ARIA Roles
|
||||
|
||||
```html
|
||||
<!-- Navigation -->
|
||||
<nav aria-label="Main navigation">...</nav>
|
||||
<nav aria-label="Footer links">...</nav>
|
||||
|
||||
<!-- Status messages -->
|
||||
<div role="status" aria-live="polite">Task saved</div>
|
||||
|
||||
<!-- Alert messages -->
|
||||
<div role="alert">Error: Title is required</div>
|
||||
|
||||
<!-- Modal dialogs -->
|
||||
<dialog aria-modal="true" aria-labelledby="dialog-title">
|
||||
<h2 id="dialog-title">Confirm Delete</h2>
|
||||
...
|
||||
</dialog>
|
||||
|
||||
<!-- Loading states -->
|
||||
<div aria-busy="true" aria-label="Loading tasks">
|
||||
<Spinner />
|
||||
</div>
|
||||
```
|
||||
|
||||
### Accessible Lists
|
||||
|
||||
```html
|
||||
<ul role="list" aria-label="Tasks">
|
||||
<li>
|
||||
<input type="checkbox" id="task-1" aria-label="Complete: Buy groceries" />
|
||||
<label htmlFor="task-1">Buy groceries</label>
|
||||
</li>
|
||||
</ul>
|
||||
```
|
||||
|
||||
## Testing Tools
|
||||
|
||||
```bash
|
||||
# Automated audit
|
||||
npx axe-core # Programmatic accessibility testing
|
||||
npx pa11y # CLI accessibility checker
|
||||
|
||||
# In browser
|
||||
# Chrome DevTools → Lighthouse → Accessibility
|
||||
# Chrome DevTools → Elements → Accessibility tree
|
||||
|
||||
# Screen reader testing
|
||||
# macOS: VoiceOver (Cmd + F5)
|
||||
# Windows: NVDA (free) or JAWS
|
||||
# Linux: Orca
|
||||
```
|
||||
|
||||
## Quick Reference: ARIA Live Regions
|
||||
|
||||
| Value | Behavior | Use For |
|
||||
|-------|----------|---------|
|
||||
| `aria-live="polite"` | Announced at next pause | Status updates, saved confirmations |
|
||||
| `aria-live="assertive"` | Announced immediately | Errors, time-sensitive alerts |
|
||||
| `role="status"` | Same as `polite` | Status messages |
|
||||
| `role="alert"` | Same as `assertive` | Error messages |
|
||||
|
||||
## Common Anti-Patterns
|
||||
|
||||
| Anti-Pattern | Problem | Fix |
|
||||
|---|---|---|
|
||||
| `div` as button | Not focusable, no keyboard support | Use `<button>` |
|
||||
| Missing `alt` text | Images invisible to screen readers | Add descriptive `alt` |
|
||||
| Color-only states | Invisible to color-blind users | Add icons, text, or patterns |
|
||||
| Autoplaying media | Disorienting, can't be stopped | Add controls, don't autoplay |
|
||||
| Custom dropdown with no ARIA | Unusable by keyboard/screen reader | Use native `<select>` or proper ARIA listbox |
|
||||
| Removing focus outlines | Users can't see where they are | Style outlines, don't remove them |
|
||||
| Empty links/buttons | "Link" announced with no description | Add text or `aria-label` |
|
||||
| `tabindex > 0` | Breaks natural tab order | Use `tabindex="0"` or `-1` only |
|
||||
@@ -0,0 +1,370 @@
|
||||
# Orchestration Patterns
|
||||
|
||||
Reference catalog of agent orchestration patterns this repo endorses, plus anti-patterns to avoid. Read this before adding a new slash command that coordinates multiple personas, or before introducing a new persona that "wraps" existing ones.
|
||||
|
||||
The governing rule: **the user (or a slash command) is the orchestrator. Personas do not invoke other personas.** Skills are mandatory hops inside a persona's workflow.
|
||||
|
||||
---
|
||||
|
||||
## Endorsed patterns
|
||||
|
||||
### 1. Direct invocation (no orchestration)
|
||||
|
||||
Single persona, single perspective, single artifact. The default and the cheapest option.
|
||||
|
||||
```
|
||||
user → code-reviewer → report → user
|
||||
```
|
||||
|
||||
**Use when:** the work is one perspective on one artifact and you can describe it in one sentence.
|
||||
|
||||
**Examples:**
|
||||
- "Review this PR" → `code-reviewer`
|
||||
- "Find security issues in `auth.ts`" → `security-auditor`
|
||||
- "What tests are missing for the checkout flow?" → `test-engineer`
|
||||
|
||||
**Cost:** one round trip. The baseline you should always compare orchestrated patterns against.
|
||||
|
||||
---
|
||||
|
||||
### 2. Single-persona slash command
|
||||
|
||||
A slash command that wraps one persona with the project's skills. Saves the user from re-explaining the workflow every time.
|
||||
|
||||
```
|
||||
/review → code-reviewer (with code-review-and-quality skill) → report
|
||||
```
|
||||
|
||||
**Use when:** the same single-persona invocation happens repeatedly with the same setup.
|
||||
|
||||
**Examples in this repo:** `/review`, `/test`, `/code-simplify`.
|
||||
|
||||
**Cost:** same as direct invocation. The slash command is just a saved prompt.
|
||||
|
||||
**Anti-signal:** if the slash command's body is mostly "decide which persona to call," delete it and let the user call the persona directly.
|
||||
|
||||
---
|
||||
|
||||
### 3. Parallel fan-out with merge
|
||||
|
||||
Multiple personas operate on the same input concurrently, each producing an independent report. A merge step (in the main agent's context) synthesizes them into a single decision.
|
||||
|
||||
```
|
||||
┌─→ code-reviewer ─┐
|
||||
/ship → fan out ───┼─→ security-auditor ─┤→ merge → go/no-go + rollback
|
||||
└─→ test-engineer ─┘
|
||||
```
|
||||
|
||||
**Use when:**
|
||||
- The sub-tasks are genuinely independent (no shared mutable state, no ordering dependency)
|
||||
- Each sub-agent benefits from its own context window
|
||||
- The merge step is small enough to stay in the main context
|
||||
- Wall-clock latency matters
|
||||
|
||||
**Examples in this repo:** `/ship`.
|
||||
|
||||
**Cost:** N parallel sub-agent contexts + one merge turn. Higher than direct invocation, but faster wall-clock and produces better reports because each sub-agent stays focused on its single perspective.
|
||||
|
||||
**Validation checklist before adopting this pattern:**
|
||||
- [ ] Can I run all sub-agents at the same time without ordering issues?
|
||||
- [ ] Does each persona produce a different *kind* of finding, not just the same finding from a different angle?
|
||||
- [ ] Will the merge step fit in the main agent's remaining context?
|
||||
- [ ] Is the user's wait time long enough that parallelism is actually noticeable?
|
||||
|
||||
If any answer is "no," fall back to direct invocation or a single-persona command.
|
||||
|
||||
---
|
||||
|
||||
### 4. Sequential pipeline as user-driven slash commands
|
||||
|
||||
The user runs slash commands in a defined order, carrying context (or commit history) between them. There is no orchestrator agent — the user IS the orchestrator.
|
||||
|
||||
```
|
||||
user runs: /spec → /plan → /build → /test → /review → /ship
|
||||
```
|
||||
|
||||
**Use when:** the workflow has dependencies (each step needs the previous step's output) and human judgment between steps adds value.
|
||||
|
||||
**Examples in this repo:** the entire DEFINE → PLAN → BUILD → VERIFY → REVIEW → SHIP lifecycle.
|
||||
|
||||
**Cost:** one sub-agent context per step. Free for the orchestration layer because there is no orchestrator agent.
|
||||
|
||||
**Why not automate it:** an LLM "lifecycle orchestrator" would (a) lose nuance between steps because it has to summarize for hand-off, (b) skip the human checkpoints that catch wrong-direction work early, and (c) double the token cost via paraphrasing turns.
|
||||
|
||||
---
|
||||
|
||||
### 5. Research isolation (context preservation)
|
||||
|
||||
When a task requires reading large amounts of material that shouldn't pollute the main context, spawn a research sub-agent that returns only a digest.
|
||||
|
||||
```
|
||||
main agent → research sub-agent (reads 50 files) → digest → main agent continues
|
||||
```
|
||||
|
||||
**Use when:**
|
||||
- The main session needs to stay focused on a downstream task
|
||||
- The investigation result is much smaller than the input it consumes
|
||||
- The decision quality benefits from the main agent having room to think after
|
||||
|
||||
**Examples:** "Find every call site of this deprecated API across the monorepo," "Summarize what these 30 ADRs say about caching."
|
||||
|
||||
**Cost:** one isolated sub-agent context. Worth it any time the alternative is loading hundreds of files into the main context.
|
||||
|
||||
**On Claude Code, use the built-in `Explore` subagent** rather than defining a custom research persona. `Explore` runs on Haiku, is denied write/edit tools, and is purpose-built for this pattern. Define a custom research subagent only when `Explore` doesn't fit (e.g. you need a domain-specific system prompt the model wouldn't infer).
|
||||
|
||||
---
|
||||
|
||||
## Claude Code compatibility
|
||||
|
||||
This catalog is harness-agnostic, but most readers will run it on Claude Code. Here's how each pattern maps onto Claude Code's primitives — and where the platform enforces our rules for us.
|
||||
|
||||
### Where personas live
|
||||
|
||||
Plugin subagents go in `agents/` at the plugin root. This repo is a plugin (`.claude-plugin/plugin.json`), so `agents/code-reviewer.md`, `agents/security-auditor.md`, and `agents/test-engineer.md` are auto-discovered when the plugin is enabled. No path configuration needed.
|
||||
|
||||
### Subagents vs. Agent Teams
|
||||
|
||||
Claude Code has two parallelism primitives. Pattern 3 (parallel fan-out with merge) maps to **subagents**. If you need teammates that talk to each other, use **Agent Teams** instead.
|
||||
|
||||
| | Subagents | Agent Teams |
|
||||
|--|-----------|-------------|
|
||||
| Coordination | Main agent fans out, sub-agents only report back | Teammates message each other, share a task list |
|
||||
| Context | Own context window per subagent | Own context window per teammate |
|
||||
| When to use | Independent tasks producing reports | Collaborative work needing discussion |
|
||||
| Status | Stable | Experimental — requires `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` |
|
||||
| Cost | Lower | Higher — each teammate is a separate Claude instance |
|
||||
|
||||
**The personas in this repo work in both modes.** When spawned as subagents (e.g. by `/ship`), they report findings to the main session. When spawned as teammates (`Spawn a teammate using the security-auditor agent type…`), they can challenge each other's findings directly. The persona definition is the same; only the spawning context changes.
|
||||
|
||||
One subtlety: the `skills` and `mcpServers` frontmatter fields in a persona are honored when it runs as a subagent but **ignored when it runs as a teammate** — teammates load skills and MCP servers from your project and user settings, the same as a regular session. If a persona depends on a specific skill or MCP server being loaded, configure it at the session level so it's available in both modes.
|
||||
|
||||
### Platform-enforced rules
|
||||
|
||||
Two rules in this catalog aren't just convention — Claude Code enforces them:
|
||||
|
||||
- **"Subagents cannot spawn other subagents"** (verbatim from the docs). Anti-pattern B (persona-calls-persona) and Anti-pattern D (deep persona trees) cannot exist on Claude Code by construction.
|
||||
- **"No nested teams"** — teammates cannot spawn their own teams. Same anti-patterns blocked at the team level.
|
||||
|
||||
This means you can adopt the patterns in this catalog without worrying about contributors accidentally building the anti-patterns. They'll just fail to load.
|
||||
|
||||
### Built-in subagents to know about
|
||||
|
||||
Before defining a custom subagent, check whether one of these covers the role:
|
||||
|
||||
| Built-in | Purpose |
|
||||
|----------|---------|
|
||||
| `Explore` | Read-only codebase search and analysis. Use this for Pattern 5 (research isolation). |
|
||||
| `Plan` | Read-only research during plan mode. |
|
||||
| `general-purpose` | Multi-step tasks needing both exploration and modification. |
|
||||
|
||||
Don't redefine these. Layer your specialist personas (code-reviewer, security-auditor, test-engineer) on top of them.
|
||||
|
||||
### Frontmatter restrictions for plugin agents
|
||||
|
||||
Plugin subagents do **not** support the `hooks`, `mcpServers`, or `permissionMode` frontmatter fields — these are silently ignored. If a future persona needs any of those, the user must copy the file into `.claude/agents/` or `~/.claude/agents/` instead.
|
||||
|
||||
The fields that DO work in plugin agents are: `name`, `description`, `tools`, `disallowedTools`, `model`, `maxTurns`, `skills`, `memory`, `background`, `effort`, `isolation`, `color`, `initialPrompt`. Use `model` per-persona if you want to optimize cost (e.g. Haiku for `test-engineer` coverage scans, Sonnet for `code-reviewer`, Opus for `security-auditor`).
|
||||
|
||||
### Spawning multiple subagents in parallel
|
||||
|
||||
In Claude Code, parallel fan-out (Pattern 3) requires issuing **multiple Agent tool calls in a single assistant turn**. Sequential turns serialize execution. `/ship` calls this out explicitly. Any new orchestrator command should do the same.
|
||||
|
||||
---
|
||||
|
||||
## Worked example: Agent Teams for competing-hypothesis debugging
|
||||
|
||||
This example shows when to reach for **Agent Teams** instead of `/ship`'s subagent fan-out. The two patterns look similar from a distance — both spawn the same three personas — but the value comes from a different place.
|
||||
|
||||
### The scenario
|
||||
|
||||
> *Checkout occasionally hangs for ~30 seconds before completing. It happens roughly once every 50 sessions. No errors in logs. Started after last week's release.*
|
||||
|
||||
Plausible root causes (mutually exclusive, all fit the symptoms):
|
||||
|
||||
1. A race condition in the new payment-confirmation flow
|
||||
2. An auth check that occasionally falls through to a slow synchronous network call
|
||||
3. A missing index on a query that scales with cart size
|
||||
4. A flaky third-party API where the SDK retries silently before timing out
|
||||
|
||||
A single agent will pick the first plausible theory and stop investigating. A `/ship`-style subagent fan-out would have each persona report independently — but their reports never meet, so nothing rules out the wrong theories.
|
||||
|
||||
This is exactly the case the Agent Teams docs describe: *"With multiple independent investigators actively trying to disprove each other, the theory that survives is much more likely to be the actual root cause."*
|
||||
|
||||
### Why this is *not* a `/ship` job
|
||||
|
||||
| | `/ship` (subagents) | Agent Teams |
|
||||
|--|--------------------|-------------|
|
||||
| Sub-agents see | The same diff, different lenses | A shared task list, each other's messages |
|
||||
| Output | Three independent reports → one merge | Adversarial debate → consensus root cause |
|
||||
| Right when | You want a verdict on a known artifact | You want to *find* the artifact among hypotheses |
|
||||
|
||||
`/ship` is a verdict; Agent Teams is an investigation.
|
||||
|
||||
### Setup (one-time, per-environment)
|
||||
|
||||
Agent Teams is experimental. In `~/.claude/settings.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"env": {
|
||||
"CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Requires Claude Code v2.1.32 or later. The personas in this repo are picked up automatically — no team-config files to author by hand.
|
||||
|
||||
### The trigger prompt
|
||||
|
||||
Type into the lead session, in natural language:
|
||||
|
||||
```
|
||||
Users report checkout hangs for ~30 seconds intermittently after last
|
||||
week's release. No errors in logs.
|
||||
|
||||
Create an agent team to debug this with competing hypotheses. Spawn
|
||||
three teammates using the existing agent types:
|
||||
|
||||
- code-reviewer — investigate race conditions and blocking calls
|
||||
in the checkout code path
|
||||
- security-auditor — investigate auth checks, session handling,
|
||||
and any synchronous network calls added recently
|
||||
- test-engineer — propose tests that would distinguish between the
|
||||
hypotheses and check coverage gaps in checkout
|
||||
|
||||
Have them message each other directly to challenge each other's
|
||||
theories. Update findings as consensus emerges. Only converge when
|
||||
two teammates agree they can disprove the others'.
|
||||
```
|
||||
|
||||
The lead spawns three teammates referencing the existing persona names. The persona body is **appended** to each teammate's system prompt as additional instructions (on top of the team-coordination instructions the lead installs); the trigger prompt above becomes their task.
|
||||
|
||||
### What happens
|
||||
|
||||
1. Each teammate runs in its own context window, exploring the codebase from its own lens.
|
||||
2. Teammates use `message` to send findings to each other directly. The lead doesn't have to relay.
|
||||
3. The shared task list shows who's investigating what — visible at any time with `Ctrl+T` (in-process mode) or in a tmux pane (split mode).
|
||||
4. When `code-reviewer` finds a `Promise.all` that should be sequential, it messages `security-auditor` to confirm the auth call isn't part of the race. `security-auditor` checks and replies — either confirming the race is the real issue or producing counter-evidence.
|
||||
5. `test-engineer` proposes a focused integration test for whichever theory is winning, which the team uses to verify before declaring consensus.
|
||||
6. The lead synthesizes the converged finding and presents it to you.
|
||||
|
||||
You can interrupt at any teammate by cycling with `Shift+Down` and typing — useful for redirecting an investigator who's gone down a wrong path.
|
||||
|
||||
### When to clean up
|
||||
|
||||
When the investigation lands on a root cause, tell the lead:
|
||||
|
||||
```
|
||||
Clean up the team
|
||||
```
|
||||
|
||||
Always cleanup through the lead, not a teammate (per the docs: teammates lack full team context for cleanup).
|
||||
|
||||
### Cost expectation
|
||||
|
||||
Three Sonnet teammates running for ~10–15 minutes of investigation costs noticeably more than the same three personas spawned as subagents by `/ship`. The justification is *quality of conclusion* — for production debugging where the wrong fix is expensive, the extra tokens are a bargain. For a routine PR review, stick with `/ship`.
|
||||
|
||||
### Anti-pattern in this scenario
|
||||
|
||||
Do **not** rebuild this as a `/debug` slash command that fans out subagents. Subagents can't message each other — you'd lose the adversarial debate that makes the pattern work. If a workflow keeps coming up, document the trigger prompt above as a snippet rather than wrapping it in a slash command that misuses subagents.
|
||||
|
||||
### When *not* to use Agent Teams
|
||||
|
||||
- Production-bound verdict on a known diff → use `/ship` (subagents).
|
||||
- One specialist perspective on one artifact → direct persona invocation.
|
||||
- Sequential lifecycle (spec → plan → build) → user-driven slash commands (Pattern 4).
|
||||
- Read-heavy research with a small digest → built-in `Explore` subagent.
|
||||
|
||||
Reach for Agent Teams only when teammates **need** to challenge each other to produce the right answer.
|
||||
|
||||
---
|
||||
|
||||
## Anti-patterns
|
||||
|
||||
### A. Router persona ("meta-orchestrator")
|
||||
|
||||
A persona whose job is to decide which other persona to call.
|
||||
|
||||
```
|
||||
/work → router-persona → "this needs a review" → code-reviewer → router (paraphrases) → user
|
||||
```
|
||||
|
||||
**Why it fails:**
|
||||
- Pure routing layer with no domain value
|
||||
- Adds two paraphrasing hops → information loss + roughly 2× token cost
|
||||
- The user already knew they wanted a review; they could have called `/review` directly
|
||||
- Replicates the work that slash commands and intent mapping in `AGENTS.md` already do
|
||||
|
||||
**What to do instead:** add or refine slash commands. Document intent → command mapping in `AGENTS.md`.
|
||||
|
||||
---
|
||||
|
||||
### B. Persona that calls another persona
|
||||
|
||||
A `code-reviewer` that internally invokes `security-auditor` when it sees auth code.
|
||||
|
||||
**Why it fails:**
|
||||
- Personas were designed to produce a single perspective; chaining them defeats that
|
||||
- The summary the calling persona passes loses context the called persona needs
|
||||
- Failure modes multiply (which persona's output format wins? whose rules apply?)
|
||||
- Hides cost from the user
|
||||
|
||||
**What to do instead:** have the calling persona *recommend* a follow-up audit in its report. The user or a slash command runs the second pass.
|
||||
|
||||
---
|
||||
|
||||
### C. Sequential orchestrator that paraphrases
|
||||
|
||||
An agent that calls `/spec`, then `/plan`, then `/build`, etc. on the user's behalf.
|
||||
|
||||
**Why it fails:**
|
||||
- Loses the human checkpoints that catch wrong-direction work
|
||||
- Each hand-off summarizes context — accumulated drift over a long pipeline
|
||||
- Doubles token cost: orchestrator turn + sub-agent turn for every step
|
||||
- Removes user agency at exactly the points where judgment matters most
|
||||
|
||||
**What to do instead:** keep the user as the orchestrator. Document the recommended sequence in `README.md` and let users invoke it.
|
||||
|
||||
---
|
||||
|
||||
### D. Deep persona trees
|
||||
|
||||
`/ship` calls a `pre-ship-coordinator` that calls a `quality-coordinator` that calls `code-reviewer`.
|
||||
|
||||
**Why it fails:**
|
||||
- Each layer adds latency and tokens with no decision value
|
||||
- Debugging becomes a multi-level investigation
|
||||
- The leaf personas lose context to multiple summarization steps
|
||||
|
||||
**What to do instead:** keep the orchestration depth at most 1 (slash command → personas). The merge happens in the main agent.
|
||||
|
||||
---
|
||||
|
||||
## Decision flow
|
||||
|
||||
When considering a new orchestrated workflow, walk this flow:
|
||||
|
||||
```
|
||||
Is the work one perspective on one artifact?
|
||||
├── Yes → Direct invocation. Stop.
|
||||
└── No → Will the same composition repeat?
|
||||
├── No → Direct invocation, ad hoc. Stop.
|
||||
└── Yes → Are sub-tasks independent?
|
||||
├── No → Sequential slash commands run by user (Pattern 4).
|
||||
└── Yes → Parallel fan-out with merge (Pattern 3).
|
||||
Validate against the checklist above.
|
||||
If any check fails → fall back to single-persona command (Pattern 2).
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## When to add a new pattern to this catalog
|
||||
|
||||
Add a new entry only after:
|
||||
|
||||
1. You've used the pattern at least twice in real work
|
||||
2. You can name a concrete artifact in this repo that demonstrates it
|
||||
3. You can explain why an existing pattern wouldn't have worked
|
||||
4. You can describe its anti-pattern shadow (what people will mistakenly build instead)
|
||||
|
||||
Premature catalog entries become aspirational documentation that no one follows.
|
||||
@@ -0,0 +1,153 @@
|
||||
# Performance Checklist
|
||||
|
||||
Quick reference checklist for web application performance. Use alongside the `performance-optimization` skill.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Core Web Vitals Targets](#core-web-vitals-targets)
|
||||
- [TTFB Diagnosis](#ttfb-diagnosis)
|
||||
- [Frontend Checklist](#frontend-checklist)
|
||||
- [Backend Checklist](#backend-checklist)
|
||||
- [Measurement Commands](#measurement-commands)
|
||||
- [Common Anti-Patterns](#common-anti-patterns)
|
||||
|
||||
## Core Web Vitals Targets
|
||||
|
||||
| Metric | Good | Needs Work | Poor |
|
||||
|--------|------|------------|------|
|
||||
| LCP (Largest Contentful Paint) | ≤ 2.5s | ≤ 4.0s | > 4.0s |
|
||||
| INP (Interaction to Next Paint) | ≤ 200ms | ≤ 500ms | > 500ms |
|
||||
| CLS (Cumulative Layout Shift) | ≤ 0.1 | ≤ 0.25 | > 0.25 |
|
||||
|
||||
## TTFB Diagnosis
|
||||
|
||||
When TTFB is slow (> 800ms), check each component in DevTools Network waterfall:
|
||||
|
||||
- [ ] **DNS resolution** slow → add `<link rel="dns-prefetch">` or `<link rel="preconnect">` for known origins
|
||||
- [ ] **TCP/TLS handshake** slow → enable HTTP/2, consider edge deployment, verify keep-alive
|
||||
- [ ] **Server processing** slow → profile backend, check slow queries, add caching
|
||||
|
||||
## Frontend Checklist
|
||||
|
||||
### Images
|
||||
- [ ] Images use modern formats (WebP, AVIF)
|
||||
- [ ] Images are responsively sized (`srcset` and `sizes`)
|
||||
- [ ] Images and `<source>` elements have explicit `width` and `height` (prevents CLS in art direction)
|
||||
- [ ] Below-the-fold images use `loading="lazy"` and `decoding="async"`
|
||||
- [ ] Hero/LCP images use `fetchpriority="high"` and no lazy loading
|
||||
|
||||
### JavaScript
|
||||
- [ ] Bundle size under 200KB gzipped (initial load)
|
||||
- [ ] Code splitting with dynamic `import()` for routes and heavy features
|
||||
- [ ] Tree shaking enabled (verify dependency ships ESM and marks `sideEffects: false`)
|
||||
- [ ] No blocking JavaScript in `<head>` (use `defer` or `async`)
|
||||
- [ ] Heavy computation offloaded to Web Workers (if applicable)
|
||||
- [ ] `React.memo()` on expensive components that re-render with same props
|
||||
- [ ] `useMemo()` / `useCallback()` only where profiling shows benefit
|
||||
- [ ] Long tasks (> 50ms) broken up to keep the main thread available — main lever for INP
|
||||
- [ ] `yieldToMain` pattern used inside long-running loops so input events can run between chunks
|
||||
- [ ] Modern scheduling APIs used where available: `scheduler.yield()` (preferred), `scheduler.postTask()` with priorities, `isInputPending()` to yield only when needed
|
||||
- [ ] `requestIdleCallback` for deferrable, non-urgent work (analytics flush, prefetch, warmup)
|
||||
- [ ] Non-critical work deferred out of event handlers (e.g. analytics, logging) so the response to the interaction is not delayed
|
||||
- [ ] Third-party scripts loaded with `async` / `defer`, audited for size, and fronted by a facade when heavy (chat widgets, embeds)
|
||||
|
||||
### CSS
|
||||
- [ ] Critical CSS inlined or preloaded
|
||||
- [ ] No render-blocking CSS for non-critical styles
|
||||
- [ ] No CSS-in-JS runtime cost in production (use extraction)
|
||||
|
||||
### Fonts
|
||||
- [ ] Limited to 2–3 font families, 2–3 weights each (every additional weight is another request)
|
||||
- [ ] WOFF2 format only (smallest, universal support — skip WOFF/TTF/EOT)
|
||||
- [ ] Self-hosted when possible (third-party font CDNs add DNS + TCP + TLS round-trips)
|
||||
- [ ] LCP-critical fonts preloaded: `<link rel="preload" as="font" type="font/woff2" crossorigin>`
|
||||
- [ ] `font-display: swap` (or `optional` for non-critical) to avoid FOIT blocking render
|
||||
- [ ] Subsetted via `unicode-range` to ship only the glyphs each page needs
|
||||
- [ ] Variable fonts considered when multiple weights/styles are required (one file replaces many)
|
||||
- [ ] Fallback font metrics adjusted with `size-adjust`, `ascent-override`, `descent-override` to reduce CLS on font swap
|
||||
- [ ] System font stack considered before any custom font
|
||||
|
||||
### Network
|
||||
- [ ] Static assets cached with long `max-age` + content hashing
|
||||
- [ ] API responses cached where appropriate (`Cache-Control`)
|
||||
- [ ] HTTP/2 or HTTP/3 enabled
|
||||
- [ ] Resources preconnected (`<link rel="preconnect">`) for known origins
|
||||
- [ ] `fetchpriority` used on critical non-image resources (e.g., key `<link rel="preload">`, above-the-fold `<script>`) — not only on `<img>`
|
||||
- [ ] No unnecessary redirects
|
||||
|
||||
### Rendering
|
||||
- [ ] No layout thrashing (forced synchronous layouts)
|
||||
- [ ] Animations use `transform` and `opacity` (GPU-accelerated)
|
||||
- [ ] Long lists use virtualization (e.g., `react-window`)
|
||||
- [ ] No unnecessary full-page re-renders
|
||||
- [ ] Off-screen sections use `content-visibility: auto` with `contain-intrinsic-size` to skip layout/paint of non-visible areas
|
||||
- [ ] No `unload` event handlers and no `Cache-Control: no-store` on HTML responses — preserves back/forward cache (bfcache) eligibility
|
||||
|
||||
## Backend Checklist
|
||||
|
||||
### Database
|
||||
- [ ] No N+1 query patterns (use eager loading / joins)
|
||||
- [ ] Queries have appropriate indexes
|
||||
- [ ] List endpoints paginated (never `SELECT * FROM table`)
|
||||
- [ ] Connection pooling configured
|
||||
- [ ] Slow query logging enabled
|
||||
|
||||
### API
|
||||
- [ ] Response times < 200ms (p95)
|
||||
- [ ] No synchronous heavy computation in request handlers
|
||||
- [ ] Bulk operations instead of loops of individual calls
|
||||
- [ ] Response compression (gzip/brotli)
|
||||
- [ ] Appropriate caching (in-memory, Redis, CDN)
|
||||
|
||||
### Infrastructure
|
||||
- [ ] CDN for static assets
|
||||
- [ ] Server located close to users (or edge deployment)
|
||||
- [ ] Horizontal scaling configured (if needed)
|
||||
- [ ] Health check endpoint for load balancer
|
||||
|
||||
## Measurement Commands
|
||||
|
||||
### INP field data and DevTools workflow
|
||||
|
||||
1. **Field data first** — check [CrUX Vis](https://developer.chrome.com/docs/crux/vis) or your RUM tool for real-user INP before optimising
|
||||
2. **Identify slow interactions** — open DevTools → Performance panel → record while interacting; look for long tasks triggered by clicks/keystrokes
|
||||
3. **Test on mid-range Android** — INP issues often only surface on slower hardware; use a real device or DevTools CPU throttling (4×–6× slowdown)
|
||||
|
||||
```bash
|
||||
# Lighthouse CLI
|
||||
npx lighthouse https://localhost:3000 --output json --output-path ./report.json
|
||||
|
||||
# Bundle analysis
|
||||
npx webpack-bundle-analyzer stats.json
|
||||
# or for Vite:
|
||||
npx vite-bundle-visualizer
|
||||
|
||||
# Check bundle size
|
||||
npx bundlesize
|
||||
|
||||
# Web Vitals in code
|
||||
import { onLCP, onINP, onCLS } from 'web-vitals';
|
||||
onLCP(console.log);
|
||||
onINP(console.log);
|
||||
onCLS(console.log);
|
||||
|
||||
# INP with interaction-level detail (attribution build)
|
||||
import { onINP } from 'web-vitals/attribution';
|
||||
onINP(({ value, attribution }) => {
|
||||
const { interactionTarget, inputDelay, processingDuration, presentationDelay } = attribution;
|
||||
console.log({ value, interactionTarget, inputDelay, processingDuration, presentationDelay });
|
||||
});
|
||||
```
|
||||
|
||||
## Common Anti-Patterns
|
||||
|
||||
| Anti-Pattern | Impact | Fix |
|
||||
|---|---|---|
|
||||
| N+1 queries | Linear DB load growth | Use joins, includes, or batch loading |
|
||||
| Unbounded queries | Memory exhaustion, timeouts | Always paginate, add LIMIT |
|
||||
| Missing indexes | Slow reads as data grows | Add indexes for filtered/sorted columns |
|
||||
| Layout thrashing | Jank, dropped frames | Batch DOM reads, then batch writes |
|
||||
| Unoptimized images | Slow LCP, wasted bandwidth | Use WebP, responsive sizes, lazy load |
|
||||
| Large bundles | Slow Time to Interactive | Code split, tree shake, audit deps |
|
||||
| Blocking main thread | Poor INP, unresponsive UI | Chunk long tasks with `scheduler.yield()` / `yieldToMain`, offload to Web Workers |
|
||||
| Memory leaks | Growing memory, eventual crash | Clean up listeners, intervals, refs |
|
||||
@@ -0,0 +1,134 @@
|
||||
# Security Checklist
|
||||
|
||||
Quick reference for web application security. Use alongside the `security-and-hardening` skill.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Pre-Commit Checks](#pre-commit-checks)
|
||||
- [Authentication](#authentication)
|
||||
- [Authorization](#authorization)
|
||||
- [Input Validation](#input-validation)
|
||||
- [Security Headers](#security-headers)
|
||||
- [CORS Configuration](#cors-configuration)
|
||||
- [Data Protection](#data-protection)
|
||||
- [Dependency Security](#dependency-security)
|
||||
- [Error Handling](#error-handling)
|
||||
- [OWASP Top 10 Quick Reference](#owasp-top-10-quick-reference)
|
||||
|
||||
## Pre-Commit Checks
|
||||
|
||||
- [ ] No secrets in code (`git diff --cached | grep -i "password\|secret\|api_key\|token"`)
|
||||
- [ ] `.gitignore` covers: `.env`, `.env.local`, `*.pem`, `*.key`
|
||||
- [ ] `.env.example` uses placeholder values (not real secrets)
|
||||
|
||||
## Authentication
|
||||
|
||||
- [ ] Passwords hashed with bcrypt (≥12 rounds), scrypt, or argon2
|
||||
- [ ] Session cookies: `httpOnly`, `secure`, `sameSite: 'lax'`
|
||||
- [ ] Session expiration configured (reasonable max-age)
|
||||
- [ ] Rate limiting on login endpoint (≤10 attempts per 15 minutes)
|
||||
- [ ] Password reset tokens: time-limited (≤1 hour), single-use
|
||||
- [ ] Account lockout after repeated failures (optional, with notification)
|
||||
- [ ] MFA supported for sensitive operations (optional but recommended)
|
||||
|
||||
## Authorization
|
||||
|
||||
- [ ] Every protected endpoint checks authentication
|
||||
- [ ] Every resource access checks ownership/role (prevents IDOR)
|
||||
- [ ] Admin endpoints require admin role verification
|
||||
- [ ] API keys scoped to minimum necessary permissions
|
||||
- [ ] JWT tokens validated (signature, expiration, issuer)
|
||||
|
||||
## Input Validation
|
||||
|
||||
- [ ] All user input validated at system boundaries (API routes, form handlers)
|
||||
- [ ] Validation uses allowlists (not denylists)
|
||||
- [ ] String lengths constrained (min/max)
|
||||
- [ ] Numeric ranges validated
|
||||
- [ ] Email, URL, and date formats validated with proper libraries
|
||||
- [ ] File uploads: type restricted, size limited, content verified
|
||||
- [ ] SQL queries parameterized (no string concatenation)
|
||||
- [ ] HTML output encoded (use framework auto-escaping)
|
||||
- [ ] URLs validated before redirect (prevent open redirect)
|
||||
|
||||
## Security Headers
|
||||
|
||||
```
|
||||
Content-Security-Policy: default-src 'self'; script-src 'self'
|
||||
Strict-Transport-Security: max-age=31536000; includeSubDomains
|
||||
X-Content-Type-Options: nosniff
|
||||
X-Frame-Options: DENY
|
||||
X-XSS-Protection: 0 (disabled, rely on CSP)
|
||||
Referrer-Policy: strict-origin-when-cross-origin
|
||||
Permissions-Policy: camera=(), microphone=(), geolocation=()
|
||||
```
|
||||
|
||||
## CORS Configuration
|
||||
|
||||
```typescript
|
||||
// Restrictive (recommended)
|
||||
cors({
|
||||
origin: ['https://yourdomain.com', 'https://app.yourdomain.com'],
|
||||
credentials: true,
|
||||
methods: ['GET', 'POST', 'PUT', 'PATCH', 'DELETE'],
|
||||
allowedHeaders: ['Content-Type', 'Authorization'],
|
||||
})
|
||||
|
||||
// NEVER use in production:
|
||||
cors({ origin: '*' }) // Allows any origin
|
||||
```
|
||||
|
||||
## Data Protection
|
||||
|
||||
- [ ] Sensitive fields excluded from API responses (`passwordHash`, `resetToken`, etc.)
|
||||
- [ ] Sensitive data not logged (passwords, tokens, full CC numbers)
|
||||
- [ ] PII encrypted at rest (if required by regulation)
|
||||
- [ ] HTTPS for all external communication
|
||||
- [ ] Database backups encrypted
|
||||
|
||||
## Dependency Security
|
||||
|
||||
```bash
|
||||
# Audit dependencies
|
||||
npm audit
|
||||
|
||||
# Fix automatically where possible
|
||||
npm audit fix
|
||||
|
||||
# Check for critical vulnerabilities
|
||||
npm audit --audit-level=critical
|
||||
|
||||
# Keep dependencies updated
|
||||
npx npm-check-updates
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
```typescript
|
||||
// Production: generic error, no internals
|
||||
res.status(500).json({
|
||||
error: { code: 'INTERNAL_ERROR', message: 'Something went wrong' }
|
||||
});
|
||||
|
||||
// NEVER in production:
|
||||
res.status(500).json({
|
||||
error: err.message,
|
||||
stack: err.stack, // Exposes internals
|
||||
query: err.sql, // Exposes database details
|
||||
});
|
||||
```
|
||||
|
||||
## OWASP Top 10 Quick Reference
|
||||
|
||||
| # | Vulnerability | Prevention |
|
||||
|---|---|---|
|
||||
| 1 | Broken Access Control | Auth checks on every endpoint, ownership verification |
|
||||
| 2 | Cryptographic Failures | HTTPS, strong hashing, no secrets in code |
|
||||
| 3 | Injection | Parameterized queries, input validation |
|
||||
| 4 | Insecure Design | Threat modeling, spec-driven development |
|
||||
| 5 | Security Misconfiguration | Security headers, minimal permissions, audit deps |
|
||||
| 6 | Vulnerable Components | `npm audit`, keep deps updated, minimal deps |
|
||||
| 7 | Auth Failures | Strong passwords, rate limiting, session management |
|
||||
| 8 | Data Integrity Failures | Verify updates/dependencies, signed artifacts |
|
||||
| 9 | Logging Failures | Log security events, don't log secrets |
|
||||
| 10 | SSRF | Validate/allowlist URLs, restrict outbound requests |
|
||||
@@ -0,0 +1,236 @@
|
||||
# Testing Patterns Reference
|
||||
|
||||
Quick reference for common testing patterns across the stack. Use alongside the `test-driven-development` skill.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Test Structure (Arrange-Act-Assert)](#test-structure-arrange-act-assert)
|
||||
- [Test Naming Conventions](#test-naming-conventions)
|
||||
- [Common Assertions](#common-assertions)
|
||||
- [Mocking Patterns](#mocking-patterns)
|
||||
- [React/Component Testing](#reactcomponent-testing)
|
||||
- [API / Integration Testing](#api--integration-testing)
|
||||
- [E2E Testing (Playwright)](#e2e-testing-playwright)
|
||||
- [Test Anti-Patterns](#test-anti-patterns)
|
||||
|
||||
## Test Structure (Arrange-Act-Assert)
|
||||
|
||||
```typescript
|
||||
it('describes expected behavior', () => {
|
||||
// Arrange: Set up test data and preconditions
|
||||
const input = { title: 'Test Task', priority: 'high' };
|
||||
|
||||
// Act: Perform the action being tested
|
||||
const result = createTask(input);
|
||||
|
||||
// Assert: Verify the outcome
|
||||
expect(result.title).toBe('Test Task');
|
||||
expect(result.priority).toBe('high');
|
||||
expect(result.status).toBe('pending');
|
||||
});
|
||||
```
|
||||
|
||||
## Test Naming Conventions
|
||||
|
||||
```typescript
|
||||
// Pattern: [unit] [expected behavior] [condition]
|
||||
describe('TaskService.createTask', () => {
|
||||
it('creates a task with default pending status', () => {});
|
||||
it('throws ValidationError when title is empty', () => {});
|
||||
it('trims whitespace from title', () => {});
|
||||
it('generates a unique ID for each task', () => {});
|
||||
});
|
||||
```
|
||||
|
||||
## Common Assertions
|
||||
|
||||
```typescript
|
||||
// Equality
|
||||
expect(result).toBe(expected); // Strict equality (===)
|
||||
expect(result).toEqual(expected); // Deep equality (objects/arrays)
|
||||
expect(result).toStrictEqual(expected); // Deep equality + type matching
|
||||
|
||||
// Truthiness
|
||||
expect(result).toBeTruthy();
|
||||
expect(result).toBeFalsy();
|
||||
expect(result).toBeNull();
|
||||
expect(result).toBeDefined();
|
||||
expect(result).toBeUndefined();
|
||||
|
||||
// Numbers
|
||||
expect(result).toBeGreaterThan(5);
|
||||
expect(result).toBeLessThanOrEqual(10);
|
||||
expect(result).toBeCloseTo(0.3, 5); // Floating point
|
||||
|
||||
// Strings
|
||||
expect(result).toMatch(/pattern/);
|
||||
expect(result).toContain('substring');
|
||||
|
||||
// Arrays / Objects
|
||||
expect(array).toContain(item);
|
||||
expect(array).toHaveLength(3);
|
||||
expect(object).toHaveProperty('key', 'value');
|
||||
|
||||
// Errors
|
||||
expect(() => fn()).toThrow();
|
||||
expect(() => fn()).toThrow(ValidationError);
|
||||
expect(() => fn()).toThrow('specific message');
|
||||
|
||||
// Async
|
||||
await expect(asyncFn()).resolves.toBe(value);
|
||||
await expect(asyncFn()).rejects.toThrow(Error);
|
||||
```
|
||||
|
||||
## Mocking Patterns
|
||||
|
||||
### Mock Functions
|
||||
|
||||
```typescript
|
||||
const mockFn = jest.fn();
|
||||
mockFn.mockReturnValue(42);
|
||||
mockFn.mockResolvedValue({ data: 'test' });
|
||||
mockFn.mockImplementation((x) => x * 2);
|
||||
|
||||
expect(mockFn).toHaveBeenCalled();
|
||||
expect(mockFn).toHaveBeenCalledWith('arg1', 'arg2');
|
||||
expect(mockFn).toHaveBeenCalledTimes(3);
|
||||
```
|
||||
|
||||
### Mock Modules
|
||||
|
||||
```typescript
|
||||
// Mock an entire module
|
||||
jest.mock('./database', () => ({
|
||||
query: jest.fn().mockResolvedValue([{ id: 1, title: 'Test' }]),
|
||||
}));
|
||||
|
||||
// Mock specific exports
|
||||
jest.mock('./utils', () => ({
|
||||
...jest.requireActual('./utils'),
|
||||
generateId: jest.fn().mockReturnValue('test-id'),
|
||||
}));
|
||||
```
|
||||
|
||||
### Mock at Boundaries Only
|
||||
|
||||
```
|
||||
Mock these: Don't mock these:
|
||||
├── Database calls ├── Internal utility functions
|
||||
├── HTTP requests ├── Business logic
|
||||
├── File system operations ├── Data transformations
|
||||
├── External API calls ├── Validation functions
|
||||
└── Time/Date (when needed) └── Pure functions
|
||||
```
|
||||
|
||||
## React/Component Testing
|
||||
|
||||
```tsx
|
||||
import { render, screen, fireEvent, waitFor } from '@testing-library/react';
|
||||
|
||||
describe('TaskForm', () => {
|
||||
it('submits the form with entered data', async () => {
|
||||
const onSubmit = jest.fn();
|
||||
render(<TaskForm onSubmit={onSubmit} />);
|
||||
|
||||
// Find elements by accessible role/label (not test IDs)
|
||||
await screen.findByRole('textbox', { name: /title/i });
|
||||
fireEvent.change(screen.getByRole('textbox', { name: /title/i }), {
|
||||
target: { value: 'New Task' },
|
||||
});
|
||||
fireEvent.click(screen.getByRole('button', { name: /create/i }));
|
||||
|
||||
await waitFor(() => {
|
||||
expect(onSubmit).toHaveBeenCalledWith({ title: 'New Task' });
|
||||
});
|
||||
});
|
||||
|
||||
it('shows validation error for empty title', async () => {
|
||||
render(<TaskForm onSubmit={jest.fn()} />);
|
||||
|
||||
fireEvent.click(screen.getByRole('button', { name: /create/i }));
|
||||
|
||||
expect(await screen.findByText(/title is required/i)).toBeInTheDocument();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
## API / Integration Testing
|
||||
|
||||
```typescript
|
||||
import request from 'supertest';
|
||||
import { app } from '../src/app';
|
||||
|
||||
describe('POST /api/tasks', () => {
|
||||
it('creates a task and returns 201', async () => {
|
||||
const response = await request(app)
|
||||
.post('/api/tasks')
|
||||
.send({ title: 'Test Task' })
|
||||
.set('Authorization', `Bearer ${testToken}`)
|
||||
.expect(201);
|
||||
|
||||
expect(response.body).toMatchObject({
|
||||
id: expect.any(String),
|
||||
title: 'Test Task',
|
||||
status: 'pending',
|
||||
});
|
||||
});
|
||||
|
||||
it('returns 422 for invalid input', async () => {
|
||||
const response = await request(app)
|
||||
.post('/api/tasks')
|
||||
.send({ title: '' })
|
||||
.set('Authorization', `Bearer ${testToken}`)
|
||||
.expect(422);
|
||||
|
||||
expect(response.body.error.code).toBe('VALIDATION_ERROR');
|
||||
});
|
||||
|
||||
it('returns 401 without authentication', async () => {
|
||||
await request(app)
|
||||
.post('/api/tasks')
|
||||
.send({ title: 'Test' })
|
||||
.expect(401);
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
## E2E Testing (Playwright)
|
||||
|
||||
```typescript
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test('user can create and complete a task', async ({ page }) => {
|
||||
// Navigate and authenticate
|
||||
await page.goto('/');
|
||||
await page.fill('[name="email"]', 'test@example.com');
|
||||
await page.fill('[name="password"]', 'testpass123');
|
||||
await page.click('button:has-text("Log in")');
|
||||
|
||||
// Create a task
|
||||
await page.click('button:has-text("New Task")');
|
||||
await page.fill('[name="title"]', 'Buy groceries');
|
||||
await page.click('button:has-text("Create")');
|
||||
|
||||
// Verify task appears
|
||||
await expect(page.locator('text=Buy groceries')).toBeVisible();
|
||||
|
||||
// Complete the task
|
||||
await page.click('[aria-label="Complete Buy groceries"]');
|
||||
await expect(page.locator('text=Buy groceries')).toHaveCSS(
|
||||
'text-decoration-line', 'line-through'
|
||||
);
|
||||
});
|
||||
```
|
||||
|
||||
## Test Anti-Patterns
|
||||
|
||||
| Anti-Pattern | Problem | Better Approach |
|
||||
|---|---|---|
|
||||
| Testing implementation details | Breaks on refactor | Test inputs/outputs |
|
||||
| Snapshot everything | No one reviews snapshot diffs | Assert specific values |
|
||||
| Shared mutable state | Tests pollute each other | Setup/teardown per test |
|
||||
| Testing third-party code | Wastes time, not your bug | Mock the boundary |
|
||||
| Skipping tests to pass CI | Hides real bugs | Fix or delete the test |
|
||||
| Using `test.skip` permanently | Dead code | Remove or fix it |
|
||||
| Overly broad assertions | Doesn't catch regressions | Be specific |
|
||||
| No async error handling | Swallowed errors, false passes | Always `await` async tests |
|
||||
Reference in New Issue
Block a user