Initial commit
309
.claude/skills/shipping-and-launch/SKILL.md
Normal file
@@ -0,0 +1,309 @@
---
name: shipping-and-launch
description: Prepares production launches. Use when preparing to deploy to production. Use when you need a pre-launch checklist, when setting up monitoring, when planning a staged rollout, or when you need a rollback strategy.
---

# Shipping and Launch

## Overview

Ship with confidence. The goal is not just to deploy; it's to deploy safely, with monitoring in place, a rollback plan ready, and a clear understanding of what success looks like. Every launch should be reversible, observable, and incremental.

## When to Use

- Deploying a feature to production for the first time
- Releasing a significant change to users
- Migrating data or infrastructure
- Opening a beta or early access program
- Any deployment that carries risk (all of them)

## The Pre-Launch Checklist

### Code Quality

- [ ] All tests pass (unit, integration, e2e)
- [ ] Build succeeds with no warnings
- [ ] Lint and type checking pass
- [ ] Code reviewed and approved
- [ ] No TODO comments that should be resolved before launch
- [ ] No `console.log` debugging statements in production code
- [ ] Error handling covers expected failure modes

### Security

- [ ] No secrets in code or version control
- [ ] `npm audit` shows no critical or high vulnerabilities
- [ ] Input validation on all user-facing endpoints
- [ ] Authentication and authorization checks in place
- [ ] Security headers configured (CSP, HSTS, etc.)
- [ ] Rate limiting on authentication endpoints
- [ ] CORS configured to specific origins (not wildcard)

### Performance

- [ ] Core Web Vitals within "Good" thresholds
- [ ] No N+1 queries in critical paths
- [ ] Images optimized (compression, responsive sizes, lazy loading)
- [ ] Bundle size within budget
- [ ] Database queries have appropriate indexes
- [ ] Caching configured for static assets and repeated queries

### Accessibility

- [ ] Keyboard navigation works for all interactive elements
- [ ] Screen reader can convey page content and structure
- [ ] Color contrast meets WCAG 2.1 AA (4.5:1 for text)
- [ ] Focus management correct for modals and dynamic content
- [ ] Error messages are descriptive and associated with form fields
- [ ] No accessibility warnings in axe-core or Lighthouse

### Infrastructure

- [ ] Environment variables set in production
- [ ] Database migrations applied (or ready to apply)
- [ ] DNS and SSL configured
- [ ] CDN configured for static assets
- [ ] Logging and error reporting configured
- [ ] Health check endpoint exists and responds

### Documentation

- [ ] README updated with any new setup requirements
- [ ] API documentation current
- [ ] ADRs written for any architectural decisions
- [ ] Changelog updated
- [ ] User-facing documentation updated (if applicable)

## Feature Flag Strategy

Ship behind feature flags to decouple deployment from release:

```typescript
// Feature flag check (fragment from inside an async component or loader;
// getFeatureFlags and TaskSharingPanel are app-specific)
const flags = await getFeatureFlags(userId);

if (flags.taskSharing) {
  // New feature: task sharing
  return <TaskSharingPanel task={task} />;
}

// Default: existing behavior
return null;
```

**Feature flag lifecycle:**

```
1. DEPLOY with flag OFF     → Code is in production but inactive
2. ENABLE for team/beta     → Internal testing in production environment
3. GRADUAL ROLLOUT          → 5% → 25% → 50% → 100% of users
4. MONITOR at each stage    → Watch error rates, performance, user feedback
5. CLEAN UP                 → Remove flag and dead code path after full rollout
```
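
The gradual-rollout step needs a stable way to decide which users fall inside the current percentage. The following is a minimal sketch, assuming hash-based bucketing; `fnv1a`, `rolloutBucket`, and `isEnabled` are hypothetical helpers, not the API of any particular flag service:

```typescript
// Hypothetical helpers for illustration; not taken from a real flag library.

/** FNV-1a 32-bit hash: stable across processes, so a user keeps the same bucket. */
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

/**
 * Maps (flag, user) to a bucket in [0, 100). Salting with the flag name
 * keeps the same users from being first in line for every rollout.
 */
function rolloutBucket(flagName: string, userId: string): number {
  return fnv1a(`${flagName}:${userId}`) % 100;
}

/** True when the user's bucket falls inside the current rollout percentage. */
function isEnabled(flagName: string, userId: string, rolloutPercent: number): boolean {
  return rolloutBucket(flagName, userId) < rolloutPercent;
}
```

Because a user's bucket never changes, raising the percentage from 5 to 25 only adds users; nobody who already had the feature loses it, and lowering the percentage rolls back the most recently added users first.
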

**Rules:**
- Every feature flag has an owner and an expiration date
- Clean up flags within 2 weeks of full rollout
- Don't nest feature flags (creates exponential combinations)
- Test both flag states (on and off) in CI
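
The owner and expiration rules are easy to enforce mechanically. As a rough sketch, assuming flags are declared in a registry the build can read, a CI step could fail when this check returns anything; `FlagDefinition` and `findFlagProblems` are hypothetical names introduced for illustration:

```typescript
// Hypothetical registry shape; adapt the fields to however flags are declared.
interface FlagDefinition {
  name: string;
  owner: string;
  expiresOn: string; // ISO date, e.g. "2025-07-01"
}

/** Returns one message per rule violation; an empty array means all flags pass. */
function findFlagProblems(flags: FlagDefinition[], today: Date): string[] {
  const problems: string[] = [];
  for (const flag of flags) {
    if (flag.owner.trim() === "") {
      problems.push(`${flag.name}: missing owner`);
    }
    const expires = new Date(flag.expiresOn);
    if (Number.isNaN(expires.getTime())) {
      problems.push(`${flag.name}: invalid expiration date`);
    } else if (expires.getTime() < today.getTime()) {
      problems.push(`${flag.name}: expired on ${flag.expiresOn}`);
    }
  }
  return problems;
}
```

Passing `today` as a parameter, rather than reading the clock inside the function, keeps the check deterministic in tests.
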

## Staged Rollout

### The Rollout Sequence

```
1. DEPLOY to staging
    └── Full test suite in staging environment
    └── Manual smoke test of critical flows

2. DEPLOY to production (feature flag OFF)
    └── Verify deployment succeeded (health check)
    └── Check error monitoring (no new errors)

3. ENABLE for team (flag ON for internal users)
    └── Team uses the feature in production
    └── 24-hour monitoring window

4. CANARY rollout (flag ON for 5% of users)
    └── Monitor error rates, latency, user behavior
    └── Compare metrics: canary vs. baseline
    └── 24-48 hour monitoring window
    └── Advance only if all thresholds pass (see table below)

5. GRADUAL increase (25% -> 50% -> 100%)
    └── Same monitoring at each step
    └── Ability to roll back to previous percentage at any point

6. FULL rollout (flag ON for all users)
    └── Monitor for 1 week
    └── Clean up feature flag
```

### Rollout Decision Thresholds

Use these thresholds to decide whether to advance, hold, or roll back at each stage:

| Metric | Advance (green) | Hold and investigate (yellow) | Roll back (red) |
|--------|-----------------|-------------------------------|-----------------|
| Error rate | Within 10% of baseline | 10-100% above baseline | More than 100% above baseline (>2x) |
| P95 latency | Within 20% of baseline | 20-50% above baseline | >50% above baseline |
| Client JS errors | No new error types | New errors at <0.1% of sessions | New errors at ≥0.1% of sessions |
| Business metrics | Neutral or positive | Decline <5% (may be noise) | Decline ≥5% |

### When to Roll Back

Roll back immediately if:
- Error rate exceeds 2x baseline
- P95 latency increases by more than 50%
- User-reported issues spike
- Data integrity issues detected
- Security vulnerability discovered
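
The error-rate and P95 latency thresholds can be turned into a small decision function. This is a sketch under the assumption that canary and baseline values are already collected; `StageMetrics`, `classify`, and `decideStage` are hypothetical names, and the client-error and business-metric rows are left out for brevity:

```typescript
// Hypothetical types and names; thresholds mirror the error-rate and P95 rows.
type Decision = "advance" | "hold" | "rollback";

interface StageMetrics {
  errorRate: number; // canary error rate, e.g. 0.012 for 1.2%
  baselineErrorRate: number;
  p95LatencyMs: number;
  baselineP95LatencyMs: number;
}

/** Fractional increase over baseline: 0.5 means "50% above baseline". */
function increaseOverBaseline(value: number, baseline: number): number {
  if (baseline === 0) return value > 0 ? Infinity : 0;
  return (value - baseline) / baseline;
}

/** Classifies one metric given the upper bounds of its green and yellow bands. */
function classify(increase: number, greenMax: number, yellowMax: number): Decision {
  if (increase <= greenMax) return "advance";
  if (increase <= yellowMax) return "hold";
  return "rollback";
}

/** The stage decision is the most severe decision of any single metric. */
function decideStage(m: StageMetrics): Decision {
  const decisions: Decision[] = [
    classify(increaseOverBaseline(m.errorRate, m.baselineErrorRate), 0.1, 1.0),
    classify(increaseOverBaseline(m.p95LatencyMs, m.baselineP95LatencyMs), 0.2, 0.5),
  ];
  if (decisions.some((d) => d === "rollback")) return "rollback";
  if (decisions.some((d) => d === "hold")) return "hold";
  return "advance";
}
```

Taking the worst decision across metrics matches the rule that a stage advances only if all thresholds pass: one red metric is enough to roll back, even when every other metric is green.
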

## Monitoring and Observability

### What to Monitor

```
Application metrics:
├── Error rate (total and by endpoint)
├── Response time (p50, p95, p99)
├── Request volume
├── Active users
└── Key business metrics (conversion, engagement)

Infrastructure metrics:
├── CPU and memory utilization
├── Database connection pool usage
├── Disk space
├── Network latency
└── Queue depth (if applicable)

Client metrics:
├── Core Web Vitals (LCP, INP, CLS)
├── JavaScript errors
├── API error rates from client perspective
└── Page load time
```

### Error Reporting

```typescript
import React from 'react';
import type { Request, Response, NextFunction } from 'express';

// reportError, getCurrentUser, ErrorFallback, and app are app-specific
// and assumed to be defined elsewhere.

// Set up error boundary with reporting
class ErrorBoundary extends React.Component<
  { children: React.ReactNode },
  { hasError: boolean }
> {
  state = { hasError: false };

  // Switch to the fallback UI on the render after a child throws
  static getDerivedStateFromError() {
    return { hasError: true };
  }

  componentDidCatch(error: Error, info: React.ErrorInfo) {
    // Report to error tracking service
    reportError(error, {
      componentStack: info.componentStack,
      userId: getCurrentUser()?.id,
      page: window.location.pathname,
    });
  }

  render() {
    if (this.state.hasError) {
      return <ErrorFallback onRetry={() => this.setState({ hasError: false })} />;
    }
    return this.props.children;
  }
}

// Server-side error reporting. Register after all routes; Express recognizes
// error handlers by their four-argument signature, so keep `next`.
// req.user assumes auth middleware that adds a user to the request type.
app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
  reportError(err, {
    method: req.method,
    url: req.url,
    userId: req.user?.id,
  });

  // Don't expose internals to users
  res.status(500).json({
    error: { code: 'INTERNAL_ERROR', message: 'Something went wrong' },
  });
});
```

### Post-Launch Verification

In the first hour after launch:

```
1. Check health endpoint returns 200
2. Check error monitoring dashboard (no new error types)
3. Check latency dashboard (no regression)
4. Test the critical user flow manually
5. Verify logs are flowing and readable
6. Confirm rollback mechanism works (dry run if possible)
```

## Rollback Strategy

Every deployment needs a rollback plan before it happens:

```markdown
## Rollback Plan for [Feature/Release]

### Trigger Conditions
- Error rate > 2x baseline
- P95 latency > [X]ms
- User reports of [specific issue]

### Rollback Steps
1. Disable feature flag (if applicable)
   OR
1. Deploy previous version: `git revert <commit> && git push`
2. Verify rollback: health check, error monitoring
3. Communicate: notify team of rollback

### Database Considerations
- Migration [X] has a tested down migration: [script or command]
  (Prisma Migrate has no `migrate rollback` command; down migrations are SQL scripts you generate and apply yourself)
- Data inserted by new feature: [preserved / cleaned up]

### Time to Rollback
- Feature flag: < 1 minute
- Redeploy previous version: < 5 minutes
- Database rollback: < 15 minutes
```

## See Also

- For security pre-launch checks, see `references/security-checklist.md`
- For performance pre-launch checklist, see `references/performance-checklist.md`
- For accessibility verification before launch, see `references/accessibility-checklist.md`

## Common Rationalizations

| Rationalization | Reality |
|---|---|
| "It works in staging, it'll work in production" | Production has different data, traffic patterns, and edge cases. Monitor after deploy. |
| "We don't need feature flags for this" | Every feature benefits from a kill switch. Even "simple" changes can break things. |
| "Monitoring is overhead" | Not having monitoring means you discover problems from user complaints instead of dashboards. |
| "We'll add monitoring later" | Add it before launch. You can't debug what you can't see. |
| "Rolling back is admitting failure" | Rolling back is responsible engineering. Shipping a broken feature is the failure. |

## Red Flags

- Deploying without a rollback plan
- No monitoring or error reporting in production
- Big-bang releases (everything at once, no staging)
- Feature flags with no expiration or owner
- No one monitoring the deploy for the first hour
- Production environment configuration done by memory, not code
- "It's Friday afternoon, let's ship it"

## Verification

Before deploying:

- [ ] Pre-launch checklist completed (all sections green)
- [ ] Feature flag configured (if applicable)
- [ ] Rollback plan documented
- [ ] Monitoring dashboards set up
- [ ] Team notified of deployment

After deploying:

- [ ] Health check returns 200
- [ ] Error rate is normal
- [ ] Latency is normal
- [ ] Critical user flow works
- [ ] Logs are flowing
- [ ] Rollback tested or verified ready
@@ -0,0 +1,160 @@
# Accessibility Checklist

Quick reference for WCAG 2.1 AA compliance. Use alongside the `frontend-ui-engineering` skill.

## Table of Contents

- [Essential Checks](#essential-checks)
- [Common HTML Patterns](#common-html-patterns)
- [Testing Tools](#testing-tools)
- [Quick Reference: ARIA Live Regions](#quick-reference-aria-live-regions)
- [Common Anti-Patterns](#common-anti-patterns)

## Essential Checks

### Keyboard Navigation
- [ ] All interactive elements focusable via Tab key
- [ ] Focus order follows visual/logical order
- [ ] Focus is visible (outline/ring on focused elements)
- [ ] Custom widgets have keyboard support (Enter to activate, Escape to close)
- [ ] No keyboard traps (user can always Tab away from a component)
- [ ] Skip-to-content link at top of page, visible at least on keyboard focus
- [ ] Modals trap focus while open, return focus on close

### Screen Readers
- [ ] All images have `alt` text (or `alt=""` for decorative images)
- [ ] All form inputs have associated labels (`<label>` or `aria-label`)
- [ ] Buttons and links have descriptive text (not "Click here")
- [ ] Icon-only buttons have `aria-label`
- [ ] Page has one `<h1>` and headings don't skip levels
- [ ] Dynamic content changes announced (`aria-live` regions)
- [ ] Tables have `<th>` headers with scope

### Visual
- [ ] Text contrast ≥ 4.5:1 (normal text) or ≥ 3:1 (large text: 18pt/24px and up, or 14pt/about 18.7px and up when bold)
- [ ] UI components contrast ≥ 3:1 against background
- [ ] Color is not the only way to convey information
- [ ] Text resizable to 200% without breaking layout
- [ ] No content that flashes more than 3 times per second
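
The contrast thresholds above come from a fixed formula, so they can be checked in code as well as in a browser tool. The following is a sketch of the WCAG 2.1 relative-luminance and contrast-ratio calculation; the function names are hypothetical and the input is assumed to be a `#rrggbb` string:

```typescript
/** Parses "#rrggbb" into [r, g, b], each channel in 0..255. */
function parseHex(hex: string): [number, number, number] {
  const match = /^#([0-9a-f]{6})$/i.exec(hex);
  if (!match) throw new Error(`Expected #rrggbb, got: ${hex}`);
  const value = parseInt(match[1], 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

/** Converts one 8-bit sRGB channel to its linear value, per the WCAG 2.1 definition. */
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

/** Relative luminance: 0 for black, 1 for white. */
function relativeLuminance(hex: string): number {
  const rgb = parseHex(hex);
  return 0.2126 * linearize(rgb[0]) + 0.7152 * linearize(rgb[1]) + 0.0722 * linearize(rgb[2]);
}

/** Contrast ratio between two colors, from 1 (identical) to 21 (black on white). */
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}
```

For example, `contrastRatio("#767676", "#ffffff")` comes out just above 4.5, so that gray passes AA for normal text on white, while the slightly lighter `#777777` falls just below the threshold.
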

### Forms
- [ ] Every input has a visible label
- [ ] Required fields indicated (not by color alone)
- [ ] Error messages specific and associated with the field
- [ ] Error state visible by more than color (icon, text, border)
- [ ] Form submission errors summarized and focusable
- [ ] Known fields use autocomplete (for example `type="email" autocomplete="email"`)

### Content
- [ ] Language declared (`<html lang="en">`)
- [ ] Page has a descriptive `<title>`
- [ ] Links are distinguishable from surrounding text (not by color alone)
- [ ] Touch targets ≥ 44x44px on mobile
- [ ] Meaningful empty states (not blank screens)

## Common HTML Patterns

### Buttons vs. Links

```html
<!-- Examples use JSX attribute names (onClick, htmlFor);
     plain HTML uses onclick and for -->

<!-- Use <button> for actions -->
<button onClick={handleDelete}>Delete Task</button>

<!-- Use <a> for navigation -->
<a href="/tasks/123">View Task</a>

<!-- NEVER use div/span as buttons -->
<div onClick={handleDelete}>Delete</div>  <!-- BAD -->
```

### Form Labels

```html
<!-- Explicit label association -->
<label htmlFor="email">Email address</label>
<input id="email" type="email" required />

<!-- Implicit wrapping -->
<label>
  Email address
  <input type="email" required />
</label>

<!-- Hidden label (visible label preferred) -->
<input type="search" aria-label="Search tasks" />
```

### ARIA Roles

```html
<!-- Navigation -->
<nav aria-label="Main navigation">...</nav>
<nav aria-label="Footer links">...</nav>

<!-- Status messages -->
<div role="status" aria-live="polite">Task saved</div>

<!-- Alert messages -->
<div role="alert">Error: Title is required</div>

<!-- Modal dialogs -->
<dialog aria-modal="true" aria-labelledby="dialog-title">
  <h2 id="dialog-title">Confirm Delete</h2>
  ...
</dialog>

<!-- Loading states (role="status" gives the aria-label an element
     role to attach to; a bare div is not reliably announced) -->
<div role="status" aria-busy="true" aria-label="Loading tasks">
  <Spinner />
</div>
```

### Accessible Lists

```html
<ul role="list" aria-label="Tasks">
  <li>
    <input type="checkbox" id="task-1" aria-label="Complete: Buy groceries" />
    <label htmlFor="task-1">Buy groceries</label>
  </li>
</ul>
```

## Testing Tools

```bash
# Automated audit (both take the URL of a running page)
npx @axe-core/cli <url>   # axe-core rules from the command line (axe-core itself is a library with no CLI)
npx pa11y <url>           # CLI accessibility checker

# In browser
# Chrome DevTools → Lighthouse → Accessibility
# Chrome DevTools → Elements → Accessibility tree

# Screen reader testing
# macOS: VoiceOver (Cmd + F5)
# Windows: NVDA (free) or JAWS
# Linux: Orca
```

## Quick Reference: ARIA Live Regions

| Value | Behavior | Use For |
|-------|----------|---------|
| `aria-live="polite"` | Announced at next pause | Status updates, saved confirmations |
| `aria-live="assertive"` | Announced immediately | Errors, time-sensitive alerts |
| `role="status"` | Same as `polite` | Status messages |
| `role="alert"` | Same as `assertive` | Error messages |

## Common Anti-Patterns

| Anti-Pattern | Problem | Fix |
|---|---|---|
| `div` as button | Not focusable, no keyboard support | Use `<button>` |
| Missing `alt` text | Images invisible to screen readers | Add descriptive `alt` |
| Color-only states | Invisible to color-blind users | Add icons, text, or patterns |
| Autoplaying media | Disorienting, can't be stopped | Add controls, don't autoplay |
| Custom dropdown with no ARIA | Unusable by keyboard/screen reader | Use native `<select>` or proper ARIA listbox |
| Removing focus outlines | Users can't see where they are | Style outlines, don't remove them |
| Empty links/buttons | "Link" announced with no description | Add text or `aria-label` |
| `tabindex > 0` | Breaks natural tab order | Use `tabindex="0"` or `-1` only |
@@ -0,0 +1,370 @@
# Orchestration Patterns

Reference catalog of agent orchestration patterns this repo endorses, plus anti-patterns to avoid. Read this before adding a new slash command that coordinates multiple personas, or before introducing a new persona that "wraps" existing ones.

The governing rule: **the user (or a slash command) is the orchestrator. Personas do not invoke other personas.** Skills are mandatory hops inside a persona's workflow.

---

## Endorsed patterns

### 1. Direct invocation (no orchestration)

Single persona, single perspective, single artifact. The default and the cheapest option.

```
user → code-reviewer → report → user
```

**Use when:** the work is one perspective on one artifact and you can describe it in one sentence.

**Examples:**
- "Review this PR" → `code-reviewer`
- "Find security issues in `auth.ts`" → `security-auditor`
- "What tests are missing for the checkout flow?" → `test-engineer`

**Cost:** one round trip. The baseline you should always compare orchestrated patterns against.

---

### 2. Single-persona slash command

A slash command that wraps one persona with the project's skills. Saves the user from re-explaining the workflow every time.

```
/review → code-reviewer (with code-review-and-quality skill) → report
```

**Use when:** the same single-persona invocation happens repeatedly with the same setup.

**Examples in this repo:** `/review`, `/test`, `/code-simplify`.

**Cost:** same as direct invocation. The slash command is just a saved prompt.

**Anti-signal:** if the slash command's body is mostly "decide which persona to call," delete it and let the user call the persona directly.

---

### 3. Parallel fan-out with merge

Multiple personas operate on the same input concurrently, each producing an independent report. A merge step (in the main agent's context) synthesizes them into a single decision.

```
                   ┌─→ code-reviewer    ─┐
/ship → fan out ───┼─→ security-auditor ─┤→ merge → go/no-go + rollback
                   └─→ test-engineer    ─┘
```

**Use when:**
- The sub-tasks are genuinely independent (no shared mutable state, no ordering dependency)
- Each sub-agent benefits from its own context window
- The merge step is small enough to stay in the main context
- Wall-clock latency matters

**Examples in this repo:** `/ship`.

**Cost:** N parallel sub-agent contexts + one merge turn. This costs more than direct invocation, but it finishes faster in wall-clock time and produces better reports because each sub-agent stays focused on its single perspective.

**Validation checklist before adopting this pattern:**
- [ ] Can I run all sub-agents at the same time without ordering issues?
- [ ] Does each persona produce a different *kind* of finding, not just the same finding from a different angle?
- [ ] Will the merge step fit in the main agent's remaining context?
- [ ] Is the user's wait time long enough that parallelism is actually noticeable?

If any answer is "no," fall back to direct invocation or a single-persona command.
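
The shape of this pattern can be sketched in plain TypeScript. This illustrates fan-out and merge only; it is not how Claude Code spawns subagents. `Finding`, `Persona`, `mergeReports`, and `fanOut` are hypothetical names, and each persona is modeled as an async function over the shared input:

```typescript
// Hypothetical model of Pattern 3; not a Claude Code API.
type Severity = "info" | "warning" | "blocker";

interface Finding {
  persona: string;
  severity: Severity;
  message: string;
}

/** A persona is modeled as an async function from the shared input to its own report. */
type Persona = (diff: string) => Promise<Finding[]>;

interface MergedDecision {
  verdict: "go" | "no-go";
  findings: Finding[];
}

/** Merge step: flatten the independent reports; any blocker means no-go. */
function mergeReports(reports: Finding[][]): MergedDecision {
  const findings = reports.reduce<Finding[]>((all, report) => all.concat(report), []);
  const hasBlocker = findings.some((f) => f.severity === "blocker");
  return { verdict: hasBlocker ? "no-go" : "go", findings };
}

/** Fan-out step: start every persona before awaiting any of them. */
async function fanOut(diff: string, personas: Persona[]): Promise<MergedDecision> {
  const reports = await Promise.all(personas.map((persona) => persona(diff)));
  return mergeReports(reports);
}
```

The personas never see each other's output; only the merge step does. That is what keeps the sub-tasks independent, and it is also why the merge has to be small enough to run in the main context.
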

---

### 4. Sequential pipeline as user-driven slash commands

The user runs slash commands in a defined order, carrying context (or commit history) between them. There is no orchestrator agent: the user IS the orchestrator.

```
user runs:  /spec  →  /plan  →  /build  →  /test  →  /review  →  /ship
```

**Use when:** the workflow has dependencies (each step needs the previous step's output) and human judgment between steps adds value.

**Examples in this repo:** the entire DEFINE → PLAN → BUILD → VERIFY → REVIEW → SHIP lifecycle.

**Cost:** one sub-agent context per step. Free for the orchestration layer because there is no orchestrator agent.

**Why not automate it:** an LLM "lifecycle orchestrator" would (a) lose nuance between steps because it has to summarize for hand-off, (b) skip the human checkpoints that catch wrong-direction work early, and (c) double the token cost via paraphrasing turns.

---

### 5. Research isolation (context preservation)

When a task requires reading large amounts of material that shouldn't pollute the main context, spawn a research sub-agent that returns only a digest.

```
main agent → research sub-agent (reads 50 files) → digest → main agent continues
```

**Use when:**
- The main session needs to stay focused on a downstream task
- The investigation result is much smaller than the input it consumes
- The decision quality benefits from the main agent having room to think after

**Examples:** "Find every call site of this deprecated API across the monorepo," "Summarize what these 30 ADRs say about caching."

**Cost:** one isolated sub-agent context. Worth it any time the alternative is loading hundreds of files into the main context.

**On Claude Code, use the built-in `Explore` subagent** rather than defining a custom research persona. `Explore` runs on Haiku, is denied write/edit tools, and is purpose-built for this pattern. Define a custom research subagent only when `Explore` doesn't fit (e.g. you need a domain-specific system prompt the model wouldn't infer).

---

## Claude Code compatibility

This catalog is harness-agnostic, but most readers will run it on Claude Code. Here's how each pattern maps onto Claude Code's primitives, and where the platform enforces our rules for us.

### Where personas live

Plugin subagents go in `agents/` at the plugin root. This repo is a plugin (`.claude-plugin/plugin.json`), so `agents/code-reviewer.md`, `agents/security-auditor.md`, and `agents/test-engineer.md` are auto-discovered when the plugin is enabled. No path configuration needed.

### Subagents vs. Agent Teams

Claude Code has two parallelism primitives. Pattern 3 (parallel fan-out with merge) maps to **subagents**. If you need teammates that talk to each other, use **Agent Teams** instead.

| | Subagents | Agent Teams |
|--|-----------|-------------|
| Coordination | Main agent fans out, sub-agents only report back | Teammates message each other, share a task list |
| Context | Own context window per subagent | Own context window per teammate |
| When to use | Independent tasks producing reports | Collaborative work needing discussion |
| Status | Stable | Experimental; requires `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` |
| Cost | Lower | Higher; each teammate is a separate Claude instance |

**The personas in this repo work in both modes.** When spawned as subagents (e.g. by `/ship`), they report findings to the main session. When spawned as teammates (`Spawn a teammate using the security-auditor agent type…`), they can challenge each other's findings directly. The persona definition is the same; only the spawning context changes.

One subtlety: the `skills` and `mcpServers` frontmatter fields in a persona are honored when it runs as a subagent but **ignored when it runs as a teammate**. Teammates load skills and MCP servers from your project and user settings, the same as a regular session. If a persona depends on a specific skill or MCP server being loaded, configure it at the session level so it's available in both modes.

### Platform-enforced rules

Two rules in this catalog aren't just convention; Claude Code enforces them:

- **"Subagents cannot spawn other subagents"** (verbatim from the docs). Anti-pattern B (persona-calls-persona) and Anti-pattern D (deep persona trees) cannot exist on Claude Code by construction.
- **"No nested teams"**: teammates cannot spawn their own teams. Same anti-patterns blocked at the team level.

This means you can adopt the patterns in this catalog without worrying about contributors accidentally building the anti-patterns. They'll just fail to load.

### Built-in subagents to know about

Before defining a custom subagent, check whether one of these covers the role:

| Built-in | Purpose |
|----------|---------|
| `Explore` | Read-only codebase search and analysis. Use this for Pattern 5 (research isolation). |
| `Plan` | Read-only research during plan mode. |
| `general-purpose` | Multi-step tasks needing both exploration and modification. |

Don't redefine these. Layer your specialist personas (code-reviewer, security-auditor, test-engineer) on top of them.

### Frontmatter restrictions for plugin agents

Plugin subagents do **not** support the `hooks`, `mcpServers`, or `permissionMode` frontmatter fields; these are silently ignored. If a future persona needs any of those, the user must copy the file into `.claude/agents/` or `~/.claude/agents/` instead.

The fields that DO work in plugin agents are: `name`, `description`, `tools`, `disallowedTools`, `model`, `maxTurns`, `skills`, `memory`, `background`, `effort`, `isolation`, `color`, `initialPrompt`. Use `model` per-persona if you want to optimize cost (e.g. Haiku for `test-engineer` coverage scans, Sonnet for `code-reviewer`, Opus for `security-auditor`).

### Spawning multiple subagents in parallel

In Claude Code, parallel fan-out (Pattern 3) requires issuing **multiple Agent tool calls in a single assistant turn**. Sequential turns serialize execution. `/ship` calls this out explicitly. Any new orchestrator command should do the same.

---

## Worked example: Agent Teams for competing-hypothesis debugging

This example shows when to reach for **Agent Teams** instead of `/ship`'s subagent fan-out. The two patterns look similar from a distance (both spawn the same three personas), but the value comes from a different place.

### The scenario

> *Checkout occasionally hangs for ~30 seconds before completing. It happens roughly once every 50 sessions. No errors in logs. Started after last week's release.*

Plausible root causes (mutually exclusive, all fit the symptoms):

1. A race condition in the new payment-confirmation flow
2. An auth check that occasionally falls through to a slow synchronous network call
3. A missing index on a query that scales with cart size
4. A flaky third-party API where the SDK retries silently before timing out

A single agent will pick the first plausible theory and stop investigating. A `/ship`-style subagent fan-out would have each persona report independently, but their reports never meet, so nothing rules out the wrong theories.

This is exactly the case the Agent Teams docs describe: *"With multiple independent investigators actively trying to disprove each other, the theory that survives is much more likely to be the actual root cause."*

### Why this is *not* a `/ship` job

| | `/ship` (subagents) | Agent Teams |
|--|--------------------|-------------|
| Sub-agents see | The same diff, different lenses | A shared task list, each other's messages |
| Output | Three independent reports → one merge | Adversarial debate → consensus root cause |
| Right when | You want a verdict on a known artifact | You want to *find* the artifact among hypotheses |

`/ship` is a verdict; Agent Teams is an investigation.

### Setup (one-time, per-environment)

Agent Teams is experimental. In `~/.claude/settings.json`:

```json
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
```

Requires Claude Code v2.1.32 or later. The personas in this repo are picked up automatically; there are no team-config files to author by hand.
|
||||
|
||||
### The trigger prompt
|
||||
|
||||
Type into the lead session, in natural language:
|
||||
|
||||
```
|
||||
Users report checkout hangs for ~30 seconds intermittently after last
|
||||
week's release. No errors in logs.
|
||||
|
||||
Create an agent team to debug this with competing hypotheses. Spawn
|
||||
three teammates using the existing agent types:
|
||||
|
||||
- code-reviewer — investigate race conditions and blocking calls
|
||||
in the checkout code path
|
||||
- security-auditor — investigate auth checks, session handling,
|
||||
and any synchronous network calls added recently
|
||||
- test-engineer — propose tests that would distinguish between the
|
||||
hypotheses and check coverage gaps in checkout
|
||||
|
||||
Have them message each other directly to challenge each other's
|
||||
theories. Update findings as consensus emerges. Only converge when
|
||||
two teammates agree they can disprove the others'.
|
||||
```
|
||||
|
||||
The lead spawns three teammates referencing the existing persona names. The persona body is **appended** to each teammate's system prompt as additional instructions (on top of the team-coordination instructions the lead installs); the trigger prompt above becomes their task.
|
||||
|
||||
### What happens
|
||||
|
||||
1. Each teammate runs in its own context window, exploring the codebase from its own lens.
|
||||
2. Teammates use `message` to send findings to each other directly. The lead doesn't have to relay.
|
||||
3. The shared task list shows who's investigating what — visible at any time with `Ctrl+T` (in-process mode) or in a tmux pane (split mode).
|
||||
4. When `code-reviewer` finds a `Promise.all` that should be sequential, it messages `security-auditor` to confirm the auth call isn't part of the race. `security-auditor` checks and replies — either confirming the race is the real issue or producing counter-evidence.
|
||||
5. `test-engineer` proposes a focused integration test for whichever theory is winning, which the team uses to verify before declaring consensus.
|
||||
6. The lead synthesizes the converged finding and presents it to you.
|
||||
|
||||
You can interrupt any teammate by cycling to it with `Shift+Down` and typing, which is useful for redirecting an investigator who has gone down a wrong path.
|
||||
|
||||
### When to clean up
|
||||
|
||||
When the investigation lands on a root cause, tell the lead:
|
||||
|
||||
```
|
||||
Clean up the team
|
||||
```
|
||||
|
||||
Always clean up through the lead, not a teammate (per the docs: teammates lack full team context for cleanup).
|
||||
|
||||
### Cost expectation
|
||||
|
||||
Three Sonnet teammates running for ~10–15 minutes of investigation cost noticeably more than the same three personas spawned as subagents by `/ship`. The justification is *quality of conclusion*: for production debugging where the wrong fix is expensive, the extra tokens are a bargain. For a routine PR review, stick with `/ship`.
|
||||
|
||||
### Anti-pattern in this scenario
|
||||
|
||||
Do **not** rebuild this as a `/debug` slash command that fans out subagents. Subagents can't message each other — you'd lose the adversarial debate that makes the pattern work. If a workflow keeps coming up, document the trigger prompt above as a snippet rather than wrapping it in a slash command that misuses subagents.
|
||||
|
||||
### When *not* to use Agent Teams
|
||||
|
||||
- Production-bound verdict on a known diff → use `/ship` (subagents).
|
||||
- One specialist perspective on one artifact → direct persona invocation.
|
||||
- Sequential lifecycle (spec → plan → build) → user-driven slash commands (Pattern 4).
|
||||
- Read-heavy research with a small digest → built-in `Explore` subagent.
|
||||
|
||||
Reach for Agent Teams only when teammates **need** to challenge each other to produce the right answer.
|
||||
|
||||
---
|
||||
|
||||
## Anti-patterns
|
||||
|
||||
### A. Router persona ("meta-orchestrator")
|
||||
|
||||
A persona whose job is to decide which other persona to call.
|
||||
|
||||
```
|
||||
/work → router-persona → "this needs a review" → code-reviewer → router (paraphrases) → user
|
||||
```
|
||||
|
||||
**Why it fails:**
|
||||
- Pure routing layer with no domain value
|
||||
- Adds two paraphrasing hops → information loss + roughly 2× token cost
|
||||
- The user already knew they wanted a review; they could have called `/review` directly
|
||||
- Replicates the work that slash commands and intent mapping in `AGENTS.md` already do
|
||||
|
||||
**What to do instead:** add or refine slash commands. Document intent → command mapping in `AGENTS.md`.
|
||||
|
||||
---
|
||||
|
||||
### B. Persona that calls another persona
|
||||
|
||||
A `code-reviewer` that internally invokes `security-auditor` when it sees auth code.
|
||||
|
||||
**Why it fails:**
|
||||
- Personas were designed to produce a single perspective; chaining them defeats that
|
||||
- The summary the calling persona passes loses context the called persona needs
|
||||
- Failure modes multiply (which persona's output format wins? whose rules apply?)
|
||||
- Hides cost from the user
|
||||
|
||||
**What to do instead:** have the calling persona *recommend* a follow-up audit in its report. The user or a slash command runs the second pass.
|
||||
|
||||
---
|
||||
|
||||
### C. Sequential orchestrator that paraphrases
|
||||
|
||||
An agent that calls `/spec`, then `/plan`, then `/build`, etc. on the user's behalf.
|
||||
|
||||
**Why it fails:**
|
||||
- Loses the human checkpoints that catch wrong-direction work
|
||||
- Each hand-off summarizes context — accumulated drift over a long pipeline
|
||||
- Doubles token cost: orchestrator turn + sub-agent turn for every step
|
||||
- Removes user agency at exactly the points where judgment matters most
|
||||
|
||||
**What to do instead:** keep the user as the orchestrator. Document the recommended sequence in `README.md` and let users invoke it.
|
||||
|
||||
---
|
||||
|
||||
### D. Deep persona trees
|
||||
|
||||
`/ship` calls a `pre-ship-coordinator` that calls a `quality-coordinator` that calls `code-reviewer`.
|
||||
|
||||
**Why it fails:**
|
||||
- Each layer adds latency and tokens with no decision value
|
||||
- Debugging becomes a multi-level investigation
|
||||
- The leaf personas lose context to multiple summarization steps
|
||||
|
||||
**What to do instead:** keep the orchestration depth at most 1 (slash command → personas). The merge happens in the main agent.
|
||||
|
||||
---
|
||||
|
||||
## Decision flow
|
||||
|
||||
When considering a new orchestrated workflow, walk this flow:
|
||||
|
||||
```
|
||||
Is the work one perspective on one artifact?
|
||||
├── Yes → Direct invocation. Stop.
|
||||
└── No → Will the same composition repeat?
|
||||
├── No → Direct invocation, ad hoc. Stop.
|
||||
└── Yes → Are sub-tasks independent?
|
||||
├── No → Sequential slash commands run by user (Pattern 4).
|
||||
└── Yes → Parallel fan-out with merge (Pattern 3).
|
||||
Validate against the checklist above.
|
||||
If any check fails → fall back to single-persona command (Pattern 2).
|
||||
```
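
The flow above can also be written as a small function, which makes the branch order explicit. This is only a sketch: the `WorkflowTraits` field names and the `Recommendation` labels are hypothetical, and `passesChecklist` stands in for the checklist the flow refers to.

```typescript
// Hypothetical labels for the outcomes named in the decision flow.
type Recommendation =
  | 'direct-invocation'
  | 'direct-invocation-ad-hoc'
  | 'sequential-slash-commands' // Pattern 4
  | 'parallel-fan-out-with-merge' // Pattern 3
  | 'single-persona-command'; // Pattern 2 (fallback)

// Hypothetical answers to the questions the flow asks, top to bottom.
interface WorkflowTraits {
  onePerspectiveOnOneArtifact: boolean;
  compositionRepeats: boolean;
  subTasksIndependent: boolean;
  passesChecklist: boolean; // result of validating against the checklist
}

function choosePattern(w: WorkflowTraits): Recommendation {
  if (w.onePerspectiveOnOneArtifact) return 'direct-invocation';
  if (!w.compositionRepeats) return 'direct-invocation-ad-hoc';
  if (!w.subTasksIndependent) return 'sequential-slash-commands';
  // Fan-out is only kept if the checklist validation passes.
  return w.passesChecklist
    ? 'parallel-fan-out-with-merge'
    : 'single-persona-command';
}
```

Each early return mirrors a "Stop." in the tree, so the checklist is consulted only on the fan-out branch.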
|
||||
|
||||
---
|
||||
|
||||
## When to add a new pattern to this catalog
|
||||
|
||||
Add a new entry only after:
|
||||
|
||||
1. You've used the pattern at least twice in real work
|
||||
2. You can name a concrete artifact in this repo that demonstrates it
|
||||
3. You can explain why an existing pattern wouldn't have worked
|
||||
4. You can describe its anti-pattern shadow (what people will mistakenly build instead)
|
||||
|
||||
Premature catalog entries become aspirational documentation that no one follows.
|
||||
@@ -0,0 +1,153 @@
|
||||
# Performance Checklist
|
||||
|
||||
Quick reference checklist for web application performance. Use alongside the `performance-optimization` skill.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Core Web Vitals Targets](#core-web-vitals-targets)
|
||||
- [TTFB Diagnosis](#ttfb-diagnosis)
|
||||
- [Frontend Checklist](#frontend-checklist)
|
||||
- [Backend Checklist](#backend-checklist)
|
||||
- [Measurement Commands](#measurement-commands)
|
||||
- [Common Anti-Patterns](#common-anti-patterns)
|
||||
|
||||
## Core Web Vitals Targets
|
||||
|
||||
| Metric | Good | Needs Work | Poor |
|
||||
|--------|------|------------|------|
|
||||
| LCP (Largest Contentful Paint) | ≤ 2.5s | ≤ 4.0s | > 4.0s |
|
||||
| INP (Interaction to Next Paint) | ≤ 200ms | ≤ 500ms | > 500ms |
|
||||
| CLS (Cumulative Layout Shift) | ≤ 0.1 | ≤ 0.25 | > 0.25 |
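
The thresholds in the table can be encoded once and reused, for example in a RUM pipeline or a CI budget check. A minimal sketch follows; the names `rateMetric` and `THRESHOLDS` are illustrative, and LCP/INP values are assumed to be in milliseconds.

```typescript
type Metric = 'LCP' | 'INP' | 'CLS';
type Rating = 'good' | 'needs-work' | 'poor';

// Upper bounds from the table: [good, needs-work].
// LCP and INP are in milliseconds; CLS is unitless.
const THRESHOLDS: Record<Metric, [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
};

function rateMetric(metric: Metric, value: number): Rating {
  const [good, needsWork] = THRESHOLDS[metric];
  if (value <= good) return 'good';
  if (value <= needsWork) return 'needs-work';
  return 'poor';
}
```

Both bounds are inclusive, matching the `≤` columns in the table.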
|
||||
|
||||
## TTFB Diagnosis
|
||||
|
||||
When TTFB is slow (> 800ms), check each component in DevTools Network waterfall:
|
||||
|
||||
- [ ] **DNS resolution** slow → add `<link rel="dns-prefetch">` or `<link rel="preconnect">` for known origins
|
||||
- [ ] **TCP/TLS handshake** slow → enable HTTP/2, consider edge deployment, verify keep-alive
|
||||
- [ ] **Server processing** slow → profile backend, check slow queries, add caching
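
The same three components can be read from navigation timing data instead of the waterfall. The sketch below assumes an object shaped like the browser's `PerformanceNavigationTiming` entry; `breakDownTtfb` and `TimingLike` are illustrative names, and the "server" phase is an approximation because the time between request and first byte also includes one network round trip.

```typescript
// Subset of PerformanceNavigationTiming fields used here (milliseconds).
interface TimingLike {
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  requestStart: number;
  responseStart: number;
}

type Phase = 'dns' | 'connect' | 'server';

function breakDownTtfb(t: TimingLike): Record<Phase, number> & { slowest: Phase } {
  const phases: Record<Phase, number> = {
    dns: t.domainLookupEnd - t.domainLookupStart,
    connect: t.connectEnd - t.connectStart, // TCP plus TLS handshake
    server: t.responseStart - t.requestStart, // request sent until first byte
  };
  // Pick the phase with the largest duration.
  let slowest: Phase = 'dns';
  for (const phase of ['connect', 'server'] as Phase[]) {
    if (phases[phase] > phases[slowest]) slowest = phase;
  }
  return { ...phases, slowest };
}
```

In a browser, the input would typically come from `performance.getEntriesByType('navigation')[0]`.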
|
||||
|
||||
## Frontend Checklist
|
||||
|
||||
### Images
|
||||
- [ ] Images use modern formats (WebP, AVIF)
|
||||
- [ ] Images are responsively sized (`srcset` and `sizes`)
|
||||
- [ ] Images and `<source>` elements have explicit `width` and `height` (prevents CLS in art direction)
|
||||
- [ ] Below-the-fold images use `loading="lazy"` and `decoding="async"`
|
||||
- [ ] Hero/LCP images use `fetchpriority="high"` and no lazy loading
|
||||
|
||||
### JavaScript
|
||||
- [ ] Bundle size under 200KB gzipped (initial load)
|
||||
- [ ] Code splitting with dynamic `import()` for routes and heavy features
|
||||
- [ ] Tree shaking enabled (verify dependency ships ESM and marks `sideEffects: false`)
|
||||
- [ ] No blocking JavaScript in `<head>` (use `defer` or `async`)
|
||||
- [ ] Heavy computation offloaded to Web Workers (if applicable)
|
||||
- [ ] `React.memo()` on expensive components that re-render with same props
|
||||
- [ ] `useMemo()` / `useCallback()` only where profiling shows benefit
|
||||
- [ ] Long tasks (> 50ms) broken up to keep the main thread available — main lever for INP
|
||||
- [ ] `yieldToMain` pattern used inside long-running loops so input events can run between chunks
|
||||
- [ ] Modern scheduling APIs used where available: `scheduler.yield()` (preferred), `scheduler.postTask()` with priorities, `isInputPending()` to yield only when needed
|
||||
- [ ] `requestIdleCallback` for deferrable, non-urgent work (analytics flush, prefetch, warmup)
|
||||
- [ ] Non-critical work deferred out of event handlers (e.g. analytics, logging) so the response to the interaction is not delayed
|
||||
- [ ] Third-party scripts loaded with `async` / `defer`, audited for size, and fronted by a facade when heavy (chat widgets, embeds)
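
One way to read the `yieldToMain` items above is as a helper that hands control back to the event loop, plus a loop that calls it whenever a time budget is used up. This is a sketch under assumptions: `processInChunks` and its 50 ms default are illustrative, and the fallback to a zero-delay timer is one common choice when `scheduler.yield()` is unavailable.

```typescript
// Hand control back to the event loop so queued input events can run.
// Prefers scheduler.yield() where the runtime provides it; otherwise
// falls back to a zero-delay timeout.
function yieldToMain(): Promise<void> {
  const scheduler = (globalThis as any).scheduler;
  if (scheduler && typeof scheduler.yield === 'function') {
    return scheduler.yield();
  }
  return new Promise<void>((resolve) => setTimeout(resolve, 0));
}

// Runs `work` for every item, yielding whenever the time budget is spent.
// Resolves with the number of times it yielded.
async function processInChunks<T>(
  items: T[],
  work: (item: T) => void,
  budgetMs: number = 50,
): Promise<number> {
  let yields = 0;
  let deadline = Date.now() + budgetMs;
  for (const item of items) {
    work(item);
    if (Date.now() >= deadline) {
      await yieldToMain();
      yields++;
      deadline = Date.now() + budgetMs;
    }
  }
  return yields;
}
```

The budget is checked after each item, so a single item that takes longer than the budget still produces a long task; that kind of work is a better fit for a Web Worker.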
|
||||
|
||||
### CSS
|
||||
- [ ] Critical CSS inlined or preloaded
|
||||
- [ ] No render-blocking CSS for non-critical styles
|
||||
- [ ] No CSS-in-JS runtime cost in production (use extraction)
|
||||
|
||||
### Fonts
|
||||
- [ ] Limited to 2–3 font families, 2–3 weights each (every additional weight is another request)
|
||||
- [ ] WOFF2 format only (smallest, universal support — skip WOFF/TTF/EOT)
|
||||
- [ ] Self-hosted when possible (third-party font CDNs add DNS + TCP + TLS round-trips)
|
||||
- [ ] LCP-critical fonts preloaded: `<link rel="preload" as="font" type="font/woff2" crossorigin>`
|
||||
- [ ] `font-display: swap` (or `optional` for non-critical) to avoid FOIT blocking render
|
||||
- [ ] Subsetted via `unicode-range` to ship only the glyphs each page needs
|
||||
- [ ] Variable fonts considered when multiple weights/styles are required (one file replaces many)
|
||||
- [ ] Fallback font metrics adjusted with `size-adjust`, `ascent-override`, `descent-override` to reduce CLS on font swap
|
||||
- [ ] System font stack considered before any custom font
|
||||
|
||||
### Network
|
||||
- [ ] Static assets cached with long `max-age` + content hashing
|
||||
- [ ] API responses cached where appropriate (`Cache-Control`)
|
||||
- [ ] HTTP/2 or HTTP/3 enabled
|
||||
- [ ] Resources preconnected (`<link rel="preconnect">`) for known origins
|
||||
- [ ] `fetchpriority` used on critical non-image resources (e.g., key `<link rel="preload">`, above-the-fold `<script>`) — not only on `<img>`
|
||||
- [ ] No unnecessary redirects
|
||||
|
||||
### Rendering
|
||||
- [ ] No layout thrashing (forced synchronous layouts)
|
||||
- [ ] Animations use `transform` and `opacity` (GPU-accelerated)
|
||||
- [ ] Long lists use virtualization (e.g., `react-window`)
|
||||
- [ ] No unnecessary full-page re-renders
|
||||
- [ ] Off-screen sections use `content-visibility: auto` with `contain-intrinsic-size` to skip layout/paint of non-visible areas
|
||||
- [ ] No `unload` event handlers and no `Cache-Control: no-store` on HTML responses — preserves back/forward cache (bfcache) eligibility
|
||||
|
||||
## Backend Checklist
|
||||
|
||||
### Database
|
||||
- [ ] No N+1 query patterns (use eager loading / joins)
|
||||
- [ ] Queries have appropriate indexes
|
||||
- [ ] List endpoints paginated (never an unbounded `SELECT * FROM table`)
|
||||
- [ ] Connection pooling configured
|
||||
- [ ] Slow query logging enabled
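
To make the N+1 item concrete, the sketch below loads the authors for a list of posts in one batched lookup instead of one query per post. `FakeDb`, `Post`, `Author`, and `attachAuthors` are hypothetical stand-ins; a real ORM would expose this as eager loading or an `IN (...)` query.

```typescript
interface Post { id: number; authorId: number }
interface Author { id: number; name: string }

// Hypothetical in-memory stand-in for a database that counts round trips.
class FakeDb {
  queries = 0;
  authors: Author[];
  constructor(authors: Author[]) {
    this.authors = authors;
  }
  // Equivalent of: SELECT * FROM authors WHERE id IN (...)
  findAuthorsByIds(ids: number[]): Author[] {
    this.queries++;
    const wanted = new Set(ids);
    return this.authors.filter((a) => wanted.has(a.id));
  }
}

// One batched lookup for all posts, then an in-memory join.
function attachAuthors(db: FakeDb, posts: Post[]) {
  const ids = Array.from(new Set(posts.map((p) => p.authorId)));
  const byId = new Map<number, Author>();
  for (const author of db.findAuthorsByIds(ids)) byId.set(author.id, author);
  return posts.map((p) => ({ ...p, author: byId.get(p.authorId) }));
}
```

With this shape the number of queries stays at one regardless of how many posts are in the list.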
|
||||
|
||||
### API
|
||||
- [ ] Response times < 200ms (p95)
|
||||
- [ ] No synchronous heavy computation in request handlers
|
||||
- [ ] Bulk operations instead of loops of individual calls
|
||||
- [ ] Response compression (gzip/brotli)
|
||||
- [ ] Appropriate caching (in-memory, Redis, CDN)
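
The p95 target is easy to misread as an average, so here is a minimal sketch of how it could be computed from recorded latencies. It assumes the nearest-rank definition of a percentile; monitoring tools may interpolate differently, and the names `percentile` and `meetsLatencyBudget` are illustrative.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// True when the 95th percentile latency is under the budget (default 200 ms).
function meetsLatencyBudget(latenciesMs: number[], budgetMs: number = 200): boolean {
  return percentile(latenciesMs, 95) < budgetMs;
}
```

A handful of slow requests can fail this check even when the mean looks healthy, which is the point of tracking p95.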
|
||||
|
||||
### Infrastructure
|
||||
- [ ] CDN for static assets
|
||||
- [ ] Server located close to users (or edge deployment)
|
||||
- [ ] Horizontal scaling configured (if needed)
|
||||
- [ ] Health check endpoint for load balancer
|
||||
|
||||
## Measurement Commands
|
||||
|
||||
### INP field data and DevTools workflow
|
||||
|
||||
1. **Field data first** — check [CrUX Vis](https://developer.chrome.com/docs/crux/vis) or your RUM tool for real-user INP before optimising
|
||||
2. **Identify slow interactions** — open DevTools → Performance panel → record while interacting; look for long tasks triggered by clicks/keystrokes
|
||||
3. **Test on mid-range Android** — INP issues often only surface on slower hardware; use a real device or DevTools CPU throttling (4×–6× slowdown)
|
||||
|
||||
```bash
# Lighthouse CLI
npx lighthouse https://localhost:3000 --output json --output-path ./report.json

# Bundle analysis
npx webpack-bundle-analyzer stats.json
# or for Vite:
npx vite-bundle-visualizer

# Check bundle size
npx bundlesize
```

```typescript
// Web Vitals in code
import { onLCP, onINP, onCLS } from 'web-vitals';
onLCP(console.log);
onINP(console.log);
onCLS(console.log);

// INP with interaction-level detail (attribution build)
import { onINP } from 'web-vitals/attribution';
onINP(({ value, attribution }) => {
  const { interactionTarget, inputDelay, processingDuration, presentationDelay } = attribution;
  console.log({ value, interactionTarget, inputDelay, processingDuration, presentationDelay });
});
```
|
||||
|
||||
## Common Anti-Patterns
|
||||
|
||||
| Anti-Pattern | Impact | Fix |
|
||||
|---|---|---|
|
||||
| N+1 queries | Linear DB load growth | Use joins, includes, or batch loading |
|
||||
| Unbounded queries | Memory exhaustion, timeouts | Always paginate, add LIMIT |
|
||||
| Missing indexes | Slow reads as data grows | Add indexes for filtered/sorted columns |
|
||||
| Layout thrashing | Jank, dropped frames | Batch DOM reads, then batch writes |
|
||||
| Unoptimized images | Slow LCP, wasted bandwidth | Use WebP, responsive sizes, lazy load |
|
||||
| Large bundles | Slow Time to Interactive | Code split, tree shake, audit deps |
|
||||
| Blocking main thread | Poor INP, unresponsive UI | Chunk long tasks with `scheduler.yield()` / `yieldToMain`, offload to Web Workers |
|
||||
| Memory leaks | Growing memory, eventual crash | Clean up listeners, intervals, refs |
|
||||
@@ -0,0 +1,134 @@
|
||||
# Security Checklist
|
||||
|
||||
Quick reference for web application security. Use alongside the `security-and-hardening` skill.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Pre-Commit Checks](#pre-commit-checks)
|
||||
- [Authentication](#authentication)
|
||||
- [Authorization](#authorization)
|
||||
- [Input Validation](#input-validation)
|
||||
- [Security Headers](#security-headers)
|
||||
- [CORS Configuration](#cors-configuration)
|
||||
- [Data Protection](#data-protection)
|
||||
- [Dependency Security](#dependency-security)
|
||||
- [Error Handling](#error-handling)
|
||||
- [OWASP Top 10 Quick Reference](#owasp-top-10-quick-reference)
|
||||
|
||||
## Pre-Commit Checks
|
||||
|
||||
- [ ] No secrets in code (`git diff --cached | grep -i "password\|secret\|api_key\|token"`)
|
||||
- [ ] `.gitignore` covers: `.env`, `.env.local`, `*.pem`, `*.key`
|
||||
- [ ] `.env.example` uses placeholder values (not real secrets)
|
||||
|
||||
## Authentication
|
||||
|
||||
- [ ] Passwords hashed with bcrypt (≥12 rounds), scrypt, or argon2
|
||||
- [ ] Session cookies: `httpOnly`, `secure`, `sameSite: 'lax'`
|
||||
- [ ] Session expiration configured (reasonable max-age)
|
||||
- [ ] Rate limiting on login endpoint (≤10 attempts per 15 minutes)
|
||||
- [ ] Password reset tokens: time-limited (≤1 hour), single-use
|
||||
- [ ] Account lockout after repeated failures (optional, with notification)
|
||||
- [ ] MFA supported for sensitive operations (optional but recommended)
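
The login rate limit above (at most 10 attempts per 15 minutes) could be enforced with a sliding window per key. The sketch below is an in-memory illustration only: `LoginRateLimiter` is a hypothetical class, the key might be an IP address or account identifier, and a multi-instance deployment would need a shared store such as Redis instead of a local `Map`.

```typescript
// Sliding-window limiter: remembers attempt timestamps per key and rejects
// an attempt once the window already holds `maxAttempts` entries.
class LoginRateLimiter {
  private readonly attempts = new Map<string, number[]>();
  private readonly maxAttempts: number;
  private readonly windowMs: number;

  constructor(maxAttempts: number = 10, windowMs: number = 15 * 60 * 1000) {
    this.maxAttempts = maxAttempts;
    this.windowMs = windowMs;
  }

  // Returns true and records the attempt if it is allowed, false otherwise.
  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    const recent = (this.attempts.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.maxAttempts) {
      this.attempts.set(key, recent);
      return false;
    }
    recent.push(now);
    this.attempts.set(key, recent);
    return true;
  }
}
```

Rejected attempts are not recorded here, so a blocked client regains access as soon as its oldest attempt leaves the window.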
|
||||
|
||||
## Authorization
|
||||
|
||||
- [ ] Every protected endpoint checks authentication
|
||||
- [ ] Every resource access checks ownership/role (prevents IDOR)
|
||||
- [ ] Admin endpoints require admin role verification
|
||||
- [ ] API keys scoped to minimum necessary permissions
|
||||
- [ ] JWT tokens validated (signature, expiration, issuer)
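
The ownership/role item is the one most often skipped, so a minimal sketch may help. The types `User` and `Task` and the function `authorizeTaskAccess` are hypothetical; the point is that authentication (401) and authorization (403) are separate checks, and the second compares the resource's owner with the caller.

```typescript
interface User { id: string; role: 'user' | 'admin' }
interface Task { id: string; ownerId: string }

type AccessResult = { ok: true } | { ok: false; status: 401 | 403 };

// Checks the caller against a specific resource, not just the route.
function authorizeTaskAccess(user: User | null, task: Task): AccessResult {
  if (user === null) return { ok: false, status: 401 }; // not authenticated
  if (user.role === 'admin' || task.ownerId === user.id) return { ok: true };
  return { ok: false, status: 403 }; // authenticated, but not the owner
}
```

Some APIs return 404 instead of 403 here so that the response does not reveal whether the resource exists; either way the check runs on every resource access.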
|
||||
|
||||
## Input Validation
|
||||
|
||||
- [ ] All user input validated at system boundaries (API routes, form handlers)
|
||||
- [ ] Validation uses allowlists (not denylists)
|
||||
- [ ] String lengths constrained (min/max)
|
||||
- [ ] Numeric ranges validated
|
||||
- [ ] Email, URL, and date formats validated with proper libraries
|
||||
- [ ] File uploads: type restricted, size limited, content verified
|
||||
- [ ] SQL queries parameterized (no string concatenation)
|
||||
- [ ] HTML output encoded (use framework auto-escaping)
|
||||
- [ ] URLs validated before redirect (prevent open redirect)
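
For the open-redirect item, one possible shape is an allowlist check that falls back to a safe default. This sketch is not exhaustive: `safeRedirectTarget` is a hypothetical helper, it accepts same-site paths and `https` URLs on listed hosts only, and real applications may need additional normalization of the input.

```typescript
// Returns the target if it is a same-site path or an https URL on an
// allowlisted host; otherwise returns the fallback.
function safeRedirectTarget(
  target: string,
  allowedHosts: string[],
  fallback: string = '/',
): string {
  // Relative paths are fine, but "//host" and "/\host" are treated by
  // browsers as pointing at another origin, so they are rejected.
  if (target.startsWith('/') && !target.startsWith('//') && !target.startsWith('/\\')) {
    return target;
  }
  try {
    const url = new URL(target);
    if (url.protocol === 'https:' && allowedHosts.includes(url.hostname)) {
      return url.toString();
    }
  } catch {
    // Not a parseable absolute URL; fall through to the fallback.
  }
  return fallback;
}
```

Using an allowlist of hosts, rather than trying to block known-bad ones, follows the same rule the checklist gives for input validation in general.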
|
||||
|
||||
## Security Headers
|
||||
|
||||
```
|
||||
Content-Security-Policy: default-src 'self'; script-src 'self'
|
||||
Strict-Transport-Security: max-age=31536000; includeSubDomains
|
||||
X-Content-Type-Options: nosniff
|
||||
X-Frame-Options: DENY
|
||||
X-XSS-Protection: 0 (disabled, rely on CSP)
|
||||
Referrer-Policy: strict-origin-when-cross-origin
|
||||
Permissions-Policy: camera=(), microphone=(), geolocation=()
|
||||
```
|
||||
|
||||
## CORS Configuration
|
||||
|
||||
```typescript
|
||||
// Restrictive (recommended)
|
||||
cors({
|
||||
origin: ['https://yourdomain.com', 'https://app.yourdomain.com'],
|
||||
credentials: true,
|
||||
methods: ['GET', 'POST', 'PUT', 'PATCH', 'DELETE'],
|
||||
allowedHeaders: ['Content-Type', 'Authorization'],
|
||||
})
|
||||
|
||||
// NEVER use in production:
|
||||
cors({ origin: '*' }) // Allows any origin
|
||||
```
|
||||
|
||||
## Data Protection
|
||||
|
||||
- [ ] Sensitive fields excluded from API responses (`passwordHash`, `resetToken`, etc.)
|
||||
- [ ] Sensitive data not logged (passwords, tokens, full CC numbers)
|
||||
- [ ] PII encrypted at rest (if required by regulation)
|
||||
- [ ] HTTPS for all external communication
|
||||
- [ ] Database backups encrypted
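
Excluding sensitive fields is more reliable when responses are built from an allowlist rather than by deleting known-bad keys. A minimal sketch, assuming a hypothetical `UserRecord` shape that includes the `passwordHash` and `resetToken` fields mentioned above:

```typescript
// Hypothetical database row, including fields that must never leave the server.
interface UserRecord {
  id: string;
  email: string;
  passwordHash: string;
  resetToken: string | null;
}

// Shape that is safe to return from an API.
interface PublicUser {
  id: string;
  email: string;
}

// Copies only the allowlisted fields; anything added to UserRecord later
// stays private until it is explicitly listed here.
function toPublicUser(user: UserRecord): PublicUser {
  return { id: user.id, email: user.email };
}
```

Spreading the record and deleting keys would work today but silently leaks any sensitive column added in the future.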
|
||||
|
||||
## Dependency Security
|
||||
|
||||
```bash
|
||||
# Audit dependencies
|
||||
npm audit
|
||||
|
||||
# Fix automatically where possible
|
||||
npm audit fix
|
||||
|
||||
# Check for critical vulnerabilities
|
||||
npm audit --audit-level=critical
|
||||
|
||||
# Keep dependencies updated
|
||||
npx npm-check-updates
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
```typescript
|
||||
// Production: generic error, no internals
|
||||
res.status(500).json({
|
||||
error: { code: 'INTERNAL_ERROR', message: 'Something went wrong' }
|
||||
});
|
||||
|
||||
// NEVER in production:
|
||||
res.status(500).json({
|
||||
error: err.message,
|
||||
stack: err.stack, // Exposes internals
|
||||
query: err.sql, // Exposes database details
|
||||
});
|
||||
```
|
||||
|
||||
## OWASP Top 10 Quick Reference
|
||||
|
||||
| # | Vulnerability | Prevention |
|
||||
|---|---|---|
|
||||
| 1 | Broken Access Control | Auth checks on every endpoint, ownership verification |
|
||||
| 2 | Cryptographic Failures | HTTPS, strong hashing, no secrets in code |
|
||||
| 3 | Injection | Parameterized queries, input validation |
|
||||
| 4 | Insecure Design | Threat modeling, spec-driven development |
|
||||
| 5 | Security Misconfiguration | Security headers, minimal permissions, audit deps |
|
||||
| 6 | Vulnerable Components | `npm audit`, keep deps updated, minimal deps |
|
||||
| 7 | Auth Failures | Strong passwords, rate limiting, session management |
|
||||
| 8 | Data Integrity Failures | Verify updates/dependencies, signed artifacts |
|
||||
| 9 | Logging Failures | Log security events, don't log secrets |
|
||||
| 10 | SSRF | Validate/allowlist URLs, restrict outbound requests |
|
||||
@@ -0,0 +1,236 @@
|
||||
# Testing Patterns Reference
|
||||
|
||||
Quick reference for common testing patterns across the stack. Use alongside the `test-driven-development` skill.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Test Structure (Arrange-Act-Assert)](#test-structure-arrange-act-assert)
|
||||
- [Test Naming Conventions](#test-naming-conventions)
|
||||
- [Common Assertions](#common-assertions)
|
||||
- [Mocking Patterns](#mocking-patterns)
|
||||
- [React/Component Testing](#reactcomponent-testing)
|
||||
- [API / Integration Testing](#api--integration-testing)
|
||||
- [E2E Testing (Playwright)](#e2e-testing-playwright)
|
||||
- [Test Anti-Patterns](#test-anti-patterns)
|
||||
|
||||
## Test Structure (Arrange-Act-Assert)
|
||||
|
||||
```typescript
|
||||
it('describes expected behavior', () => {
|
||||
// Arrange: Set up test data and preconditions
|
||||
const input = { title: 'Test Task', priority: 'high' };
|
||||
|
||||
// Act: Perform the action being tested
|
||||
const result = createTask(input);
|
||||
|
||||
// Assert: Verify the outcome
|
||||
expect(result.title).toBe('Test Task');
|
||||
expect(result.priority).toBe('high');
|
||||
expect(result.status).toBe('pending');
|
||||
});
|
||||
```
|
||||
|
||||
## Test Naming Conventions
|
||||
|
||||
```typescript
|
||||
// Pattern: [unit] [expected behavior] [condition]
|
||||
describe('TaskService.createTask', () => {
|
||||
it('creates a task with default pending status', () => {});
|
||||
it('throws ValidationError when title is empty', () => {});
|
||||
it('trims whitespace from title', () => {});
|
||||
it('generates a unique ID for each task', () => {});
|
||||
});
|
||||
```
|
||||
|
||||
## Common Assertions
|
||||
|
||||
```typescript
|
||||
// Equality
|
||||
expect(result).toBe(expected); // Strict equality (===)
|
||||
expect(result).toEqual(expected); // Deep equality (objects/arrays)
|
||||
expect(result).toStrictEqual(expected); // Deep equality + type matching
|
||||
|
||||
// Truthiness
|
||||
expect(result).toBeTruthy();
|
||||
expect(result).toBeFalsy();
|
||||
expect(result).toBeNull();
|
||||
expect(result).toBeDefined();
|
||||
expect(result).toBeUndefined();
|
||||
|
||||
// Numbers
|
||||
expect(result).toBeGreaterThan(5);
|
||||
expect(result).toBeLessThanOrEqual(10);
|
||||
expect(result).toBeCloseTo(0.3, 5); // Floating point
|
||||
|
||||
// Strings
|
||||
expect(result).toMatch(/pattern/);
|
||||
expect(result).toContain('substring');
|
||||
|
||||
// Arrays / Objects
|
||||
expect(array).toContain(item);
|
||||
expect(array).toHaveLength(3);
|
||||
expect(object).toHaveProperty('key', 'value');
|
||||
|
||||
// Errors
|
||||
expect(() => fn()).toThrow();
|
||||
expect(() => fn()).toThrow(ValidationError);
|
||||
expect(() => fn()).toThrow('specific message');
|
||||
|
||||
// Async
|
||||
await expect(asyncFn()).resolves.toBe(value);
|
||||
await expect(asyncFn()).rejects.toThrow(Error);
|
||||
```
|
||||
|
||||
## Mocking Patterns
|
||||
|
||||
### Mock Functions
|
||||
|
||||
```typescript
|
||||
const mockFn = jest.fn();
|
||||
mockFn.mockReturnValue(42);
|
||||
mockFn.mockResolvedValue({ data: 'test' });
|
||||
mockFn.mockImplementation((x) => x * 2);
|
||||
|
||||
expect(mockFn).toHaveBeenCalled();
|
||||
expect(mockFn).toHaveBeenCalledWith('arg1', 'arg2');
|
||||
expect(mockFn).toHaveBeenCalledTimes(3);
|
||||
```
|
||||
|
||||
### Mock Modules
|
||||
|
||||
```typescript
|
||||
// Mock an entire module
|
||||
jest.mock('./database', () => ({
|
||||
query: jest.fn().mockResolvedValue([{ id: 1, title: 'Test' }]),
|
||||
}));
|
||||
|
||||
// Mock specific exports
|
||||
jest.mock('./utils', () => ({
|
||||
...jest.requireActual('./utils'),
|
||||
generateId: jest.fn().mockReturnValue('test-id'),
|
||||
}));
|
||||
```
|
||||
|
||||
### Mock at Boundaries Only
|
||||
|
||||
```
|
||||
Mock these: Don't mock these:
|
||||
├── Database calls ├── Internal utility functions
|
||||
├── HTTP requests ├── Business logic
|
||||
├── File system operations ├── Data transformations
|
||||
├── External API calls ├── Validation functions
|
||||
└── Time/Date (when needed) └── Pure functions
|
||||
```
|
||||
|
||||
## React/Component Testing
|
||||
|
||||
```tsx
|
||||
import { render, screen, fireEvent, waitFor } from '@testing-library/react';
|
||||
|
||||
describe('TaskForm', () => {
|
||||
it('submits the form with entered data', async () => {
|
||||
const onSubmit = jest.fn();
|
||||
render(<TaskForm onSubmit={onSubmit} />);
|
||||
|
||||
// Find elements by accessible role/label (not test IDs)
|
||||
await screen.findByRole('textbox', { name: /title/i });
|
||||
fireEvent.change(screen.getByRole('textbox', { name: /title/i }), {
|
||||
target: { value: 'New Task' },
|
||||
});
|
||||
fireEvent.click(screen.getByRole('button', { name: /create/i }));
|
||||
|
||||
await waitFor(() => {
|
||||
expect(onSubmit).toHaveBeenCalledWith({ title: 'New Task' });
|
||||
});
|
||||
});
|
||||
|
||||
it('shows validation error for empty title', async () => {
|
||||
render(<TaskForm onSubmit={jest.fn()} />);
|
||||
|
||||
fireEvent.click(screen.getByRole('button', { name: /create/i }));
|
||||
|
||||
expect(await screen.findByText(/title is required/i)).toBeInTheDocument();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
## API / Integration Testing
|
||||
|
||||
```typescript
|
||||
import request from 'supertest';
|
||||
import { app } from '../src/app';
|
||||
|
||||
describe('POST /api/tasks', () => {
|
||||
it('creates a task and returns 201', async () => {
|
||||
const response = await request(app)
|
||||
.post('/api/tasks')
|
||||
.send({ title: 'Test Task' })
|
||||
.set('Authorization', `Bearer ${testToken}`)
|
||||
.expect(201);
|
||||
|
||||
expect(response.body).toMatchObject({
|
||||
id: expect.any(String),
|
||||
title: 'Test Task',
|
||||
status: 'pending',
|
||||
});
|
||||
});
|
||||
|
||||
it('returns 422 for invalid input', async () => {
|
||||
const response = await request(app)
|
||||
.post('/api/tasks')
|
||||
.send({ title: '' })
|
||||
.set('Authorization', `Bearer ${testToken}`)
|
||||
.expect(422);
|
||||
|
||||
expect(response.body.error.code).toBe('VALIDATION_ERROR');
|
||||
});
|
||||
|
||||
it('returns 401 without authentication', async () => {
|
||||
await request(app)
|
||||
.post('/api/tasks')
|
||||
.send({ title: 'Test' })
|
||||
.expect(401);
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
## E2E Testing (Playwright)
|
||||
|
||||
```typescript
|
||||
import { test, expect } from '@playwright/test';
|
||||
|
||||
test('user can create and complete a task', async ({ page }) => {
|
||||
// Navigate and authenticate
|
||||
await page.goto('/');
|
||||
await page.fill('[name="email"]', 'test@example.com');
|
||||
await page.fill('[name="password"]', 'testpass123');
|
||||
await page.click('button:has-text("Log in")');
|
||||
|
||||
// Create a task
|
||||
await page.click('button:has-text("New Task")');
|
||||
await page.fill('[name="title"]', 'Buy groceries');
|
||||
await page.click('button:has-text("Create")');
|
||||
|
||||
// Verify task appears
|
||||
await expect(page.locator('text=Buy groceries')).toBeVisible();
|
||||
|
||||
// Complete the task
|
||||
await page.click('[aria-label="Complete Buy groceries"]');
|
||||
await expect(page.locator('text=Buy groceries')).toHaveCSS(
|
||||
'text-decoration-line', 'line-through'
|
||||
);
|
||||
});
|
||||
```
|
||||
|
||||
## Test Anti-Patterns
|
||||
|
||||
| Anti-Pattern | Problem | Better Approach |
|
||||
|---|---|---|
|
||||
| Testing implementation details | Breaks on refactor | Test inputs/outputs |
|
||||
| Snapshot everything | No one reviews snapshot diffs | Assert specific values |
|
||||
| Shared mutable state | Tests pollute each other | Setup/teardown per test |
|
||||
| Testing third-party code | Wastes time, not your bug | Mock the boundary |
|
||||
| Skipping tests to pass CI | Hides real bugs | Fix or delete the test |
|
||||
| Using `test.skip` permanently | Dead code | Remove or fix it |
|
||||
| Overly broad assertions | Doesn't catch regressions | Be specific |
|
||||
| No async error handling | Swallowed errors, false passes | Always `await` async tests |
|
||||