Initial commit

2026-05-16 13:43:29 +07:00
commit 688a86b611
208 changed files with 44350 additions and 0 deletions
--- a/.claude/skills/shipping-and-launch/SKILL.md
+++ b/.claude/skills/shipping-and-launch/SKILL.md
@@ -0,0 +1,309 @@
+---
+name: shipping-and-launch
+description: Prepares production launches. Use when preparing to deploy to production. Use when you need a pre-launch checklist, when setting up monitoring, when planning a staged rollout, or when you need a rollback strategy.
+---
+
+# Shipping and Launch
+
+## Overview
+
+Ship with confidence. The goal is not just to deploy — it's to deploy safely, with monitoring in place, a rollback plan ready, and a clear understanding of what success looks like. Every launch should be reversible, observable, and incremental.
+
+## When to Use
+
+- Deploying a feature to production for the first time
+- Releasing a significant change to users
+- Migrating data or infrastructure
+- Opening a beta or early access program
+- Any deployment that carries risk (all of them)
+
+## The Pre-Launch Checklist
+
+### Code Quality
+
+- [ ] All tests pass (unit, integration, e2e)
+- [ ] Build succeeds with no warnings
+- [ ] Lint and type checking pass
+- [ ] Code reviewed and approved
+- [ ] No TODO comments that should be resolved before launch
+- [ ] No `console.log` debugging statements in production code
+- [ ] Error handling covers expected failure modes
+
+### Security
+
+- [ ] No secrets in code or version control
+- [ ] `npm audit` shows no critical or high vulnerabilities
+- [ ] Input validation on all user-facing endpoints
+- [ ] Authentication and authorization checks in place
+- [ ] Security headers configured (CSP, HSTS, etc.)
+- [ ] Rate limiting on authentication endpoints
+- [ ] CORS configured to specific origins (not wildcard)
+
+### Performance
+
+- [ ] Core Web Vitals within "Good" thresholds
+- [ ] No N+1 queries in critical paths
+- [ ] Images optimized (compression, responsive sizes, lazy loading)
+- [ ] Bundle size within budget
+- [ ] Database queries have appropriate indexes
+- [ ] Caching configured for static assets and repeated queries
+
+### Accessibility
+
+- [ ] Keyboard navigation works for all interactive elements
+- [ ] Screen reader can convey page content and structure
+- [ ] Color contrast meets WCAG 2.1 AA (4.5:1 for normal text, 3:1 for large text)
+- [ ] Focus management correct for modals and dynamic content
+- [ ] Error messages are descriptive and associated with form fields
+- [ ] No accessibility warnings in axe-core or Lighthouse
+
+### Infrastructure
+
+- [ ] Environment variables set in production
+- [ ] Database migrations applied (or ready to apply)
+- [ ] DNS and SSL configured
+- [ ] CDN configured for static assets
+- [ ] Logging and error reporting configured
+- [ ] Health check endpoint exists and responds
+
+### Documentation
+
+- [ ] README updated with any new setup requirements
+- [ ] API documentation current
+- [ ] ADRs written for any architectural decisions
+- [ ] Changelog updated
+- [ ] User-facing documentation updated (if applicable)
+
+## Feature Flag Strategy
+
+Ship behind feature flags to decouple deployment from release:
+
+```typescript
+// Feature flag check
+const flags = await getFeatureFlags(userId);
+
+if (flags.taskSharing) {
+  // New feature: task sharing
+  return <TaskSharingPanel task={task} />;
+}
+
+// Default: existing behavior
+return null;
+```
+
+**Feature flag lifecycle:**
+
+```
+1. DEPLOY with flag OFF     → Code is in production but inactive
+2. ENABLE for team/beta     → Internal testing in production environment
+3. GRADUAL ROLLOUT          → 5% → 25% → 50% → 100% of users
+4. MONITOR at each stage    → Watch error rates, performance, user feedback
+5. CLEAN UP                 → Remove flag and dead code path after full rollout
+```
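+
+For the gradual rollout step, each user needs to land consistently inside or outside the current percentage. A minimal sketch, assuming a hypothetical `isInRollout` helper and a simple polynomial string hash (neither comes from a specific flag library; a hosted flag service would normally handle this for you):
+
+```typescript
+// Hypothetical helpers, not part of any flag library.
+// Simple polynomial string hash, kept in the unsigned 32-bit range.
+function hashString(input: string): number {
+  let hash = 0;
+  for (let i = 0; i < input.length; i++) {
+    hash = (hash * 31 + input.charCodeAt(i)) >>> 0;
+  }
+  return hash;
+}
+
+// True when the user falls in the first `percent` of 100 buckets.
+// Hashing flag name + user ID keeps the bucket stable across requests.
+function isInRollout(flagName: string, userId: string, percent: number): boolean {
+  const bucket = hashString(flagName + ':' + userId) % 100;
+  return bucket < percent;
+}
+
+// Example: is this user part of the 5% canary for the taskSharing flag?
+const inCanary = isInRollout('taskSharing', 'user-123', 5);
+```
+
+Because the bucket depends only on the flag name and user ID, raising the percentage from 5 to 25 keeps every canary user enabled and only adds new ones.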
+
+**Rules:**
+- Every feature flag has an owner and an expiration date
+- Clean up flags within 2 weeks of full rollout
+- Don't nest feature flags (creates exponential combinations)
+- Test both flag states (on and off) in CI
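+
+The ownership and cleanup rules are easier to enforce when a script checks them. A minimal sketch, assuming a hypothetical `FlagRecord` shape and `findStaleFlags` helper (both invented for illustration; adapt them to wherever your flags are registered):
+
+```typescript
+// Hypothetical registry entry for a feature flag.
+interface FlagRecord {
+  name: string;
+  owner: string;
+  expiresAt: Date;
+  fullyRolledOutAt?: Date; // set once the flag reaches 100% of users
+}
+
+// Two-week cleanup window after full rollout, in milliseconds.
+const CLEANUP_WINDOW_MS = 14 * 24 * 60 * 60 * 1000;
+
+// Returns the names of flags that are past their expiration date
+// or were fully rolled out more than two weeks ago.
+function findStaleFlags(flags: FlagRecord[], now: Date): string[] {
+  return flags
+    .filter((flag) => {
+      const expired = flag.expiresAt.getTime() <= now.getTime();
+      const overdue =
+        flag.fullyRolledOutAt !== undefined &&
+        now.getTime() - flag.fullyRolledOutAt.getTime() > CLEANUP_WINDOW_MS;
+      return expired || overdue;
+    })
+    .map((flag) => flag.name);
+}
+```
+
+Running a check like this in CI or on a schedule can turn the two-week rule into a failing job rather than a reminder someone has to remember.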
+
+## Staged Rollout
+
+### The Rollout Sequence
+
+```
+1. DEPLOY to staging
+   └── Full test suite in staging environment
+   └── Manual smoke test of critical flows
+
+2. DEPLOY to production (feature flag OFF)
+   └── Verify deployment succeeded (health check)
+   └── Check error monitoring (no new errors)
+
+3. ENABLE for team (flag ON for internal users)
+   └── Team uses the feature in production
+   └── 24-hour monitoring window
+
+4. CANARY rollout (flag ON for 5% of users)
+   └── Monitor error rates, latency, user behavior
+   └── Compare metrics: canary vs. baseline
+   └── 24-48 hour monitoring window
+   └── Advance only if all thresholds pass (see table below)
+
+5. GRADUAL increase (25% → 50% → 100%)
+   └── Same monitoring at each step
+   └── Ability to roll back to previous percentage at any point
+
+6. FULL rollout (flag ON for all users)
+   └── Monitor for 1 week
+   └── Clean up feature flag
+```
+
+### Rollout Decision Thresholds
+
+Use these thresholds to decide whether to advance, hold, or roll back at each stage:
+
+| Metric | Advance (green) | Hold and investigate (yellow) | Roll back (red) |
+|--------|-----------------|-------------------------------|-----------------|
+| Error rate | Within 10% of baseline | 10-100% above baseline | >2x baseline |
+| P95 latency | Within 20% of baseline | 20-50% above baseline | >50% above baseline |
+| Client JS errors | No new error types | New errors at <0.1% of sessions | New errors at ≥0.1% of sessions |
+| Business metrics | Neutral or positive | Decline <5% (may be noise) | Decline >5% |
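+
+The error-rate and latency rows of the table can be encoded so the advance, hold, or roll-back call is mechanical. A minimal sketch, assuming hypothetical `StageMetrics` and `decideRollout` names and non-zero baselines; the client-error and business-metric rows are left out for brevity:
+
+```typescript
+type Decision = 'advance' | 'hold' | 'rollback';
+
+// Hypothetical shape: metrics for the current stage and the baseline.
+interface StageMetrics {
+  errorRate: number;
+  baselineErrorRate: number; // assumed to be greater than zero
+  p95LatencyMs: number;
+  baselineP95LatencyMs: number; // assumed to be greater than zero
+}
+
+// Classify one metric by its ratio to baseline.
+function classify(ratio: number, greenMax: number, redMin: number): Decision {
+  if (ratio > redMin) return 'rollback';
+  if (ratio > greenMax) return 'hold';
+  return 'advance';
+}
+
+// The worst individual result wins: any red rolls back, any yellow holds.
+function decideRollout(m: StageMetrics): Decision {
+  const results: Decision[] = [
+    classify(m.errorRate / m.baselineErrorRate, 1.1, 2.0), // 10% / 2x
+    classify(m.p95LatencyMs / m.baselineP95LatencyMs, 1.2, 1.5), // 20% / 50%
+  ];
+  if (results.indexOf('rollback') !== -1) return 'rollback';
+  if (results.indexOf('hold') !== -1) return 'hold';
+  return 'advance';
+}
+```
+
+Taking the worst result across metrics is a deliberately conservative choice: a healthy latency number should not mask a doubled error rate.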
+
+### When to Roll Back
+
+Roll back immediately if:
+- Error rate increases by more than 2x baseline
+- P95 latency increases by more than 50%
+- User-reported issues spike
+- Data integrity issues detected
+- Security vulnerability discovered
+
+## Monitoring and Observability
+
+### What to Monitor
+
+```
+Application metrics:
+├── Error rate (total and by endpoint)
+├── Response time (p50, p95, p99)
+├── Request volume
+├── Active users
+└── Key business metrics (conversion, engagement)
+
+Infrastructure metrics:
+├── CPU and memory utilization
+├── Database connection pool usage
+├── Disk space
+├── Network latency
+└── Queue depth (if applicable)
+
+Client metrics:
+├── Core Web Vitals (LCP, INP, CLS)
+├── JavaScript errors
+├── API error rates from client perspective
+└── Page load time
+```
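+
+The p50, p95, and p99 response times listed above are percentiles of the latency distribution. A minimal sketch of the nearest-rank method, assuming raw latency samples in milliseconds are available (the `percentile` helper is illustrative; metrics backends usually compute this from histograms instead):
+
+```typescript
+// Nearest-rank percentile: the smallest sample such that at least
+// p percent of all samples are less than or equal to it.
+function percentile(samples: number[], p: number): number {
+  if (samples.length === 0) {
+    throw new Error('percentile requires at least one sample');
+  }
+  // Copy before sorting so the caller's array is not mutated.
+  const sorted = samples.slice().sort((a, b) => a - b);
+  const rank = Math.ceil((p * sorted.length) / 100);
+  return sorted[Math.max(rank, 1) - 1];
+}
+
+// Example: one slow request dominates the tail.
+const latenciesMs = [120, 80, 300, 95, 110];
+const p50 = percentile(latenciesMs, 50); // 110
+const p95 = percentile(latenciesMs, 95); // 300
+```
+
+The example shows why the checklist tracks p95 and p99 alongside p50: the median hides the single 300 ms request that the tail percentiles expose.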
+
+### Error Reporting
+
+```typescript
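+// Set up error boundary with reporting.
+// reportError, getCurrentUser, ErrorFallback, and the Express `app` are defined elsewhere.
+import React from 'react';
+import type { Request, Response, NextFunction } from 'express';
+
+type ErrorBoundaryState = { hasError: boolean };
+
+class ErrorBoundary extends React.Component<{ children: React.ReactNode }, ErrorBoundaryState> {
+  // Without initial state, this.state would be undefined in render()
+  state: ErrorBoundaryState = { hasError: false };
+
+  // Switch to the fallback UI after a child component throws
+  static getDerivedStateFromError(): ErrorBoundaryState {
+    return { hasError: true };
+  }
+
+  componentDidCatch(error: Error, info: React.ErrorInfo) {
+    // Report to error tracking service
+    reportError(error, {
+      componentStack: info.componentStack,
+      userId: getCurrentUser()?.id,
+      page: window.location.pathname,
+    });
+  }
+
+  render() {
+    if (this.state.hasError) {
+      return <ErrorFallback onRetry={() => this.setState({ hasError: false })} />;
+    }
+    return this.props.children;
+  }
+}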
+
+// Server-side error reporting
+app.use((err: Error, req: Request, res: Response, next: NextFunction) => {
+  reportError(err, {
+    method: req.method,
+    url: req.url,
+    userId: req.user?.id,
+  });
+
+  // Don't expose internals to users
+  res.status(500).json({
+    error: { code: 'INTERNAL_ERROR', message: 'Something went wrong' },
+  });
+});
+```
+
+### Post-Launch Verification
+
+In the first hour after launch:
+
+```
+1. Check health endpoint returns 200
+2. Check error monitoring dashboard (no new error types)
+3. Check latency dashboard (no regression)
+4. Test the critical user flow manually
+5. Verify logs are flowing and readable
+6. Confirm rollback mechanism works (dry run if possible)
+```
+
+## Rollback Strategy
+
+Every deployment needs a rollback plan before it happens:
+
+```markdown
+## Rollback Plan for [Feature/Release]
+
+### Trigger Conditions
+- Error rate > 2x baseline
+- P95 latency > [X]ms
+- User reports of [specific issue]
+
+### Rollback Steps
+1. Disable feature flag (if applicable)
+   OR
+1. Deploy previous version: `git revert <commit> && git push`
+2. Verify rollback: health check, error monitoring
+3. Communicate: notify team of rollback
+
+### Database Considerations
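+- Migration [X] has a rollback: [down-migration script or command] (Prisma Migrate has no `migrate rollback` command; apply a down SQL script with `npx prisma db execute`, then record it with `npx prisma migrate resolve --rolled-back`)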
+- Data inserted by new feature: [preserved / cleaned up]
+
+### Time to Rollback
+- Feature flag: < 1 minute
+- Redeploy previous version: < 5 minutes
+- Database rollback: < 15 minutes
+```
+
+## See Also
+
+- For security pre-launch checks, see `references/security-checklist.md`
+- For performance pre-launch checklist, see `references/performance-checklist.md`
+- For accessibility verification before launch, see `references/accessibility-checklist.md`
+
+## Common Rationalizations
+
+| Rationalization | Reality |
+|---|---|
+| "It works in staging, it'll work in production" | Production has different data, traffic patterns, and edge cases. Monitor after deploy. |
+| "We don't need feature flags for this" | Every feature benefits from a kill switch. Even "simple" changes can break things. |
+| "Monitoring is overhead" | Not having monitoring means you discover problems from user complaints instead of dashboards. |
+| "We'll add monitoring later" | Add it before launch. You can't debug what you can't see. |
+| "Rolling back is admitting failure" | Rolling back is responsible engineering. Shipping a broken feature is the failure. |
+
+## Red Flags
+
+- Deploying without a rollback plan
+- No monitoring or error reporting in production
+- Big-bang releases (everything at once, no staging)
+- Feature flags with no expiration or owner
+- No one monitoring the deploy for the first hour
+- Production environment configuration done from memory, not from code
+- "It's Friday afternoon, let's ship it"
+
+## Verification
+
+Before deploying:
+
+- [ ] Pre-launch checklist completed (all sections green)
+- [ ] Feature flag configured (if applicable)
+- [ ] Rollback plan documented
+- [ ] Monitoring dashboards set up
+- [ ] Team notified of deployment
+
+After deploying:
+
+- [ ] Health check returns 200
+- [ ] Error rate is normal
+- [ ] Latency is normal
+- [ ] Critical user flow works
+- [ ] Logs are flowing
+- [ ] Rollback tested or verified ready