Initial commit

2026-05-16 13:43:29 +07:00
commit 688a86b611
208 changed files with 44350 additions and 0 deletions
--- a/.claude/skills/idea-refine/SKILL.md
+++ b/.claude/skills/idea-refine/SKILL.md
@@ -0,0 +1,178 @@
+---
+name: idea-refine
+description: Refines raw ideas into sharp, actionable concepts through structured divergent and convergent thinking. Use when an idea is still vague, when you need to stress-test assumptions before committing to a plan, or when you want to expand options before converging on one. Triggers on "ideate", "refine this idea", or "stress-test my plan".
+---
+
+# Idea Refine
+
+Refines raw ideas into sharp, actionable concepts worth building through structured divergent and convergent thinking.
+
+## How It Works
+
+1.  **Understand & Expand (Divergent):** Restate the idea, ask sharpening questions, and generate variations.
+2.  **Evaluate & Converge:** Cluster ideas, stress-test them, and surface hidden assumptions.
+3.  **Sharpen & Ship:** Produce a concrete markdown one-pager moving work forward.
+
+## Usage
+
+This skill is primarily an interactive dialogue. Invoke it with an idea, and the agent will guide you through the process.
+
+```bash
+# Optional: Initialize the ideas directory
+bash /mnt/skills/user/idea-refine/scripts/idea-refine.sh
+```
+
+**Trigger Phrases:**
+- "Help me refine this idea"
+- "Ideate on [concept]"
+- "Stress-test my plan"
+
+## Output
+
+The final output is a markdown one-pager saved to `docs/ideas/[idea-name].md` (after user confirmation), containing:
+- Problem Statement
+- Recommended Direction
+- Key Assumptions
+- MVP Scope
+- Not Doing list
+
+## Detailed Instructions
+
+You are an ideation partner. Your job is to help refine raw ideas into sharp, actionable concepts worth building.
+
+### Philosophy
+
+- Simplicity is the ultimate sophistication. Push toward the simplest version that still solves the real problem.
+- Start with the user experience, work backwards to technology.
+- Say no to 1,000 things. Focus beats breadth.
+- Challenge every assumption. "How it's usually done" is not a reason.
+- Show people the future — don't just give them better horses.
+- The parts you can't see should be as beautiful as the parts you can.
+
+### Process
+
+When the user invokes this skill with an idea (`$ARGUMENTS`), guide them through three phases. Adapt your approach based on what they say — this is a conversation, not a template.
+
+#### Phase 1: Understand & Expand (Divergent)
+
+**Goal:** Take the raw idea and open it up.
+
+1. **Restate the idea** as a crisp "How Might We" problem statement. This forces clarity on what's actually being solved.
+
+2. **Ask 3-5 sharpening questions** — no more. Focus on:
+   - Who is this for, specifically?
+   - What does success look like?
+   - What are the real constraints (time, tech, resources)?
+   - What's been tried before?
+   - Why now?
+
+   Use the `AskUserQuestion` tool to gather this input. Do NOT proceed until you understand who this is for and what success looks like.
+
+3. **Generate 5-8 idea variations** using these lenses:
+   - **Inversion:** "What if we did the opposite?"
+   - **Constraint removal:** "What if budget/time/tech weren't factors?"
+   - **Audience shift:** "What if this were for [different user]?"
+   - **Combination:** "What if we merged this with [adjacent idea]?"
+   - **Simplification:** "What's the version that's 10x simpler?"
+   - **10x version:** "What would this look like at massive scale?"
+   - **Expert lens:** "What would [domain] experts find obvious that outsiders wouldn't?"
+
+   Push beyond what the user initially asked for. Create products people don't know they need yet.
+
+**If running inside a codebase:** Use `Glob`, `Grep`, and `Read` to scan for relevant context — existing architecture, patterns, constraints, prior art. Ground your variations in what actually exists. Reference specific files and patterns when relevant.
+
+Read `frameworks.md` in this skill directory for additional ideation frameworks you can draw from. Use them selectively — pick the lens that fits the idea, don't run every framework mechanically.
+
+#### Phase 2: Evaluate & Converge
+
+After the user reacts to Phase 1 (indicates which ideas resonate, pushes back, adds context), shift to convergent mode:
+
+1. **Cluster** the ideas that resonated into 2-3 distinct directions. Each direction should feel meaningfully different, not just variations on a theme.
+
+2. **Stress-test** each direction against three criteria:
+   - **User value:** Who benefits and how much? Is this a painkiller or a vitamin?
+   - **Feasibility:** What's the technical and resource cost? What's the hardest part?
+   - **Differentiation:** What makes this genuinely different? Would someone switch from their current solution?
+
+   Read `refinement-criteria.md` in this skill directory for the full evaluation rubric.
+
+3. **Surface hidden assumptions.** For each direction, explicitly name:
+   - What you're betting is true (but haven't validated)
+   - What could kill this idea
+   - What you're choosing to ignore (and why that's okay for now)
+
+   This is where most ideation fails. Don't skip it.
+
+**Be honest, not supportive.** If an idea is weak, say so with kindness. A good ideation partner is not a yes-machine. Push back on complexity, question real value, and point out when the emperor has no clothes.
+
+#### Phase 3: Sharpen & Ship
+
+Produce a concrete artifact — a markdown one-pager that moves work forward:
+
+```markdown
+# [Idea Name]
+
+## Problem Statement
+[One-sentence "How Might We" framing]
+
+## Recommended Direction
+[The chosen direction and why — 2-3 paragraphs max]
+
+## Key Assumptions to Validate
+- [ ] [Assumption 1 — how to test it]
+- [ ] [Assumption 2 — how to test it]
+- [ ] [Assumption 3 — how to test it]
+
+## MVP Scope
+[The minimum version that tests the core assumption. What's in, what's out.]
+
+## Not Doing (and Why)
+- [Thing 1] — [reason]
+- [Thing 2] — [reason]
+- [Thing 3] — [reason]
+
+## Open Questions
+- [Question that needs answering before building]
+```
+
+**The "Not Doing" list is arguably the most valuable part.** Focus is about saying no to good ideas. Make the trade-offs explicit.
+
+Ask the user if they'd like to save this to `docs/ideas/[idea-name].md` (or a location of their choosing). Only save if they confirm.
+
+### Anti-patterns to Avoid
+
+- **Don't generate 20+ ideas.** Quality over quantity. 5-8 well-considered variations beat 20 shallow ones.
+- **Don't be a yes-machine.** Push back on weak ideas with specificity and kindness.
+- **Don't skip "who is this for."** Every good idea starts with a person and their problem.
+- **Don't produce a plan without surfacing assumptions.** Untested assumptions are the #1 killer of good ideas.
+- **Don't over-engineer the process.** Three phases, each doing one thing well. Resist adding steps.
+- **Don't just list ideas — tell a story.** Each variation should have a reason it exists, not just be a bullet point.
+- **Don't ignore the codebase.** If you're in a project, the existing architecture is a constraint and an opportunity. Use it.
+
+### Tone
+
+Direct, thoughtful, slightly provocative. You're a sharp thinking partner, not a facilitator reading from a script. Channel the energy of "that's interesting, but what if..." -- always pushing one step further without being exhausting.
+
+Read `examples.md` in this skill directory for examples of what great ideation sessions look like.
+
+## Red Flags
+
+- Generating 20+ shallow variations instead of 5-8 considered ones
+- Skipping the "who is this for" question
+- No assumptions surfaced before committing to a direction
+- Yes-machining weak ideas instead of pushing back with specificity
+- Producing a plan without a "Not Doing" list
+- Ignoring existing codebase constraints when ideating inside a project
+- Jumping straight to Phase 3 output without running Phases 1 and 2
+
+## Verification
+
+After completing an ideation session:
+
+- [ ] A clear "How Might We" problem statement exists
+- [ ] The target user and success criteria are defined
+- [ ] Multiple directions were explored, not just the first idea
+- [ ] Hidden assumptions are explicitly listed with validation strategies
+- [ ] A "Not Doing" list makes trade-offs explicit
+- [ ] The output is a concrete artifact (markdown one-pager), not just conversation
+- [ ] The user confirmed the final direction before any implementation work
--- a/.claude/skills/idea-refine/examples.md
+++ b/.claude/skills/idea-refine/examples.md
@@ -0,0 +1,238 @@
+# Ideation Session Examples
+
+These examples demonstrate what good ideation sessions look like across different kinds of ideas. Study the rhythm, tone, and structure — not just the content. The skill should feel equally at home with a vague startup concept, a feature for an existing product, or a process improvement.
+
+---
+
+## Example 1: Vague Early-Stage Concept (Full 3-Phase Session)
+
+**User invokes:** `/ideate I want to help small local restaurants compete with the big delivery platforms`
+
+### Phase 1: Understand & Expand
+
+**Skill response:**
+
+> **How might we** give independent restaurants the reach and convenience customers expect from delivery platforms, without forcing them into a model that erodes their margins and brand?
+
+That's a big space. Let me sharpen it:
+
+1. **Which part of "compete"?** Customer acquisition? Delivery logistics? Online ordering UX? Loyalty/retention? Margin pressure from platform fees?
+2. **Who are you building for?** The restaurant owner? The diner? Both?
+3. **What's the constraint?** Are you thinking software? A cooperative model? Something physical?
+
+**User responds:** "Restaurant owners. The core pain is that they're paying 25-30% commission to delivery apps but feel they can't opt out because that's where the customers are. I'm thinking software but I'm open."
+
+**Skill continues:**
+
+So the problem isn't delivery logistics — it's **customer dependency**. The platforms own the customer relationship, and restaurants are renting access to their own customers at 30% of revenue. That's the leverage to break.
+
+Here are 6 variations:
+
+1. **Direct Channel Toolkit:** Give restaurants their own branded ordering — web, SMS, maybe a lightweight app — so they can move existing customers off-platform. The core bet: customers who *already know* a restaurant will order direct if it's easy enough. You don't need to compete with DoorDash for discovery, just for reordering.
+
+2. **Inversion — Charge the Customer, Not the Restaurant:** What if the delivery fee model flipped? Customers pay a small premium for direct ordering, restaurants pay zero commission. Position it as "support your local spot." The bet: enough customers care about supporting local businesses to pay $1-2 more per order.
+
+3. **Constraint Removal — What If Delivery Weren't Required?** The platforms bundle discovery + ordering + delivery, but most of their cost (and the restaurant's commission) is delivery. What if you only solved ordering and pickup? Pickup-first model. Restaurants keep 100% of revenue, customers save the delivery fee. Works best in dense neighborhoods.
+
+4. **Combination — Restaurant Collective:** A co-op model where 20-50 local restaurants in an area share a platform, split costs, and cross-promote each other. Individual restaurants can't afford marketing — but a neighborhood collective can. Think "local food hall" but digital.
+
+5. **Audience Shift — For Regular Customers:** Don't try to acquire new customers. Build exclusively for a restaurant's existing regulars — the people who come in weekly. A loyalty/reorder tool that's dead simple: "Your usual?" one-tap reorder. The 80/20 bet: 20% of customers drive 80% of revenue, and those people don't need DoorDash to discover the restaurant.
+
+6. **Expert Lens — What Restaurant Owners Actually Do:** Most small restaurant owners aren't looking at dashboards. They're in the kitchen. Any tool that requires them to "manage" something will fail. The winning approach might be something that runs itself — auto-texts regulars when they haven't ordered in 2 weeks, auto-posts the daily special, auto-handles reorders. Zero restaurant-side effort.
+
+Variation 5 and 6 together are interesting. Instead of trying to out-platform the platforms, you'd build something hyper-narrow: a reordering and retention tool for a restaurant's best existing customers. No discovery, no delivery, no marketplace. Just "keep your regulars ordering direct."
+
+Which directions interest you?
+
+---
+
+### Phase 2: Evaluate & Converge
+
+**User responds:** "I love the regulars angle (5 and 6). The direct channel toolkit (1) feels necessary but boring. The co-op idea (4) is interesting but complex. Let me focus on 1 + 5 + 6 combined."
+
+**Skill response:**
+
+Good instincts. That combination has a clear thesis: **own the relationship with your best customers, make reordering effortless, and require zero effort from the restaurant owner.** Let me stress-test it.
+
+**Direction A: The "Regulars Engine" (Variations 5 + 6 fused)**
+A tool focused entirely on a restaurant's existing repeat customers. Auto-reorder, auto-engagement, zero management.
+
+- **Core bet:** A restaurant's top 20% of customers will shift to direct ordering if the experience is frictionless and personal. The restaurant doesn't need *new* customers — it needs to stop paying 30% on customers it already has.
+- **User value:** High and immediate. Every direct reorder saves the restaurant 25-30% commission. If a regular orders $30/week, that's ~$400/year in saved fees per customer. 50 regulars = $20K/year. That's real money for a small restaurant.
+- **Feasibility:** Medium. The ordering/payment tech is commodity. The hard parts: (a) how do you identify and onboard "regulars" without the restaurant doing manual work? (b) How do you handle the cold start — the first reorder needs an order history that lives on the delivery platform, not with you.
+- **Differentiation:** High for the niche. Platforms optimize for marketplace dynamics (new customer discovery, driver logistics). Nobody is optimizing specifically for "keep your regulars ordering direct." It's too small for DoorDash to care about, which is exactly why it's a good wedge.
+- **Hidden assumptions:**
+  - Regulars will install/use yet another ordering method (assumption most likely to be wrong)
+  - Restaurant owners will share customer contact info or let you access it
+  - Direct ordering won't cannibalize dine-in revenue
+- **What could kill it:** If customers simply won't switch channels. Habit is powerful — even if DoorDash costs more, it's already on their phone.
+
+**Direction B: Direct Channel + Retention Hybrid (Variation 1 + 5)**
+Branded ordering for the restaurant *plus* the retention automation. More complete but more complex.
+
+- **Core bet:** Restaurants need both the channel (a way to accept orders) and the engine (a way to keep customers coming back through that channel).
+- **User value:** Higher ceiling, but only if both parts work together. The ordering channel alone is a commodity (Square, Toast already do this). The retention layer alone has no channel to retain customers *on*.
+- **Feasibility:** Lower. You're building two products and hoping they're better together.
+- **Differentiation:** Medium. The combination is interesting, but each piece individually has competitors.
+- **Hidden assumption:** That bundling these is better than partnering (e.g., retention layer on top of Square Online).
+- **What could kill it:** Scope. Building two things well is exponentially harder than building one thing well.
+
+**The honest read:** Direction A is the sharper bet. Direction B is safer but less differentiated. The biggest risk for A is the channel-switching problem — getting customers to order through a new method. If you can crack that (and variation 6's "zero effort" approach might help — an SMS-based reorder doesn't require a new app), the rest follows.
+
+One thing I want to push back on: you said variation 1 feels "necessary but boring." That instinct to include the "necessary" thing is how products lose focus. If the regulars engine works, you can add a branded ordering channel later. But if you build the channel first, you're just another Square Online competitor.
+
+---
+
+### Phase 3: Sharpen & Ship
+
+**Skill produces:**
+
+# ReOrder: Keep Your Regulars Ordering Direct
+
+## Problem Statement
+How might we help independent restaurants retain their best existing customers through direct ordering, eliminating the 25-30% platform commission on the revenue that matters most?
+
+## Recommended Direction
+Build a **regulars-only retention engine** — not a marketplace, not a full ordering platform. Hyper-focused on one job: make it effortless for a restaurant's repeat customers to reorder directly.
+
+The key insight is that restaurants don't need help *finding* their best customers — they know who walks in every Tuesday. They need help moving those relationships off-platform. And the tool needs to run itself, because the owner is in the kitchen, not at a dashboard.
+
+SMS-first (not app-first) is likely the right channel. A text saying "Hey, want your usual Thursday order from Marco's?" with a one-tap confirmation is lower friction than any app install.
+
+## Key Assumptions to Validate
+- [ ] Repeat customers will reorder via SMS/direct link instead of their usual delivery app — test with 5 restaurants, 20 regulars each, measure conversion over 4 weeks
+- [ ] Restaurant owners can identify their top 20-30 regulars and share contact info — test by asking 10 restaurant owners if they'd do this
+- [ ] The commission savings ($8-10 per order) is motivating enough for owners to invest initial setup effort — interview 10 owners about platform fee pain
+
+## MVP Scope
+- SMS-based reordering for a restaurant's self-identified regular customers
+- Restaurant owner adds regulars manually (name + phone + usual order) — 15-minute setup
+- Customer receives a text with their usual order, confirms with a reply, pays via link
+- Restaurant receives the order via text/simple dashboard
+- No delivery — pickup only in v1
+- No discovery, no marketplace, no app
+
+## Not Doing (and Why)
+- **Delivery logistics** — delivery is the expensive, complex part and not the core problem. Pickup-first validates demand without operational burden.
+- **Customer acquisition/discovery** — that's the platform's game. Competing on discovery means competing with DoorDash's budget. We compete on retention instead.
+- **Branded restaurant apps/websites** — commodity. Square and Toast already do this. Another branded channel doesn't solve the behavioral problem.
+- **Menu management, POS integration** — scope creep. "Your usual" doesn't need a dynamic menu. Start with static orders.
+- **Analytics dashboards** — the owner is in the kitchen. Build reporting that's passive (weekly text summary) not active (log in and look at charts).
+
+## Open Questions
+- What's the legal/compliance landscape for SMS marketing to restaurant customers?
+- How do you handle menu/price changes for "usual" orders?
+- Is there a way to bootstrap the regular customer list from existing POS data instead of manual entry?
+- What's the pricing model? Per-order fee? Monthly subscription?
+
+---
+
+## Example 2: Feature Idea Within an Existing Product (Codebase-Aware)
+
+**User invokes:** `/ideate adding real-time collaboration to our document editor`
+
+### Phase 1: Understand & Expand
+
+**Skill response:**
+
+> **How might we** let multiple people work on the same document simultaneously in a way that feels natural, not chaotic?
+
+I scanned your codebase and see you're using a block-based editor architecture with a REST API for persistence. No WebSocket infrastructure currently. That's important context. A few questions:
+
+1. **What's the collaboration pattern?** Are people editing the same paragraph at the same time (Google Docs style)? Or working in different sections (Notion style)? Or reviewing/commenting while one person edits?
+2. **How many simultaneous users?** 2-3 (pair editing)? 10+ (team workspace)?
+3. **What's driving this now?** Customer requests? Competitive pressure? Internal need?
+
+**User responds:** "2-5 people, usually working in different sections. Customers keep asking. We're losing deals to Notion because they assume any modern editor has this."
+
+**Skill continues:**
+
+The competitive pressure framing matters — this might be a checkbox feature (must-have to stay in the consideration set) rather than a differentiator. That changes how much you should invest.
+
+I see your document model uses independent blocks with a flat ordering system (`src/models/document.ts:45`). That's actually favorable — block-level granularity means conflicts are rarer than character-level editing.
+
+Here are 5 variations:
+
+1. **Block-Level Locking (Simplest Path):** When someone starts editing a block, it's locked for others. They see a presence indicator ("Alex is editing...") but can edit any other block freely. No conflict resolution needed. Your existing block model supports this with minimal changes — add a `locked_by` field and a WebSocket layer for presence.
+
+2. **Inversion — Async-First Collaboration:** Instead of real-time co-editing, lean into async. Inline suggestions, comments, and proposed edits that the document owner merges. Like suggestion mode, but as the *primary* collaboration model. Bet: for 2-5 people in different sections, async might actually be *better* than real-time.
+
+3. **Simplification — Presence Without Co-Editing:** Show who's viewing the document and what section they're looking at (cursors, avatars in margins), but don't enable simultaneous editing. People coordinate verbally. Sounds primitive, but might be 80% of the value at 10% of the engineering cost.
+
+4. **Combination — Collaboration + Version Branches:** Let people create "branches" of a document, edit independently, then merge. Each person gets their own space, no conflicts during editing, and you build a merge UI. Plays to your existing versioning system.
+
+5. **10x Version — AI-Mediated Collaboration:** Multiple people describe what they want in a section, and AI synthesizes their inputs into a draft they all iterate on. Nobody directly edits the same text — they all influence it. Radical, possibly too early, but worth naming.
+
+Given your architecture and the competitive context, I'd push you toward variation 1 or 3. You need to close deals, which means you need this shipped, which means you need the simplest version that checks the box. Variation 1 gives you real co-editing. Variation 3 is even cheaper but might not satisfy the "we need real-time" checkbox.
+
+---
+
+*Phase 2 would cluster into "minimum viable co-editing (1 or 3)" vs. "differentiated collaboration model (2 or 4)" — evaluating whether to check the competitive box quickly or invest in something that becomes a selling point.*
+
+*Phase 3 would produce a one-pager recommending block-level locking as the MVP, with a "Not Doing" list that includes character-level CRDT (too complex), offline sync (not needed for 2-5 users), and AI features (premature).*
+
+---
+
+## Example 3: Process/Workflow Idea (Non-Product)
+
+**User invokes:** `/ideate a better way to run our weekly team retrospectives — they've gotten stale and people dread them`
+
+### Phase 1: Understand & Expand
+
+**Skill response:**
+
+> **How might we** make retrospectives genuinely useful — something the team looks forward to — instead of a recurring meeting people endure?
+
+Before expanding:
+
+1. **What's stale about them?** Same format every time? Same people talking? Action items that never get done? Topics feel repetitive?
+2. **Team size and dynamic?** 5 people who all speak up, or 15 where 3 dominate?
+3. **What does a *good* retro look like to you?** Have you had one that worked? What made it different?
+
+**User responds:** "8 people, maybe 3-4 actually talk. Same 'what went well / what didn't / action items' format for a year. The action items pile up and nothing changes. People think it's theater."
+
+**Skill continues:**
+
+The action item problem is the real signal. If nothing changes after the retro, people rationally disengage. The meeting isn't stale — it's **broken at the output layer**. The format might be fine if the follow-through worked.
+
+Here are 6 variations:
+
+1. **Fix the Output, Not the Meeting:** Keep the retro format but completely change what happens after. One action item max per retro. Assigned an owner and a deadline during the meeting. Next retro starts with a 5-minute demo of the completed action. If it's not done, that *becomes* the retro topic. The constraint of one item forces prioritization.
+
+2. **Inversion — Kill the Meeting:** Replace the synchronous retro with an async one. Everyone submits reflections in writing during the week. A rotating facilitator synthesizes themes and proposes one change. The "meeting" is a 15-minute standup to agree or disagree. Bet: writing produces more honest feedback than speaking in a group.
+
+3. **Audience Shift — Retro for the Quiet Ones:** Design the format for the 4-5 people who don't talk. Anonymous submission of topics. Dot-voting to prioritize. Small-group breakouts (pairs) before full-group discussion. The loud voices still participate, but the structure prevents them from dominating.
+
+4. **Simplification — The One Question Retro:** Each retro has exactly one question, and it changes every week. "What's one thing that slowed you down?" "What did you learn that the team should know?" "If you could change one thing about how we work, what?" The constraint forces depth over breadth.
+
+5. **Combination — Retro + Experimentation:** Treat each retro output as a hypothesis. "We believe that [change] will improve [outcome]." Run it as a 2-week experiment. Next retro: did it work? Keep, modify, or kill. Turns the retro into a continuous improvement engine with built-in accountability.
+
+6. **Expert Lens — What Facilitators Know:** Experienced facilitators say the #1 retro killer is lack of safety, not format. People won't say what's really wrong if they fear consequences. The fix might not be structural — it might be starting with an anonymous "team health check" score (1-5) each week. When safety is high, retros naturally improve.
+
+The interesting tension: variations 1 and 5 fix the *output* problem (nothing changes). Variations 2, 3, and 4 fix the *input* problem (same voices, same topics). Variation 6 says both are symptoms of a deeper issue. Where do you think the real bottleneck is?
+
+---
+
+*Phase 2 would evaluate in terms of: effort to try (most are free — just change how you run the next meeting), risk (variation 2 is the biggest departure), and whether the team's real problem is output (action items die) or input (not enough honesty).*
+
+*Phase 3 would produce a one-pager recommending starting with variation 1 (one action item, demo next week) as a zero-cost experiment, combined with variation 3's anonymous submission. "Not Doing" list: new tools, elaborate facilitation techniques, or anything requiring budget. The first fix should take 0 minutes of prep and $0.*
+
+---
+
+## What to Notice in These Examples
+
+1. **The restatement changes the frame.** "Help restaurants compete" becomes "retain existing customers." "Add real-time collaboration" becomes "let people work simultaneously without chaos." "Fix stale retros" becomes "fix the output layer."
+
+2. **Questions diagnose before prescribing.** Each question determines which *type* of problem this actually is. The retro example reveals the problem is action item follow-through, not meeting format — and that changes every variation.
+
+3. **Variations have reasons.** Each one explains *why* it exists (what lens generated it), not just *what* it is. The label (Inversion, Simplification, etc.) teaches the user to think this way themselves.
+
+4. **The skill has opinions.** "I'd push you toward 1 or 3." "Variation 6 is worth sitting with." It tells you what it thinks matters and why — not just neutral options.
+
+5. **Phase 2 is honest.** Ideas get called out for low differentiation or high complexity. The skill pushes back: "That instinct to include the 'necessary' thing is how products lose focus."
+
+6. **The output is actionable.** The one-pager ends with things you can *do* (validate assumptions, build the MVP, try the experiment), not things to *think about*.
+
+7. **The "Not Doing" list does real work.** It's specific and reasoned. Each item is something you might *want* to do but shouldn't yet.
+
+8. **The skill adapts to context.** A codebase-aware example references actual architecture. A process idea generates zero-cost experiments instead of products. The framework stays the same but the output matches the domain.
--- a/.claude/skills/idea-refine/frameworks.md
+++ b/.claude/skills/idea-refine/frameworks.md
@@ -0,0 +1,99 @@
+# Ideation Frameworks Reference
+
+Use these frameworks selectively. Pick the lens that fits the idea — don't mechanically run every framework. The goal is to unlock thinking, not to follow a checklist.
+
+## SCAMPER
+
+A structured way to transform an existing idea by applying seven different operations:
+
+- **Substitute:** What component, material, or process could you swap out? What if you replaced the core technology? The target audience? The business model?
+- **Combine:** What if you merged this with another product, service, or idea? What two things that don't usually go together would create something new?
+- **Adapt:** What else is like this? What ideas from other industries, domains, or time periods could you borrow? What parallel exists in nature?
+- **Modify (Magnify/Minimize):** What if you made it 10x bigger? 10x smaller? What if you exaggerated one feature? What if you stripped it to the absolute minimum?
+- **Put to other uses:** Who else could use this? What other problems could it solve? What happens if you use it in a completely different context?
+- **Eliminate:** What happens if you remove a feature entirely? What's the version with zero configuration? What would it look like with half the steps?
+- **Reverse/Rearrange:** What if you did the steps in the opposite order? What if the user did the work instead of the system (or vice versa)? What if you reversed the value chain?
+
+**Best for:** Improving or reimagining existing products/features. Less useful for greenfield ideas.
+
+## How Might We (HMW)
+
+Reframe problems as opportunities using the "How Might We..." format:
+
+- Start with an observation or pain point
+- Reframe it as "How might we [desired outcome] for [specific user] without [key constraint]?"
+- Generate multiple HMW framings of the same problem — different framings unlock different solutions
+
+**Good HMW qualities:**
+- Narrow enough to be actionable ("...help new users find relevant content in their first 5 minutes")
+- Broad enough to allow creative solutions (not "...add a recommendation sidebar")
+- Contains a tension or constraint that forces creativity
+
+**Bad HMW qualities:**
+- Too broad: "How might we make users happy?"
+- Too narrow: "How might we add a button to the settings page?"
+- Solution-embedded: "How might we build a chatbot for support?"
+
+**Best for:** Reframing stuck thinking. When someone is anchored on a solution, pull them back to the problem.
+
+## First Principles Thinking
+
+Break the idea down to its fundamental truths, then rebuild from there:
+
+1. **What do we know is true?** (not assumed, not conventional — actually true)
+2. **What are we assuming?** List every assumption, even the ones that feel obvious
+3. **Which assumptions can we challenge?** For each, ask: "Is this actually a law of physics, or just how it's been done?"
+4. **Rebuild from the truths.** If you only had the fundamental truths, what would you build?
+
+**Best for:** Breaking out of incremental thinking. When every idea feels like a small improvement on the status quo.
+
+## Jobs to Be Done (JTBD)
+
+Focus on what the user is trying to accomplish, not what they say they want:
+
+- **Functional job:** What task are they trying to complete?
+- **Emotional job:** How do they want to feel?
+- **Social job:** How do they want to be perceived?
+
+Format: "When I [situation], I want to [motivation], so I can [expected outcome]."
+
+**Key insight:** People don't buy products — they hire them to do a job. The competing product isn't always in the same category. (Netflix competes with sleep, not just other streaming services.)
+
+**Best for:** Understanding the real problem. When you're not sure if you're solving the right thing.
+
+## Constraint-Based Ideation
+
+Deliberately impose constraints to force creative solutions:
+
+- **Time constraint:** "What if you only had 1 day to build this?"
+- **Feature constraint:** "What if it could only have one feature?"
+- **Tech constraint:** "What if you couldn't use [the obvious technology]?"
+- **Cost constraint:** "What if it had to be free forever?"
+- **Audience constraint:** "What if your user had never used a computer before?"
+- **Scale constraint:** "What if it needed to work for 1 billion users? What about just 10?"
+
+**Best for:** Cutting through complexity. When the idea is growing too large or too vague.
+
+## Pre-mortem
+
+Imagine the idea has already failed. Work backwards:
+
+1. It's 12 months from now. The project shipped and flopped. What went wrong?
+2. List every plausible reason for failure — technical, market, team, timing
+3. For each failure mode: Is this preventable? Is this a signal the idea needs to change?
+4. Which failure modes are you willing to accept? Which ones would kill the project?
+
+**Best for:** Phase 2 evaluation. Stress-testing ideas that feel good but haven't been pressure-tested.
+
+## Analogous Inspiration
+
+Look at how other domains solved similar problems:
+
+- What industry has already solved a version of this problem?
+- What would this look like if [specific company/product] built it?
+- What natural system works this way?
+- What historical precedent exists?
+
+The key is finding *structural* similarities, not surface-level ones. "Uber for X" is surface-level. "A two-sided marketplace that solves a trust problem between strangers" is structural.
+
+**Best for:** Phase 1 expansion. Generating variations that feel genuinely different from the obvious approach.
--- a/.claude/skills/idea-refine/references/accessibility-checklist.md
+++ b/.claude/skills/idea-refine/references/accessibility-checklist.md
@@ -0,0 +1,160 @@
+# Accessibility Checklist
+
+Quick reference for WCAG 2.1 AA compliance. Use alongside the `frontend-ui-engineering` skill.
+
+## Table of Contents
+
+- [Essential Checks](#essential-checks)
+- [Common HTML Patterns](#common-html-patterns)
+- [Testing Tools](#testing-tools)
+- [Quick Reference: ARIA Live Regions](#quick-reference-aria-live-regions)
+- [Common Anti-Patterns](#common-anti-patterns)
+
+## Essential Checks
+
+### Keyboard Navigation
+- [ ] All interactive elements focusable via Tab key
+- [ ] Focus order follows visual/logical order
+- [ ] Focus is visible (outline/ring on focused elements)
+- [ ] Custom widgets have keyboard support (Enter to activate, Escape to close)
+- [ ] No keyboard traps (user can always Tab away from a component)
+- [ ] Skip-to-content link at top of page - visible (at least) on keyboard focus
+- [ ] Modals trap focus while open, return focus on close
+
+### Screen Readers
+- [ ] All images have `alt` text (or `alt=""` for decorative images)
+- [ ] All form inputs have associated labels (`<label>` or `aria-label`)
+- [ ] Buttons and links have descriptive text (not "Click here")
+- [ ] Icon-only buttons have `aria-label`
+- [ ] Page has one `<h1>` and headings don't skip levels
+- [ ] Dynamic content changes announced (`aria-live` regions)
+- [ ] Tables have `<th>` headers with scope
+
+### Visual
+- [ ] Text contrast ≥ 4.5:1 (normal text) or ≥ 3:1 (large text, 18px+)
+- [ ] UI components contrast ≥ 3:1 against background
+- [ ] Color is not the only way to convey information
+- [ ] Text resizable to 200% without breaking layout
+- [ ] No content that flashes more than 3 times per second
+
+### Forms
+- [ ] Every input has a visible label
+- [ ] Required fields indicated (not by color alone)
+- [ ] Error messages specific and associated with the field
+- [ ] Error state visible by more than color (icon, text, border)
+- [ ] Form submission errors summarized and focusable
+- [ ] Known fields use autocomplete (for example `type="email" autocomplete="email"`)
+
+### Content
+- [ ] Language declared (`<html lang="en">`)
+- [ ] Page has a descriptive `<title>`
+- [ ] Links distinguish from surrounding text (not by color alone)
+- [ ] Touch targets ≥ 44x44px on mobile
+- [ ] Meaningful empty states (not blank screens)
+
+## Common HTML Patterns
+
+### Buttons vs. Links
+
+```html
+<!-- Use <button> for actions -->
+<button onClick={handleDelete}>Delete Task</button>
+
+<!-- Use <a> for navigation -->
+<a href="/tasks/123">View Task</a>
+
+<!-- NEVER use div/span as buttons -->
+<div onClick={handleDelete}>Delete</div>  <!-- BAD -->
+```
+
+### Form Labels
+
+```html
+<!-- Explicit label association -->
+<label htmlFor="email">Email address</label>
+<input id="email" type="email" required />
+
+<!-- Implicit wrapping -->
+<label>
+  Email address
+  <input type="email" required />
+</label>
+
+<!-- Hidden label (visible label preferred) -->
+<input type="search" aria-label="Search tasks" />
+```
+
+### ARIA Roles
+
+```html
+<!-- Navigation -->
+<nav aria-label="Main navigation">...</nav>
+<nav aria-label="Footer links">...</nav>
+
+<!-- Status messages -->
+<div role="status" aria-live="polite">Task saved</div>
+
+<!-- Alert messages -->
+<div role="alert">Error: Title is required</div>
+
+<!-- Modal dialogs -->
+<dialog aria-modal="true" aria-labelledby="dialog-title">
+  <h2 id="dialog-title">Confirm Delete</h2>
+  ...
+</dialog>
+
+<!-- Loading states -->
+<div aria-busy="true" aria-label="Loading tasks">
+  <Spinner />
+</div>
+```
+
+### Accessible Lists
+
+```html
+<ul role="list" aria-label="Tasks">
+  <li>
+    <input type="checkbox" id="task-1" aria-label="Complete: Buy groceries" />
+    <label htmlFor="task-1">Buy groceries</label>
+  </li>
+</ul>
+```
+
+## Testing Tools
+
+```bash
+# Automated audit
+npx axe-core          # Programmatic accessibility testing
+npx pa11y             # CLI accessibility checker
+
+# In browser
+# Chrome DevTools → Lighthouse → Accessibility
+# Chrome DevTools → Elements → Accessibility tree
+
+# Screen reader testing
+# macOS: VoiceOver (Cmd + F5)
+# Windows: NVDA (free) or JAWS
+# Linux: Orca
+```
+
+## Quick Reference: ARIA Live Regions
+
+| Value | Behavior | Use For |
+|-------|----------|---------|
+| `aria-live="polite"` | Announced at next pause | Status updates, saved confirmations |
+| `aria-live="assertive"` | Announced immediately | Errors, time-sensitive alerts |
+| `role="status"` | Same as `polite` | Status messages |
+| `role="alert"` | Same as `assertive` | Error messages |
+
+## Common Anti-Patterns
+
+| Anti-Pattern | Problem | Fix |
+|---|---|---|
+| `div` as button | Not focusable, no keyboard support | Use `<button>` |
+| Missing `alt` text | Images invisible to screen readers | Add descriptive `alt` |
+| Color-only states | Invisible to color-blind users | Add icons, text, or patterns |
+| Autoplaying media | Disorienting, can't be stopped | Add controls, don't autoplay |
+| Custom dropdown with no ARIA | Unusable by keyboard/screen reader | Use native `<select>` or proper ARIA listbox |
+| Removing focus outlines | Users can't see where they are | Style outlines, don't remove them |
+| Empty links/buttons | "Link" announced with no description | Add text or `aria-label` |
+| `tabindex > 0` | Breaks natural tab order | Use `tabindex="0"` or `-1` only |
--- a/.claude/skills/idea-refine/references/orchestration-patterns.md
+++ b/.claude/skills/idea-refine/references/orchestration-patterns.md
@@ -0,0 +1,370 @@
+# Orchestration Patterns
+
+Reference catalog of agent orchestration patterns this repo endorses, plus anti-patterns to avoid. Read this before adding a new slash command that coordinates multiple personas, or before introducing a new persona that "wraps" existing ones.
+
+The governing rule: **the user (or a slash command) is the orchestrator. Personas do not invoke other personas.** Skills are mandatory hops inside a persona's workflow.
+
+---
+
+## Endorsed patterns
+
+### 1. Direct invocation (no orchestration)
+
+Single persona, single perspective, single artifact. The default and the cheapest option.
+
+```
+user → code-reviewer → report → user
+```
+
+**Use when:** the work is one perspective on one artifact and you can describe it in one sentence.
+
+**Examples:**
+- "Review this PR" → `code-reviewer`
+- "Find security issues in `auth.ts`" → `security-auditor`
+- "What tests are missing for the checkout flow?" → `test-engineer`
+
+**Cost:** one round trip. The baseline you should always compare orchestrated patterns against.
+
+---
+
+### 2. Single-persona slash command
+
+A slash command that wraps one persona with the project's skills. Saves the user from re-explaining the workflow every time.
+
+```
+/review → code-reviewer (with code-review-and-quality skill) → report
+```
+
+**Use when:** the same single-persona invocation happens repeatedly with the same setup.
+
+**Examples in this repo:** `/review`, `/test`, `/code-simplify`.
+
+**Cost:** same as direct invocation. The slash command is just a saved prompt.
+
+**Anti-signal:** if the slash command's body is mostly "decide which persona to call," delete it and let the user call the persona directly.
+
+---
+
+### 3. Parallel fan-out with merge
+
+Multiple personas operate on the same input concurrently, each producing an independent report. A merge step (in the main agent's context) synthesizes them into a single decision.
+
+```
+                    ┌─→ code-reviewer    ─┐
+/ship → fan out  ───┼─→ security-auditor ─┤→ merge → go/no-go + rollback
+                    └─→ test-engineer    ─┘
+```
+
+**Use when:**
+- The sub-tasks are genuinely independent (no shared mutable state, no ordering dependency)
+- Each sub-agent benefits from its own context window
+- The merge step is small enough to stay in the main context
+- Wall-clock latency matters
+
+**Examples in this repo:** `/ship`.
+
+**Cost:** N parallel sub-agent contexts + one merge turn. Higher than direct invocation, but faster wall-clock and produces better reports because each sub-agent stays focused on its single perspective.
+
+**Validation checklist before adopting this pattern:**
+- [ ] Can I run all sub-agents at the same time without ordering issues?
+- [ ] Does each persona produce a different *kind* of finding, not just the same finding from a different angle?
+- [ ] Will the merge step fit in the main agent's remaining context?
+- [ ] Is the user's wait time long enough that parallelism is actually noticeable?
+
+If any answer is "no," fall back to direct invocation or a single-persona command.
+
+---
+
+### 4. Sequential pipeline as user-driven slash commands
+
+The user runs slash commands in a defined order, carrying context (or commit history) between them. There is no orchestrator agent — the user IS the orchestrator.
+
+```
+user runs:  /spec  →  /plan  →  /build  →  /test  →  /review  →  /ship
+```
+
+**Use when:** the workflow has dependencies (each step needs the previous step's output) and human judgment between steps adds value.
+
+**Examples in this repo:** the entire DEFINE → PLAN → BUILD → VERIFY → REVIEW → SHIP lifecycle.
+
+**Cost:** one sub-agent context per step. Free for the orchestration layer because there is no orchestrator agent.
+
+**Why not automate it:** an LLM "lifecycle orchestrator" would (a) lose nuance between steps because it has to summarize for hand-off, (b) skip the human checkpoints that catch wrong-direction work early, and (c) double the token cost via paraphrasing turns.
+
+---
+
+### 5. Research isolation (context preservation)
+
+When a task requires reading large amounts of material that shouldn't pollute the main context, spawn a research sub-agent that returns only a digest.
+
+```
+main agent → research sub-agent (reads 50 files) → digest → main agent continues
+```
+
+**Use when:**
+- The main session needs to stay focused on a downstream task
+- The investigation result is much smaller than the input it consumes
+- The decision quality benefits from the main agent having room to think after
+
+**Examples:** "Find every call site of this deprecated API across the monorepo," "Summarize what these 30 ADRs say about caching."
+
+**Cost:** one isolated sub-agent context. Worth it any time the alternative is loading hundreds of files into the main context.
+
+**On Claude Code, use the built-in `Explore` subagent** rather than defining a custom research persona. `Explore` runs on Haiku, is denied write/edit tools, and is purpose-built for this pattern. Define a custom research subagent only when `Explore` doesn't fit (e.g. you need a domain-specific system prompt the model wouldn't infer).
+
+---
+
+## Claude Code compatibility
+
+This catalog is harness-agnostic, but most readers will run it on Claude Code. Here's how each pattern maps onto Claude Code's primitives — and where the platform enforces our rules for us.
+
+### Where personas live
+
+Plugin subagents go in `agents/` at the plugin root. This repo is a plugin (`.claude-plugin/plugin.json`), so `agents/code-reviewer.md`, `agents/security-auditor.md`, and `agents/test-engineer.md` are auto-discovered when the plugin is enabled. No path configuration needed.
+
+### Subagents vs. Agent Teams
+
+Claude Code has two parallelism primitives. Pattern 3 (parallel fan-out with merge) maps to **subagents**. If you need teammates that talk to each other, use **Agent Teams** instead.
+
+| | Subagents | Agent Teams |
+|--|-----------|-------------|
+| Coordination | Main agent fans out, sub-agents only report back | Teammates message each other, share a task list |
+| Context | Own context window per subagent | Own context window per teammate |
+| When to use | Independent tasks producing reports | Collaborative work needing discussion |
+| Status | Stable | Experimental — requires `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` |
+| Cost | Lower | Higher — each teammate is a separate Claude instance |
+
+**The personas in this repo work in both modes.** When spawned as subagents (e.g. by `/ship`), they report findings to the main session. When spawned as teammates (`Spawn a teammate using the security-auditor agent type…`), they can challenge each other's findings directly. The persona definition is the same; only the spawning context changes.
+
+One subtlety: the `skills` and `mcpServers` frontmatter fields in a persona are honored when it runs as a subagent but **ignored when it runs as a teammate** — teammates load skills and MCP servers from your project and user settings, the same as a regular session. If a persona depends on a specific skill or MCP server being loaded, configure it at the session level so it's available in both modes.
+
+### Platform-enforced rules
+
+Two rules in this catalog aren't just convention — Claude Code enforces them:
+
+- **"Subagents cannot spawn other subagents"** (verbatim from the docs). Anti-pattern B (persona-calls-persona) and Anti-pattern D (deep persona trees) cannot exist on Claude Code by construction.
+- **"No nested teams"** — teammates cannot spawn their own teams. Same anti-patterns blocked at the team level.
+
+This means you can adopt the patterns in this catalog without worrying about contributors accidentally building the anti-patterns. They'll just fail to load.
+
+### Built-in subagents to know about
+
+Before defining a custom subagent, check whether one of these covers the role:
+
+| Built-in | Purpose |
+|----------|---------|
+| `Explore` | Read-only codebase search and analysis. Use this for Pattern 5 (research isolation). |
+| `Plan` | Read-only research during plan mode. |
+| `general-purpose` | Multi-step tasks needing both exploration and modification. |
+
+Don't redefine these. Layer your specialist personas (code-reviewer, security-auditor, test-engineer) on top of them.
+
+### Frontmatter restrictions for plugin agents
+
+Plugin subagents do **not** support the `hooks`, `mcpServers`, or `permissionMode` frontmatter fields — these are silently ignored. If a future persona needs any of those, the user must copy the file into `.claude/agents/` or `~/.claude/agents/` instead.
+
+The fields that DO work in plugin agents are: `name`, `description`, `tools`, `disallowedTools`, `model`, `maxTurns`, `skills`, `memory`, `background`, `effort`, `isolation`, `color`, `initialPrompt`. Use `model` per-persona if you want to optimize cost (e.g. Haiku for `test-engineer` coverage scans, Sonnet for `code-reviewer`, Opus for `security-auditor`).
+
+### Spawning multiple subagents in parallel
+
+In Claude Code, parallel fan-out (Pattern 3) requires issuing **multiple Agent tool calls in a single assistant turn**. Sequential turns serialize execution. `/ship` calls this out explicitly. Any new orchestrator command should do the same.
+
+---
+
+## Worked example: Agent Teams for competing-hypothesis debugging
+
+This example shows when to reach for **Agent Teams** instead of `/ship`'s subagent fan-out. The two patterns look similar from a distance — both spawn the same three personas — but the value comes from a different place.
+
+### The scenario
+
+> *Checkout occasionally hangs for ~30 seconds before completing. It happens roughly once every 50 sessions. No errors in logs. Started after last week's release.*
+
+Plausible root causes (mutually exclusive, all fit the symptoms):
+
+1. A race condition in the new payment-confirmation flow
+2. An auth check that occasionally falls through to a slow synchronous network call
+3. A missing index on a query that scales with cart size
+4. A flaky third-party API where the SDK retries silently before timing out
+
+A single agent will pick the first plausible theory and stop investigating. A `/ship`-style subagent fan-out would have each persona report independently — but their reports never meet, so nothing rules out the wrong theories.
+
+This is exactly the case the Agent Teams docs describe: *"With multiple independent investigators actively trying to disprove each other, the theory that survives is much more likely to be the actual root cause."*
+
+### Why this is *not* a `/ship` job
+
+| | `/ship` (subagents) | Agent Teams |
+|--|--------------------|-------------|
+| Sub-agents see | The same diff, different lenses | A shared task list, each other's messages |
+| Output | Three independent reports → one merge | Adversarial debate → consensus root cause |
+| Right when | You want a verdict on a known artifact | You want to *find* the artifact among hypotheses |
+
+`/ship` is a verdict; Agent Teams is an investigation.
+
+### Setup (one-time, per-environment)
+
+Agent Teams is experimental. In `~/.claude/settings.json`:
+
+```json
+{
+  "env": {
+    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
+  }
+}
+```
+
+Requires Claude Code v2.1.32 or later. The personas in this repo are picked up automatically — no team-config files to author by hand.
+
+### The trigger prompt
+
+Type into the lead session, in natural language:
+
+```
+Users report checkout hangs for ~30 seconds intermittently after last
+week's release. No errors in logs.
+
+Create an agent team to debug this with competing hypotheses. Spawn
+three teammates using the existing agent types:
+
+  - code-reviewer  — investigate race conditions and blocking calls
+                     in the checkout code path
+  - security-auditor — investigate auth checks, session handling,
+                       and any synchronous network calls added recently
+  - test-engineer  — propose tests that would distinguish between the
+                     hypotheses and check coverage gaps in checkout
+
+Have them message each other directly to challenge each other's
+theories. Update findings as consensus emerges. Only converge when
+two teammates agree they can disprove the others'.
+```
+
+The lead spawns three teammates referencing the existing persona names. The persona body is **appended** to each teammate's system prompt as additional instructions (on top of the team-coordination instructions the lead installs); the trigger prompt above becomes their task.
+
+### What happens
+
+1. Each teammate runs in its own context window, exploring the codebase from its own lens.
+2. Teammates use `message` to send findings to each other directly. The lead doesn't have to relay.
+3. The shared task list shows who's investigating what — visible at any time with `Ctrl+T` (in-process mode) or in a tmux pane (split mode).
+4. When `code-reviewer` finds a `Promise.all` that should be sequential, it messages `security-auditor` to confirm the auth call isn't part of the race. `security-auditor` checks and replies — either confirming the race is the real issue or producing counter-evidence.
+5. `test-engineer` proposes a focused integration test for whichever theory is winning, which the team uses to verify before declaring consensus.
+6. The lead synthesizes the converged finding and presents it to you.
+
+You can interrupt at any teammate by cycling with `Shift+Down` and typing — useful for redirecting an investigator who's gone down a wrong path.
+
+### When to clean up
+
+When the investigation lands on a root cause, tell the lead:
+
+```
+Clean up the team
+```
+
+Always cleanup through the lead, not a teammate (per the docs: teammates lack full team context for cleanup).
+
+### Cost expectation
+
+Three Sonnet teammates running for ~10–15 minutes of investigation costs noticeably more than the same three personas spawned as subagents by `/ship`. The justification is *quality of conclusion* — for production debugging where the wrong fix is expensive, the extra tokens are a bargain. For a routine PR review, stick with `/ship`.
+
+### Anti-pattern in this scenario
+
+Do **not** rebuild this as a `/debug` slash command that fans out subagents. Subagents can't message each other — you'd lose the adversarial debate that makes the pattern work. If a workflow keeps coming up, document the trigger prompt above as a snippet rather than wrapping it in a slash command that misuses subagents.
+
+### When *not* to use Agent Teams
+
+- Production-bound verdict on a known diff → use `/ship` (subagents).
+- One specialist perspective on one artifact → direct persona invocation.
+- Sequential lifecycle (spec → plan → build) → user-driven slash commands (Pattern 4).
+- Read-heavy research with a small digest → built-in `Explore` subagent.
+
+Reach for Agent Teams only when teammates **need** to challenge each other to produce the right answer.
+
+---
+
+## Anti-patterns
+
+### A. Router persona ("meta-orchestrator")
+
+A persona whose job is to decide which other persona to call.
+
+```
+/work → router-persona → "this needs a review" → code-reviewer → router (paraphrases) → user
+```
+
+**Why it fails:**
+- Pure routing layer with no domain value
+- Adds two paraphrasing hops → information loss + roughly 2× token cost
+- The user already knew they wanted a review; they could have called `/review` directly
+- Replicates the work that slash commands and intent mapping in `AGENTS.md` already do
+
+**What to do instead:** add or refine slash commands. Document intent → command mapping in `AGENTS.md`.
+
+---
+
+### B. Persona that calls another persona
+
+A `code-reviewer` that internally invokes `security-auditor` when it sees auth code.
+
+**Why it fails:**
+- Personas were designed to produce a single perspective; chaining them defeats that
+- The summary the calling persona passes loses context the called persona needs
+- Failure modes multiply (which persona's output format wins? whose rules apply?)
+- Hides cost from the user
+
+**What to do instead:** have the calling persona *recommend* a follow-up audit in its report. The user or a slash command runs the second pass.
+
+---
+
+### C. Sequential orchestrator that paraphrases
+
+An agent that calls `/spec`, then `/plan`, then `/build`, etc. on the user's behalf.
+
+**Why it fails:**
+- Loses the human checkpoints that catch wrong-direction work
+- Each hand-off summarizes context — accumulated drift over a long pipeline
+- Doubles token cost: orchestrator turn + sub-agent turn for every step
+- Removes user agency at exactly the points where judgment matters most
+
+**What to do instead:** keep the user as the orchestrator. Document the recommended sequence in `README.md` and let users invoke it.
+
+---
+
+### D. Deep persona trees
+
+`/ship` calls a `pre-ship-coordinator` that calls a `quality-coordinator` that calls `code-reviewer`.
+
+**Why it fails:**
+- Each layer adds latency and tokens with no decision value
+- Debugging becomes a multi-level investigation
+- The leaf personas lose context to multiple summarization steps
+
+**What to do instead:** keep the orchestration depth at most 1 (slash command → personas). The merge happens in the main agent.
+
+---
+
+## Decision flow
+
+When considering a new orchestrated workflow, walk this flow:
+
+```
+Is the work one perspective on one artifact?
+├── Yes → Direct invocation. Stop.
+└── No  → Will the same composition repeat?
+         ├── No  → Direct invocation, ad hoc. Stop.
+         └── Yes → Are sub-tasks independent?
+                  ├── No  → Sequential slash commands run by user (Pattern 4).
+                  └── Yes → Parallel fan-out with merge (Pattern 3).
+                           Validate against the checklist above.
+                           If any check fails → fall back to single-persona command (Pattern 2).
+```
+
+---
+
+## When to add a new pattern to this catalog
+
+Add a new entry only after:
+
+1. You've used the pattern at least twice in real work
+2. You can name a concrete artifact in this repo that demonstrates it
+3. You can explain why an existing pattern wouldn't have worked
+4. You can describe its anti-pattern shadow (what people will mistakenly build instead)
+
+Premature catalog entries become aspirational documentation that no one follows.
--- a/.claude/skills/idea-refine/references/performance-checklist.md
+++ b/.claude/skills/idea-refine/references/performance-checklist.md
@@ -0,0 +1,153 @@
+# Performance Checklist
+
+Quick reference checklist for web application performance. Use alongside the `performance-optimization` skill.
+
+## Table of Contents
+
+- [Core Web Vitals Targets](#core-web-vitals-targets)
+- [TTFB Diagnosis](#ttfb-diagnosis)
+- [Frontend Checklist](#frontend-checklist)
+- [Backend Checklist](#backend-checklist)
+- [Measurement Commands](#measurement-commands)
+- [Common Anti-Patterns](#common-anti-patterns)
+
+## Core Web Vitals Targets
+
+| Metric | Good | Needs Work | Poor |
+|--------|------|------------|------|
+| LCP (Largest Contentful Paint) | ≤ 2.5s | ≤ 4.0s | > 4.0s |
+| INP (Interaction to Next Paint) | ≤ 200ms | ≤ 500ms | > 500ms |
+| CLS (Cumulative Layout Shift) | ≤ 0.1 | ≤ 0.25 | > 0.25 |
+
+## TTFB Diagnosis
+
+When TTFB is slow (> 800ms), check each component in DevTools Network waterfall:
+
+- [ ] **DNS resolution** slow → add `<link rel="dns-prefetch">` or `<link rel="preconnect">` for known origins
+- [ ] **TCP/TLS handshake** slow → enable HTTP/2, consider edge deployment, verify keep-alive
+- [ ] **Server processing** slow → profile backend, check slow queries, add caching
+
+## Frontend Checklist
+
+### Images
+- [ ] Images use modern formats (WebP, AVIF)
+- [ ] Images are responsively sized (`srcset` and `sizes`)
+- [ ] Images and `<source>` elements have explicit `width` and `height` (prevents CLS in art direction)
+- [ ] Below-the-fold images use `loading="lazy"` and `decoding="async"`
+- [ ] Hero/LCP images use `fetchpriority="high"` and no lazy loading
+
+### JavaScript
+- [ ] Bundle size under 200KB gzipped (initial load)
+- [ ] Code splitting with dynamic `import()` for routes and heavy features
+- [ ] Tree shaking enabled (verify dependency ships ESM and marks `sideEffects: false`)
+- [ ] No blocking JavaScript in `<head>` (use `defer` or `async`)
+- [ ] Heavy computation offloaded to Web Workers (if applicable)
+- [ ] `React.memo()` on expensive components that re-render with same props
+- [ ] `useMemo()` / `useCallback()` only where profiling shows benefit
+- [ ] Long tasks (> 50ms) broken up to keep the main thread available — main lever for INP
+- [ ] `yieldToMain` pattern used inside long-running loops so input events can run between chunks
+- [ ] Modern scheduling APIs used where available: `scheduler.yield()` (preferred), `scheduler.postTask()` with priorities, `isInputPending()` to yield only when needed
+- [ ] `requestIdleCallback` for deferrable, non-urgent work (analytics flush, prefetch, warmup)
+- [ ] Non-critical work deferred out of event handlers (e.g. analytics, logging) so the response to the interaction is not delayed
+- [ ] Third-party scripts loaded with `async` / `defer`, audited for size, and fronted by a facade when heavy (chat widgets, embeds)
+
+### CSS
+- [ ] Critical CSS inlined or preloaded
+- [ ] No render-blocking CSS for non-critical styles
+- [ ] No CSS-in-JS runtime cost in production (use extraction)
+
+### Fonts
+- [ ] Limited to 2–3 font families, 2–3 weights each (every additional weight is another request)
+- [ ] WOFF2 format only (smallest, universal support — skip WOFF/TTF/EOT)
+- [ ] Self-hosted when possible (third-party font CDNs add DNS + TCP + TLS round-trips)
+- [ ] LCP-critical fonts preloaded: `<link rel="preload" as="font" type="font/woff2" crossorigin>`
+- [ ] `font-display: swap` (or `optional` for non-critical) to avoid FOIT blocking render
+- [ ] Subsetted via `unicode-range` to ship only the glyphs each page needs
+- [ ] Variable fonts considered when multiple weights/styles are required (one file replaces many)
+- [ ] Fallback font metrics adjusted with `size-adjust`, `ascent-override`, `descent-override` to reduce CLS on font swap
+- [ ] System font stack considered before any custom font
+
+### Network
+- [ ] Static assets cached with long `max-age` + content hashing
+- [ ] API responses cached where appropriate (`Cache-Control`)
+- [ ] HTTP/2 or HTTP/3 enabled
+- [ ] Resources preconnected (`<link rel="preconnect">`) for known origins
+- [ ] `fetchpriority` used on critical non-image resources (e.g., key `<link rel="preload">`, above-the-fold `<script>`) — not only on `<img>`
+- [ ] No unnecessary redirects
+
+### Rendering
+- [ ] No layout thrashing (forced synchronous layouts)
+- [ ] Animations use `transform` and `opacity` (GPU-accelerated)
+- [ ] Long lists use virtualization (e.g., `react-window`)
+- [ ] No unnecessary full-page re-renders
+- [ ] Off-screen sections use `content-visibility: auto` with `contain-intrinsic-size` to skip layout/paint of non-visible areas
+- [ ] No `unload` event handlers and no `Cache-Control: no-store` on HTML responses — preserves back/forward cache (bfcache) eligibility
+
+## Backend Checklist
+
+### Database
+- [ ] No N+1 query patterns (use eager loading / joins)
+- [ ] Queries have appropriate indexes
+- [ ] List endpoints paginated (never `SELECT * FROM table`)
+- [ ] Connection pooling configured
+- [ ] Slow query logging enabled
+
+### API
+- [ ] Response times < 200ms (p95)
+- [ ] No synchronous heavy computation in request handlers
+- [ ] Bulk operations instead of loops of individual calls
+- [ ] Response compression (gzip/brotli)
+- [ ] Appropriate caching (in-memory, Redis, CDN)
+
+### Infrastructure
+- [ ] CDN for static assets
+- [ ] Server located close to users (or edge deployment)
+- [ ] Horizontal scaling configured (if needed)
+- [ ] Health check endpoint for load balancer
+
+## Measurement Commands
+
+### INP field data and DevTools workflow
+
+1. **Field data first** — check [CrUX Vis](https://developer.chrome.com/docs/crux/vis) or your RUM tool for real-user INP before optimising
+2. **Identify slow interactions** — open DevTools → Performance panel → record while interacting; look for long tasks triggered by clicks/keystrokes
+3. **Test on mid-range Android** — INP issues often only surface on slower hardware; use a real device or DevTools CPU throttling (4×–6× slowdown)
+
+```bash
+# Lighthouse CLI
+npx lighthouse https://localhost:3000 --output json --output-path ./report.json
+
+# Bundle analysis
+npx webpack-bundle-analyzer stats.json
+# or for Vite:
+npx vite-bundle-visualizer
+
+# Check bundle size
+npx bundlesize
+
+# Web Vitals in code
+import { onLCP, onINP, onCLS } from 'web-vitals';
+onLCP(console.log);
+onINP(console.log);
+onCLS(console.log);
+
+# INP with interaction-level detail (attribution build)
+import { onINP } from 'web-vitals/attribution';
+onINP(({ value, attribution }) => {
+  const { interactionTarget, inputDelay, processingDuration, presentationDelay } = attribution;
+  console.log({ value, interactionTarget, inputDelay, processingDuration, presentationDelay });
+});
+```
+
+## Common Anti-Patterns
+
+| Anti-Pattern | Impact | Fix |
+|---|---|---|
+| N+1 queries | Linear DB load growth | Use joins, includes, or batch loading |
+| Unbounded queries | Memory exhaustion, timeouts | Always paginate, add LIMIT |
+| Missing indexes | Slow reads as data grows | Add indexes for filtered/sorted columns |
+| Layout thrashing | Jank, dropped frames | Batch DOM reads, then batch writes |
+| Unoptimized images | Slow LCP, wasted bandwidth | Use WebP, responsive sizes, lazy load |
+| Large bundles | Slow Time to Interactive | Code split, tree shake, audit deps |
+| Blocking main thread | Poor INP, unresponsive UI | Chunk long tasks with `scheduler.yield()` / `yieldToMain`, offload to Web Workers |
+| Memory leaks | Growing memory, eventual crash | Clean up listeners, intervals, refs |
--- a/.claude/skills/idea-refine/references/security-checklist.md
+++ b/.claude/skills/idea-refine/references/security-checklist.md
@@ -0,0 +1,134 @@
+# Security Checklist
+
+Quick reference for web application security. Use alongside the `security-and-hardening` skill.
+
+## Table of Contents
+
+- [Pre-Commit Checks](#pre-commit-checks)
+- [Authentication](#authentication)
+- [Authorization](#authorization)
+- [Input Validation](#input-validation)
+- [Security Headers](#security-headers)
+- [CORS Configuration](#cors-configuration)
+- [Data Protection](#data-protection)
+- [Dependency Security](#dependency-security)
+- [Error Handling](#error-handling)
+- [OWASP Top 10 Quick Reference](#owasp-top-10-quick-reference)
+
+## Pre-Commit Checks
+
+- [ ] No secrets in code (`git diff --cached | grep -i "password\|secret\|api_key\|token"`)
+- [ ] `.gitignore` covers: `.env`, `.env.local`, `*.pem`, `*.key`
+- [ ] `.env.example` uses placeholder values (not real secrets)
+
+## Authentication
+
+- [ ] Passwords hashed with bcrypt (≥12 rounds), scrypt, or argon2
+- [ ] Session cookies: `httpOnly`, `secure`, `sameSite: 'lax'`
+- [ ] Session expiration configured (reasonable max-age)
+- [ ] Rate limiting on login endpoint (≤10 attempts per 15 minutes)
+- [ ] Password reset tokens: time-limited (≤1 hour), single-use
+- [ ] Account lockout after repeated failures (optional, with notification)
+- [ ] MFA supported for sensitive operations (optional but recommended)
+
+## Authorization
+
+- [ ] Every protected endpoint checks authentication
+- [ ] Every resource access checks ownership/role (prevents IDOR)
+- [ ] Admin endpoints require admin role verification
+- [ ] API keys scoped to minimum necessary permissions
+- [ ] JWT tokens validated (signature, expiration, issuer)
+
+## Input Validation
+
+- [ ] All user input validated at system boundaries (API routes, form handlers)
+- [ ] Validation uses allowlists (not denylists)
+- [ ] String lengths constrained (min/max)
+- [ ] Numeric ranges validated
+- [ ] Email, URL, and date formats validated with proper libraries
+- [ ] File uploads: type restricted, size limited, content verified
+- [ ] SQL queries parameterized (no string concatenation)
+- [ ] HTML output encoded (use framework auto-escaping)
+- [ ] URLs validated before redirect (prevent open redirect)
+
+## Security Headers
+
+```
+Content-Security-Policy: default-src 'self'; script-src 'self'
+Strict-Transport-Security: max-age=31536000; includeSubDomains
+X-Content-Type-Options: nosniff
+X-Frame-Options: DENY
+X-XSS-Protection: 0  (disabled, rely on CSP)
+Referrer-Policy: strict-origin-when-cross-origin
+Permissions-Policy: camera=(), microphone=(), geolocation=()
+```
+
+## CORS Configuration
+
+```typescript
+// Restrictive (recommended)
+cors({
+  origin: ['https://yourdomain.com', 'https://app.yourdomain.com'],
+  credentials: true,
+  methods: ['GET', 'POST', 'PUT', 'PATCH', 'DELETE'],
+  allowedHeaders: ['Content-Type', 'Authorization'],
+})
+
+// NEVER use in production:
+cors({ origin: '*' })  // Allows any origin
+```
+
+## Data Protection
+
+- [ ] Sensitive fields excluded from API responses (`passwordHash`, `resetToken`, etc.)
+- [ ] Sensitive data not logged (passwords, tokens, full CC numbers)
+- [ ] PII encrypted at rest (if required by regulation)
+- [ ] HTTPS for all external communication
+- [ ] Database backups encrypted
+
+## Dependency Security
+
+```bash
+# Audit dependencies
+npm audit
+
+# Fix automatically where possible
+npm audit fix
+
+# Check for critical vulnerabilities
+npm audit --audit-level=critical
+
+# Keep dependencies updated
+npx npm-check-updates
+```
+
+## Error Handling
+
+```typescript
+// Production: generic error, no internals
+res.status(500).json({
+  error: { code: 'INTERNAL_ERROR', message: 'Something went wrong' }
+});
+
+// NEVER in production:
+res.status(500).json({
+  error: err.message,
+  stack: err.stack,         // Exposes internals
+  query: err.sql,           // Exposes database details
+});
+```
+
+## OWASP Top 10 Quick Reference
+
+| # | Vulnerability | Prevention |
+|---|---|---|
+| 1 | Broken Access Control | Auth checks on every endpoint, ownership verification |
+| 2 | Cryptographic Failures | HTTPS, strong hashing, no secrets in code |
+| 3 | Injection | Parameterized queries, input validation |
+| 4 | Insecure Design | Threat modeling, spec-driven development |
+| 5 | Security Misconfiguration | Security headers, minimal permissions, audit deps |
+| 6 | Vulnerable Components | `npm audit`, keep deps updated, minimal deps |
+| 7 | Auth Failures | Strong passwords, rate limiting, session management |
+| 8 | Data Integrity Failures | Verify updates/dependencies, signed artifacts |
+| 9 | Logging Failures | Log security events, don't log secrets |
+| 10 | SSRF | Validate/allowlist URLs, restrict outbound requests |
--- a/.claude/skills/idea-refine/references/testing-patterns.md
+++ b/.claude/skills/idea-refine/references/testing-patterns.md
@@ -0,0 +1,236 @@
+# Testing Patterns Reference
+
+Quick reference for common testing patterns across the stack. Use alongside the `test-driven-development` skill.
+
+## Table of Contents
+
+- [Test Structure (Arrange-Act-Assert)](#test-structure-arrange-act-assert)
+- [Test Naming Conventions](#test-naming-conventions)
+- [Common Assertions](#common-assertions)
+- [Mocking Patterns](#mocking-patterns)
+- [React/Component Testing](#reactcomponent-testing)
+- [API / Integration Testing](#api--integration-testing)
+- [E2E Testing (Playwright)](#e2e-testing-playwright)
+- [Test Anti-Patterns](#test-anti-patterns)
+
+## Test Structure (Arrange-Act-Assert)
+
+```typescript
+it('describes expected behavior', () => {
+  // Arrange: Set up test data and preconditions
+  const input = { title: 'Test Task', priority: 'high' };
+
+  // Act: Perform the action being tested
+  const result = createTask(input);
+
+  // Assert: Verify the outcome
+  expect(result.title).toBe('Test Task');
+  expect(result.priority).toBe('high');
+  expect(result.status).toBe('pending');
+});
+```
+
+## Test Naming Conventions
+
+```typescript
+// Pattern: [unit] [expected behavior] [condition]
+describe('TaskService.createTask', () => {
+  it('creates a task with default pending status', () => {});
+  it('throws ValidationError when title is empty', () => {});
+  it('trims whitespace from title', () => {});
+  it('generates a unique ID for each task', () => {});
+});
+```
+
+## Common Assertions
+
+```typescript
+// Equality
+expect(result).toBe(expected);           // Strict equality (===)
+expect(result).toEqual(expected);        // Deep equality (objects/arrays)
+expect(result).toStrictEqual(expected);  // Deep equality + type matching
+
+// Truthiness
+expect(result).toBeTruthy();
+expect(result).toBeFalsy();
+expect(result).toBeNull();
+expect(result).toBeDefined();
+expect(result).toBeUndefined();
+
+// Numbers
+expect(result).toBeGreaterThan(5);
+expect(result).toBeLessThanOrEqual(10);
+expect(result).toBeCloseTo(0.3, 5);      // Floating point
+
+// Strings
+expect(result).toMatch(/pattern/);
+expect(result).toContain('substring');
+
+// Arrays / Objects
+expect(array).toContain(item);
+expect(array).toHaveLength(3);
+expect(object).toHaveProperty('key', 'value');
+
+// Errors
+expect(() => fn()).toThrow();
+expect(() => fn()).toThrow(ValidationError);
+expect(() => fn()).toThrow('specific message');
+
+// Async
+await expect(asyncFn()).resolves.toBe(value);
+await expect(asyncFn()).rejects.toThrow(Error);
+```
+
+## Mocking Patterns
+
+### Mock Functions
+
+```typescript
+const mockFn = jest.fn();
+mockFn.mockReturnValue(42);
+mockFn.mockResolvedValue({ data: 'test' });
+mockFn.mockImplementation((x) => x * 2);
+
+expect(mockFn).toHaveBeenCalled();
+expect(mockFn).toHaveBeenCalledWith('arg1', 'arg2');
+expect(mockFn).toHaveBeenCalledTimes(3);
+```
+
+### Mock Modules
+
+```typescript
+// Mock an entire module
+jest.mock('./database', () => ({
+  query: jest.fn().mockResolvedValue([{ id: 1, title: 'Test' }]),
+}));
+
+// Mock specific exports
+jest.mock('./utils', () => ({
+  ...jest.requireActual('./utils'),
+  generateId: jest.fn().mockReturnValue('test-id'),
+}));
+```
+
+### Mock at Boundaries Only
+
+```
+Mock these:                    Don't mock these:
+├── Database calls             ├── Internal utility functions
+├── HTTP requests              ├── Business logic
+├── File system operations     ├── Data transformations
+├── External API calls         ├── Validation functions
+└── Time/Date (when needed)    └── Pure functions
+```
+
+## React/Component Testing
+
+```tsx
+import { render, screen, fireEvent, waitFor } from '@testing-library/react';
+
+describe('TaskForm', () => {
+  it('submits the form with entered data', async () => {
+    const onSubmit = jest.fn();
+    render(<TaskForm onSubmit={onSubmit} />);
+
+    // Find elements by accessible role/label (not test IDs)
+    await screen.findByRole('textbox', { name: /title/i });
+    fireEvent.change(screen.getByRole('textbox', { name: /title/i }), {
+      target: { value: 'New Task' },
+    });
+    fireEvent.click(screen.getByRole('button', { name: /create/i }));
+
+    await waitFor(() => {
+      expect(onSubmit).toHaveBeenCalledWith({ title: 'New Task' });
+    });
+  });
+
+  it('shows validation error for empty title', async () => {
+    render(<TaskForm onSubmit={jest.fn()} />);
+
+    fireEvent.click(screen.getByRole('button', { name: /create/i }));
+
+    expect(await screen.findByText(/title is required/i)).toBeInTheDocument();
+  });
+});
+```
+
+## API / Integration Testing
+
+```typescript
+import request from 'supertest';
+import { app } from '../src/app';
+
+describe('POST /api/tasks', () => {
+  it('creates a task and returns 201', async () => {
+    const response = await request(app)
+      .post('/api/tasks')
+      .send({ title: 'Test Task' })
+      .set('Authorization', `Bearer ${testToken}`)
+      .expect(201);
+
+    expect(response.body).toMatchObject({
+      id: expect.any(String),
+      title: 'Test Task',
+      status: 'pending',
+    });
+  });
+
+  it('returns 422 for invalid input', async () => {
+    const response = await request(app)
+      .post('/api/tasks')
+      .send({ title: '' })
+      .set('Authorization', `Bearer ${testToken}`)
+      .expect(422);
+
+    expect(response.body.error.code).toBe('VALIDATION_ERROR');
+  });
+
+  it('returns 401 without authentication', async () => {
+    await request(app)
+      .post('/api/tasks')
+      .send({ title: 'Test' })
+      .expect(401);
+  });
+});
+```
+
+## E2E Testing (Playwright)
+
+```typescript
+import { test, expect } from '@playwright/test';
+
+test('user can create and complete a task', async ({ page }) => {
+  // Navigate and authenticate
+  await page.goto('/');
+  await page.fill('[name="email"]', 'test@example.com');
+  await page.fill('[name="password"]', 'testpass123');
+  await page.click('button:has-text("Log in")');
+
+  // Create a task
+  await page.click('button:has-text("New Task")');
+  await page.fill('[name="title"]', 'Buy groceries');
+  await page.click('button:has-text("Create")');
+
+  // Verify task appears
+  await expect(page.locator('text=Buy groceries')).toBeVisible();
+
+  // Complete the task
+  await page.click('[aria-label="Complete Buy groceries"]');
+  await expect(page.locator('text=Buy groceries')).toHaveCSS(
+    'text-decoration-line', 'line-through'
+  );
+});
+```
+
+## Test Anti-Patterns
+
+| Anti-Pattern | Problem | Better Approach |
+|---|---|---|
+| Testing implementation details | Breaks on refactor | Test inputs/outputs |
+| Snapshot everything | No one reviews snapshot diffs | Assert specific values |
+| Shared mutable state | Tests pollute each other | Setup/teardown per test |
+| Testing third-party code | Wastes time, not your bug | Mock the boundary |
+| Skipping tests to pass CI | Hides real bugs | Fix or delete the test |
+| Using `test.skip` permanently | Dead code | Remove or fix it |
+| Overly broad assertions | Doesn't catch regressions | Be specific |
+| No async error handling | Swallowed errors, false passes | Always `await` async tests |
--- a/.claude/skills/idea-refine/refinement-criteria.md
+++ b/.claude/skills/idea-refine/refinement-criteria.md
@@ -0,0 +1,113 @@
+# Refinement & Evaluation Criteria
+
+Use this rubric during Phase 2 (Evaluate & Converge) to stress-test idea directions. Not every criterion applies to every idea — use judgment about which dimensions matter most for the specific context.
+
+## Core Evaluation Dimensions
+
+### 1. User Value
+
+The most important dimension. If the value isn't clear, nothing else matters.
+
+**Painkiller vs. Vitamin:**
+- **Painkiller:** Solves an acute, frequent problem. Users will actively seek this out. They'll switch from their current solution. Signs: people describe the problem with emotion, they've built workarounds, they'll pay for a solution.
+- **Vitamin:** Nice to have. Makes something marginally better. Users won't go out of their way. Signs: people nod politely, say "that's cool," then don't change behavior.
+
+**Questions to ask:**
+- Can you name 3 specific people who have this problem right now?
+- What are they doing today instead? (The real competitor is always the current workaround.)
+- Would they switch from their current approach? What would make them switch?
+- How often do they encounter this problem? (Daily problems > monthly problems)
+- Is this a "pull" problem (users are asking for this) or a "push" problem (you think they should want this)?
+
+**Red flags:**
+- "Everyone could use this" — if you can't name a specific user, the value isn't clear
+- "It's like X but better" — marginal improvements rarely drive adoption
+- The problem is real but rare — high intensity but low frequency rarely justifies a product
+
+### 2. Feasibility
+
+Can you actually build this? Not just technically, but practically.
+
+**Technical feasibility:**
+- Does the core technology exist and work reliably?
+- What's the hardest technical problem? Is it a known-hard problem or a novel one?
+- Are there dependencies on third parties, APIs, or data sources you don't control?
+- What's the minimum technical stack needed? (If the answer is "a lot," that's a signal.)
+
+**Resource feasibility:**
+- What's the minimum team/effort to build an MVP?
+- Does it require specialized expertise you don't have?
+- Are there regulatory, legal, or compliance requirements?
+
+**Time-to-value:**
+- How quickly can you get something in front of users?
+- Is there a version that delivers value in days/weeks, not months?
+- What's the critical path? What has to happen first?
+
+**Red flags:**
+- "We just need to solve [very hard research problem] first"
+- Multiple dependencies that all need to work simultaneously
+- MVP still requires months of work — likely not minimal enough
+
+### 3. Differentiation
+
+What makes this genuinely different? Not better — *different*.
+
+**Questions to ask:**
+- If a user described this to a friend, what would they say? Is that description compelling?
+- What's the one thing this does that nothing else does? (If you can't name one, that's a problem.)
+- Is this differentiation durable? Can a competitor copy it in a week?
+- Is the difference something users actually care about, or just something builders find interesting?
+
+**Types of differentiation (strongest to weakest):**
+1. **New capability:** Does something that was previously impossible
+2. **10x improvement:** So much better on a key dimension that it changes behavior
+3. **New audience:** Brings an existing capability to people who were excluded
+4. **New context:** Works in a situation where existing solutions fail
+5. **Better UX:** Same capability, dramatically simpler experience
+6. **Cheaper:** Same thing, lower cost (weakest — easily competed away)
+
+**Red flags:**
+- Differentiation is entirely about technology, not user experience
+- "We're faster/cheaper/prettier" without a structural reason why
+- The feature that differentiates is not the feature users care most about
+
+## Assumption Audit
+
+For every idea direction, explicitly list assumptions in three categories:
+
+### Must Be True (Dealbreakers)
+Assumptions that, if wrong, kill the idea entirely. These need validation before building.
+
+Example: "Users will share their data with us" — if they won't, the entire product doesn't work.
+
+### Should Be True (Important)
+Assumptions that significantly impact success but don't kill the idea. You can adjust the approach if these are wrong.
+
+Example: "Users prefer self-serve over talking to a person" — if wrong, you need a different go-to-market, but the core product can still work.
+
+### Might Be True (Nice to Have)
+Assumptions about secondary features or optimizations. Don't validate these until the core is proven.
+
+Example: "Users will want to share their results with teammates" — a growth feature, not a core value proposition.
+
+## Decision Framework
+
+When choosing between directions, rank on this matrix:
+
+|                    | High Feasibility | Low Feasibility |
+|--------------------|-------------------|-----------------|
+| **High Value**     | Do this first     | Worth the risk   |
+| **Low Value**      | Only if trivial   | Don't do this    |
+
+Then use differentiation as the tiebreaker between options in the same quadrant.
+
+## MVP Scoping Principles
+
+When defining MVP scope for the chosen direction:
+
+1. **One job, done well.** The MVP should nail exactly one user job. Not three jobs done partially.
+2. **The riskiest assumption first.** The MVP's primary purpose is to test the assumption most likely to be wrong.
+3. **Time-box, not feature-list.** "What can we build and test in [timeframe]?" is better than "What features do we need?"
+4. **The 'Not Doing' list is mandatory.** Explicitly name what you're cutting and why. This prevents scope creep and forces honest prioritization.
+5. **If it's not embarrassing, you waited too long.** The first version should feel incomplete to the builder. If it doesn't, you over-built.
--- a/.claude/skills/idea-refine/scripts/idea-refine.sh
+++ b/.claude/skills/idea-refine/scripts/idea-refine.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+set -e
+
+# This script helps initialize the ideas directory for the idea-refine skill.
+
+IDEAS_DIR="docs/ideas"
+
+if [ ! -d "$IDEAS_DIR" ]; then
+  mkdir -p "$IDEAS_DIR"
+  echo "Created directory: $IDEAS_DIR" >&2
+else
+  echo "Directory already exists: $IDEAS_DIR" >&2
+fi
+
+echo "{\"status\": \"ready\", \"directory\": \"$IDEAS_DIR\"}"