Chapter 10: Lumen W4: Feature Validation
Command: /lumen:feature
Best for: PMs deciding whether to build, buy, or test a specific feature.
PostHog: Not required. Recommended for behavioral evidence on the target use case.
W4 validates a feature idea before you commit engineering resources. It runs a behavioral evidence check, an AI ethics checkpoint (for AI or ML features), customer interviews to test the assumption, a power-calculated experiment design, an interaction design spec, and a build/buy/test decision.
The output is a decision memo that tells you what to do and an experiment brief that tells you how to test it first.
When to Run It
Run W4 when:
- You have a feature idea and want to validate the assumption before scoping it
- You are deciding whether to build an AI or ML feature (ethics checkpoint required)
- You need a build vs. buy analysis for a capability your product does not have
- You want to run a fake-door test or concierge MVP before committing to a full build
- A feature shipped but you want to design a rigorous experiment before scaling it
What You Need Before Starting
Required:
- A clear description of the feature: what it does, who it is for, and what problem it solves
- The target segment
Strongly recommended:
- Any existing behavioral evidence from PostHog (usage signals, activation patterns)
- Whether the feature uses AI or ML (DataLayer triggers only for AI/ML features)
Optional:
- Figma files or wireframes (UXLayer can read them directly if Figma is connected)
- Prior interview data about the specific pain this feature addresses
Sample Prompt
/lumen:feature
Product: Helix — B2B SaaS PM platform
Feature: "AI Sprint Advisor" — analyses the team's current sprint backlog and
suggests reprioritization based on OKR alignment, engineering velocity,
and historical completion rates.
Target users: Team plan users (product teams with 3–10 engineers)
Uses AI/ML: Yes — LLM-based suggestion engine reading backlog and OKR data
Current evidence: We know from W1 that roadmap-to-sprint handoff is the #1 opportunity. We have no behavioral data yet on how teams structure their sprint backlog in Helix.
Key question: Should we build this now, run a fake-door test first, or buy a third-party API? Flag any ethics requirements for the AI component.
Constraint: We have 4 engineering weeks available if we decide to build.
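Before you send a prompt like the one above, you can check that it covers the required and recommended inputs. The sketch below is a minimal pre-flight check, assuming the "Label: value" layout of the sample prompt. The function names and the required/recommended split are illustrative assumptions; this is not how Lumen itself parses prompts.

```python
import re

# Labels copied from the sample prompt. Which ones count as required is an
# assumption based on the "What You Need Before Starting" checklist.
REQUIRED = ("Feature", "Target users")
RECOMMENDED = ("Uses AI/ML", "Current evidence")

LABEL = re.compile(r"^([A-Z][A-Za-z/ ]{1,30}):\s*(.*)$")


def parse_fields(prompt: str) -> dict[str, str]:
    """Collect 'Label: value' pairs; unlabeled lines continue the previous field."""
    fields: dict[str, str] = {}
    current = None
    for raw in prompt.splitlines():
        line = raw.strip()
        match = LABEL.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
        elif current and line:
            fields[current] += " " + line
    return fields


def preflight(prompt: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means no gaps found."""
    fields = parse_fields(prompt)
    problems = [f"missing required field: {name}"
                for name in REQUIRED if not fields.get(name)]
    problems += [f"missing recommended field: {name}"
                 for name in RECOMMENDED if not fields.get(name)]
    return problems
```

A prompt that omits the "Uses AI/ML" line would be reported here as missing a recommended field, which is the case where DataLayer would not be triggered.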
The Agent Sequence
SetupGuide → MCP check + context seed
EventIQ → [parallel] behavioral evidence check
SignalMonitor → [parallel] feature usage signals
DataLayer → AI ethics checkpoint [Level 3 gate] — runs ONLY for AI/ML features
DiscoveryOS → 3–5 targeted interviews (skipped if evidence already validated)
HypothesisLab → experiment design, power calculation
UXLayer → interaction design spec [Level 1]
DecideWell → build/buy/test decision [Level 1]
DataLayer → experiment bias audit [Level 3 — only if ethics cleared at Checkpoint 1]
Orchestrator → W4 Feature Validation Report
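The conditional steps in the sequence above can be read as a small function. This is only a sketch of the gating rules as listed (DataLayer runs only for AI/ML features, DiscoveryOS is skipped when evidence is already validated, the bias audit runs only if ethics was cleared). The function and parameter names are hypothetical, and the parallel steps are listed one after the other for simplicity.

```python
def w4_sequence(uses_ai_ml: bool, evidence_validated: bool,
                ethics_cleared: bool) -> list[str]:
    """Return the agents that would run, in order, for a given W4 request."""
    # EventIQ and SignalMonitor run in parallel in the real workflow.
    steps = ["SetupGuide", "EventIQ", "SignalMonitor"]
    if uses_ai_ml:
        steps.append("DataLayer (ethics checkpoint)")
    if not evidence_validated:
        steps.append("DiscoveryOS")
    steps += ["HypothesisLab", "UXLayer", "DecideWell"]
    # The bias audit is listed as running only if ethics cleared at Checkpoint 1.
    if uses_ai_ml and ethics_cleared:
        steps.append("DataLayer (experiment bias audit)")
    steps.append("Orchestrator")
    return steps
```

For a non-AI feature with validated evidence, the sequence shrinks to seven agents with no DataLayer or DiscoveryOS step.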
Terminal Output Walkthrough
[LUMEN] EventIQ + SignalMonitor running in parallel...
✓ Behavioral evidence check complete
sprint_backlog_update events: 1,840 total · avg 3.2 events/user/week
No existing sprint prioritization events in schema
Recommendation: Add sprint_priority_change event before experiment launch
[LUMEN] DataLayer · AI Ethics Checkpoint 1...
⚠ AI/ML feature detected — ethics review required
Assessing: AI Sprint Advisor (LLM-based sprint reprioritization)
GDPR jurisdiction: EU data handling rules apply
Issues identified:
1. LLM reads OKR and backlog data — may process personal data (task ownership)
2. Automated reprioritization suggestions could affect employee workload
3. Model training data source not specified — bias risk unassessed

Once the named approver responds:
/lumen:approve
> W4-ethics-ai-sprint-advisor-20260313
> CONDITIONAL
[LUMEN] CONDITIONAL clearance recorded.
Conditions must be met before production rollout (>100 teams).
Experiment for <100 teams is permitted.
Continuing W4 workflow.
[LUMEN] DiscoveryOS · 3 targeted interviews...
✓ 3 interviews conducted on sprint planning pain
Finding: All 3 participants reprioritize manually every Monday (avg 40 min)
Finding: Primary frustration is re-aligning sprint to last-minute OKR changes
Finding: 2/3 participants worried about AI "overriding their judgment" —
human-in-the-loop design confirmed as critical
[LUMEN] HypothesisLab · Experiment design...
✓ Hypothesis: Showing AI-generated reprioritization suggestions (advisory only)
will reduce Monday sprint planning time and improve OKR alignment
Recommended test: Fake-door + concierge MVP
Phase 1 (2 weeks): Show "Optimize Sprint" button to 50% of Team plan users.
Measure click-through rate. Success: ≥ 22% CTR.
Phase 2 (4 weeks): Manually generate suggestions for users who click.
Measure time-to-first-sprint-lock and OKR alignment score.
Sample size: n=480 Team plan users · Power: 80% · MDE: 8% improvement
[LUMEN] UXLayer · Interaction design spec...
✓ Spec generated
Pattern: Progressive disclosure — suggestions appear collapsed,
user expands to review before applying
Accessibility: WCAG AA — keyboard navigation required for suggestion cards
⚠ Flag: Do not show individual engineer names in AI suggestions —
preserves data minimization condition from DataLayer clearance
[LUMEN] DecideWell · Build/buy/test decision...
✓ Recommendation: RUN FAKE-DOOR TEST FIRST
Build: High confidence in user need; ethics conditions manageable;
4 engineering weeks is sufficient for MVP
Risk: Low click-through on fake-door would save 4 weeks of build time
Verdict: Run Phase 1 fake-door (2 weeks, ~0.5 engineering days to implement).
If CTR ≥ 22%: proceed to Phase 2 concierge, then build.
If CTR < 22%: do not build. Revisit in Q4.
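The Phase 1 verdict in the walkthrough comes down to a single threshold check. The sketch below applies it to raw counts; the function name and its parameters are illustrative, and the 22% default is taken from the sample output rather than from any general Lumen rule.

```python
def fake_door_verdict(clicks: int, exposed: int, threshold: float = 0.22) -> str:
    """Apply the Phase 1 rule: a CTR at or above the threshold proceeds to Phase 2."""
    if exposed <= 0:
        raise ValueError("exposed must be a positive number of users")
    ctr = clicks / exposed
    if ctr >= threshold:
        return f"CTR {ctr:.1%}: proceed to Phase 2 concierge, then build"
    return f"CTR {ctr:.1%}: do not build"
```

Agree on the threshold before the test starts, so the result cannot be reinterpreted after the fact.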
Reading the W4 Report
Ethics Clearance Status — If DataLayer ran, the clearance status (GRANTED / CONDITIONAL / BLOCKED) appears at the top of the report. CONDITIONAL means you can proceed with restrictions. BLOCKED means you cannot proceed until the feature is redesigned. Never skip this section.
Behavioral Evidence — What EventIQ and SignalMonitor found in your existing data. This tells you whether there is already behavioral signal supporting the feature need before a single interview.
Experiment Brief — The hypothesis, the phased test plan, the sample size, and the success criteria. The fake-door phase (if recommended) is the most important section — it tells you the cheapest possible way to validate demand before committing to a build.
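To sanity-check a sample size in an experiment brief, you can run a standard two-proportion power calculation yourself. The sketch below uses the usual normal-approximation formula. It is not HypothesisLab's method, which this chapter does not describe, and it does not try to reproduce the n=480 in the walkthrough, because that figure depends on a baseline rate the sample output does not state. The function name and the example baseline are assumptions.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per arm to detect an absolute lift of mde_abs over baseline.

    Two-sided test on two proportions, normal approximation.
    """
    if not 0 < baseline < 1 or not 0 < baseline + mde_abs < 1 or mde_abs == 0:
        raise ValueError("baseline and baseline + mde_abs must lie in (0, 1)")
    p1, p2 = baseline, baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / mde_abs ** 2)


# Hypothetical inputs: 20% baseline, 8-point absolute lift, 80% power.
n = sample_size_per_arm(baseline=0.20, mde_abs=0.08)
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why the MDE in the brief deserves as much scrutiny as the sample size.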
Interaction Spec — UXLayer's output. It covers the interaction pattern, accessibility flags, and, for AI features, trust calibration. The spec for an AI feature always includes a human-in-the-loop pattern. This is non-negotiable.
Decision Memo — Build / buy / test with rationale. The memo includes the outcome tracking ID for the 30-day follow-up.
The Ethics Checkpoint: A Practical Note
DataLayer runs for every feature that your prompt identifies as using AI or ML. Once triggered, the checkpoint cannot be skipped.
The checkpoint takes 72 hours when a named approver is required. Plan for this. If you are on a tight timeline, start the ethics checkpoint before the rest of the workflow by describing the AI component clearly in your prompt.
A CONDITIONAL clearance is the most common outcome. It means you can ship to a limited audience (typically under 100 users) without full compliance sign-off, as long as you meet the stated conditions. Conditions almost always include human-in-the-loop design and data minimization.
A BLOCKED status means the feature, as designed, cannot proceed. This is rare, but it happens. The most common cause is a feature that processes personal data without a legal basis under GDPR, or an AI feature that makes automated decisions affecting individuals without a right of appeal.
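The three clearance statuses can be treated as a gate on rollout size. The sketch below follows the rule from the sample clearance in the walkthrough: an experiment below the cap is permitted, and anything larger needs the conditions met. The cap of 100, the function name, and the parameter names are illustrative assumptions; the actual limit and its unit (users or teams) come from the clearance you receive.

```python
from enum import Enum


class Clearance(Enum):
    GRANTED = "GRANTED"
    CONDITIONAL = "CONDITIONAL"
    BLOCKED = "BLOCKED"


def may_roll_out(status: Clearance, audience_size: int,
                 conditions_met: bool, limited_audience_cap: int = 100) -> bool:
    """Return True if a rollout of this size is allowed under the clearance."""
    if status is Clearance.BLOCKED:
        return False  # feature must be redesigned first
    if status is Clearance.GRANTED:
        return True
    # CONDITIONAL: a small experiment is allowed; a wider rollout
    # requires every stated condition to be met.
    return audience_size < limited_audience_cap or conditions_met
```

Under this reading, a CONDITIONAL clearance with unmet conditions allows an experiment for 50 teams but not a rollout to 500.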
Common Mistakes
Skipping the fake-door test. DecideWell often recommends a fake door before committing to a full build. Teams skip this because it feels like a delay. It is not a delay. It is the cheapest possible signal on whether anyone actually wants the feature.
Not flagging that a feature uses AI. If you do not mention that a feature uses an LLM or ML model in your prompt, DataLayer will not run. You will ship an AI feature without an ethics review. This is a compliance and trust risk, not just a Lumen workflow issue.
Treating CONDITIONAL as GRANTED. A CONDITIONAL clearance has conditions attached. Those conditions are not optional. If you ship to production without meeting them, you are in violation of the clearance.
Running W4 without a specific feature description. "We want to improve activation" is not a feature. W4 is designed for a specific, scoped feature with a clear user and a clear job-to-be-done. Vague inputs produce vague experiment designs.