
Chapter 10: Lumen W4: Feature Validation

Command: /lumen:feature
Best for: PMs deciding whether to build, buy, or test a specific feature.
PostHog: Not required. Recommended for behavioral evidence on the target use case.

W4 validates a feature idea before you commit engineering resources. It runs a behavioral evidence check, an AI ethics checkpoint (for AI or ML features), customer interviews to test the core assumption behind the feature, a power-calculated experiment design, an interaction design spec, and a build/buy/test decision.

The output is a decision memo that tells you what to do and an experiment brief that tells you how to test it first.

When to Run It

Run W4 when:

  • You have a feature idea and want to validate the assumption before scoping it
  • You are deciding whether to build an AI or ML feature (ethics checkpoint required)
  • You need a build vs. buy analysis for a capability your product does not have
  • You want to run a fake-door test or concierge MVP before committing to a full build
  • A feature shipped but you want to design a rigorous experiment before scaling it

What You Need Before Starting

Required:

  • A clear description of the feature: what it does, who it is for, and what problem it solves
  • The target segment

Strongly recommended:

  • Any existing behavioral evidence from PostHog (usage signals, activation patterns)
  • Whether the feature uses AI or ML (DataLayer's ethics checkpoint runs only if you state this)

Optional:

  • Figma files or wireframes (UXLayer can read them directly if Figma is connected)
  • Prior interview data about the specific pain this feature addresses
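If you collect W4 inputs in a form or script before running the command, a pre-flight check can catch the two gaps that cause the most trouble later: a missing feature description and an unstated AI/ML flag. The sketch below is hypothetical; `FeatureBrief` and `preflight` are illustrative names, not part of Lumen, which takes a free-text prompt.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical container for the W4 inputs listed above. Lumen itself takes a
# free-text prompt; these field names are illustrative only.
@dataclass
class FeatureBrief:
    description: str                   # what it does, who it is for, what problem it solves
    target_segment: str
    uses_ai_ml: Optional[bool] = None  # None means "not stated"
    posthog_evidence: str = ""
    figma_links: List[str] = field(default_factory=list)
    prior_interviews: str = ""


def preflight(brief: FeatureBrief) -> List[str]:
    """Return the gaps to fix before running /lumen:feature (empty list if none)."""
    problems = []
    if not brief.description.strip():
        problems.append("Required: feature description is missing")
    if not brief.target_segment.strip():
        problems.append("Required: target segment is missing")
    if brief.uses_ai_ml is None:
        problems.append(
            "Recommended: state whether the feature uses AI or ML; "
            "the DataLayer ethics checkpoint runs only if you do"
        )
    if not brief.posthog_evidence.strip():
        problems.append("Recommended: no PostHog behavioral evidence supplied")
    return problems
```

For the Helix example below, a brief with `uses_ai_ml=True` but no PostHog evidence returns a single recommendation, which matches the prompt's note that no behavioral data exists yet.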

Sample Prompt

/lumen:feature
Product: Helix — B2B SaaS PM platform
Feature: "AI Sprint Advisor" — analyses the team's current sprint backlog and
          suggests reprioritization based on OKR alignment, engineering velocity,
          and historical completion rates.
Target users: Team plan users (product teams with 3–10 engineers)
Uses AI/ML: Yes — LLM-based suggestion engine reading backlog and OKR data

Current evidence: We know from W1 that roadmap-to-sprint handoff is the #1 opportunity. We have no behavioral data yet on how teams structure their sprint backlog in Helix.

Key question: Should we build this now, run a fake-door test first, or buy a third-party API? Flag any ethics requirements for the AI component.

Constraint: We have 4 engineering weeks available if we decide to build.

The Agent Sequence

SetupGuide     → MCP check + context seed

EventIQ        → [parallel] behavioral evidence check

SignalMonitor  → [parallel] feature usage signals

DataLayer      → AI ethics checkpoint [Level 3 gate] — runs ONLY for AI/ML features

DiscoveryOS    → 3–5 targeted interviews (skipped if evidence already validated)

HypothesisLab  → experiment design, power calculation

UXLayer        → interaction design spec [Level 1]

DecideWell     → build/buy/test decision [Level 1]

DataLayer      → experiment bias audit [Level 3 — only if ethics cleared at Checkpoint 1]

Orchestrator   → W4 Feature Validation Report

Terminal Output Walkthrough

[LUMEN] EventIQ + SignalMonitor running in parallel...

✓ Behavioral evidence check complete

  sprint_backlog_update events: 1,840 total · avg 3.2 events/user/week

  No existing sprint prioritization events in schema

  Recommendation: Add sprint_priority_change event before experiment launch
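EventIQ recommends instrumenting sprint prioritization before the experiment starts, so Phase 2 has a baseline to compare against. Below is a minimal sketch of what a `sprint_priority_change` payload could look like, assuming hypothetical property names (`sprint_id`, `item_id`, `old_rank`, `new_rank`, `source`). It is not PostHog's SDK; pass the resulting dict to whatever capture call your analytics client provides. The `source` property exists so that manual and AI-suggested reorders can be separated later.

```python
from datetime import datetime, timezone
from typing import Any, Dict

EVENT_NAME = "sprint_priority_change"  # event name recommended by EventIQ

# Hypothetical property set. No assignee names or emails are captured,
# in keeping with the data-minimization condition on AI features.
ALLOWED_PROPERTIES = {"sprint_id", "item_id", "old_rank", "new_rank", "source"}
ALLOWED_SOURCES = {"manual", "ai_suggestion"}


def build_priority_change_event(distinct_id: str, properties: Dict[str, Any]) -> Dict[str, Any]:
    """Validate properties and return an event dict for your analytics client."""
    unknown = set(properties) - ALLOWED_PROPERTIES
    if unknown:
        raise ValueError(f"Unexpected properties: {sorted(unknown)}")
    if properties.get("source") not in ALLOWED_SOURCES:
        raise ValueError(f"source must be one of {sorted(ALLOWED_SOURCES)}")
    return {
        "event": EVENT_NAME,
        "distinct_id": distinct_id,
        "properties": dict(properties),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Rejecting unknown properties up front keeps personal fields from slipping into the event stream by accident.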

[LUMEN] DataLayer · AI Ethics Checkpoint 1...

⚠ AI/ML feature detected — ethics review required

Assessing: AI Sprint Advisor (LLM-based sprint reprioritization)

GDPR jurisdiction: EU data handling rules apply

Issues identified:

  1. LLM reads OKR and backlog data — may process personal data (task ownership)

  2. Automated reprioritization suggestions could affect employee workload

  3. Model training data source not specified — bias risk unassessed

W4 pauses at this Level 3 gate until the named approver reviews the issues. Once the approver responds, record the decision:

/lumen:approve
> W4-ethics-ai-sprint-advisor-20260313
> CONDITIONAL
[LUMEN] CONDITIONAL clearance recorded.
        Conditions must be met before production rollout (>100 teams).
        Experiment for <100 teams is permitted.
        Continuing W4 workflow.

[LUMEN] DiscoveryOS · 3 targeted interviews...

✓ 3 interviews conducted on sprint planning pain

  Finding: All 3 participants reprioritize manually every Monday (avg 40 min)

  Finding: Primary frustration is re-aligning sprint to last-minute OKR changes

  Finding: 2/3 participants worried about AI "overriding their judgment" —

           human-in-the-loop design confirmed as critical

[LUMEN] HypothesisLab · Experiment design...

✓ Hypothesis: Showing AI-generated reprioritization suggestions (advisory only)

              will reduce Monday sprint planning time and improve OKR alignment

  Recommended test: Fake-door + concierge MVP

    Phase 1 (2 weeks): Show "Optimize Sprint" button to 50% of Team plan users.

                       Measure click-through rate. Success: ≥ 22% CTR.

    Phase 2 (4 weeks): Manually generate suggestions for users who click.

                       Measure time-to-first-sprint-lock and OKR alignment score.

    Sample size: n=480 Team plan users · Power: 80% · MDE: 8% improvement
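HypothesisLab does not show how it derived n=480, and the inputs it would need (significance level, metric variance) are not in the output. As an illustration only, the standard normal-approximation formula for comparing two means with equal arms is n per arm = 2 × (z_alpha + z_beta)^2 × sigma^2 / delta^2. Expressing sigma and delta as fractions of the baseline mean (a coefficient of variation and a relative MDE) gives the sketch below. The 0.30 coefficient of variation used in the usage note is an assumption, so the result will not necessarily match the report.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(relative_mde: float, cv: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size for a two-sided, two-sample comparison of means
    (normal approximation, equal arm sizes, equal variances).

    relative_mde: smallest relative change worth detecting, e.g. 0.08 for 8%
    cv:           assumed coefficient of variation (std dev / mean) of the metric
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z(power)           # about 0.84 for 80% power
    # sigma = cv * mean and delta = relative_mde * mean, so the mean cancels out.
    n = 2 * (z_alpha + z_beta) ** 2 * (cv / relative_mde) ** 2
    return ceil(n)
```

With α = 0.05 and 80% power, `n_per_arm(0.08, 0.30)` returns 221 per arm. Halving the MDE to 4% roughly quadruples that figure, which is why the MDE choice drives the sample size more than any other input.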

[LUMEN] UXLayer · Interaction design spec...

✓ Spec generated

  Pattern: Progressive disclosure — suggestions appear collapsed,

           user expands to review before applying

  Accessibility: WCAG AA — keyboard navigation required for suggestion cards

⚠ Flag: Do not show individual engineer names in AI suggestions —

        preserves data minimization condition from DataLayer clearance
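The data-minimization flag can be enforced in code rather than left to reviewers. One minimal sketch, assuming hypothetical field names (`assignee`, `assignee_email`, `reporter`) on each backlog item: strip personal fields before items reach the suggestion engine, then check generated suggestion text for engineer names before a card is rendered.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical personal-data fields on a backlog item; map these to your schema.
PERSONAL_FIELDS = {"assignee", "assignee_email", "reporter"}


def minimize_items(items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of backlog items with personal fields removed."""
    return [{k: v for k, v in item.items() if k not in PERSONAL_FIELDS} for item in items]


def mentions_any_name(text: str, names: Iterable[str]) -> bool:
    """True if the suggestion text contains any of the given names (case-insensitive)."""
    lowered = text.lower()
    return any(name.lower() in lowered for name in names if name)
```

A card for which `mentions_any_name` returns True would be regenerated or suppressed rather than shown. Plain substring matching is deliberately conservative: a short name such as "Al" also matches "align", which errs toward hiding a card rather than exposing a name.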

[LUMEN] DecideWell · Build/buy/test decision...

✓ Recommendation: RUN FAKE-DOOR TEST FIRST

    Build:  High confidence in user need; ethics conditions manageable;

            4 engineering weeks is sufficient for MVP

    Risk:   Low click-through on fake-door would save 4 weeks of build time

    Verdict: Run Phase 1 fake-door (2 weeks, ~0.5 engineering days to implement).

             If CTR ≥ 22%: proceed to Phase 2 concierge, then build.

             If CTR < 22%: do not build. Revisit in Q4.
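DecideWell's verdict reduces to a threshold rule on Phase 1 data. A minimal sketch, assuming CTR means unique users who clicked "Optimize Sprint" divided by unique users who were shown it (the output does not define the denominator); the return labels are illustrative.

```python
def fake_door_verdict(clicks: int, exposed: int, threshold: float = 0.22) -> str:
    """Apply the Phase 1 decision rule from the W4 example.

    clicks:  unique users who clicked the fake-door button (assumed definition)
    exposed: unique users who were shown the button
    """
    if exposed <= 0:
        raise ValueError("exposed must be positive")
    if not 0 <= clicks <= exposed:
        raise ValueError("clicks must be between 0 and exposed")
    ctr = clicks / exposed
    if ctr >= threshold:
        return "PROCEED_TO_PHASE_2"  # run the concierge MVP, then build
    return "DO_NOT_BUILD"            # revisit later (Q4 in the example)
```

A CTR of exactly 22% counts as a pass, matching the ≥ in the verdict.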

Reading the W4 Report

Ethics Clearance Status — If DataLayer ran, the clearance status (GRANTED / CONDITIONAL / BLOCKED) appears at the top of the report. CONDITIONAL means you can proceed with restrictions. BLOCKED means you cannot proceed until the feature is redesigned. Never skip this section.

Behavioral Evidence — What EventIQ and SignalMonitor found in your existing data. This tells you whether there is already behavioral signal supporting the feature need before a single interview.

Experiment Brief — The hypothesis, the phased test plan, the sample size, and the success criteria. The fake-door phase (if recommended) is the most important part of the brief: it tells you the cheapest way to validate demand before committing to a build.

Interaction Spec — UXLayer's output. It covers the interaction pattern, accessibility flags, and trust calibration for AI features. For AI features, the spec will always include a human-in-the-loop pattern. This is non-negotiable.

Decision Memo — Build / buy / test with rationale. The memo includes the outcome tracking ID for the 30-day follow-up.

The Ethics Checkpoint: A Practical Note

DataLayer runs for every feature you identify as using AI or ML. Once triggered, the checkpoint is not optional and cannot be skipped.

The checkpoint takes 72 hours when a named approver is required. Plan for this. If you are on a tight timeline, start the ethics checkpoint before the rest of the workflow by describing the AI component clearly in your prompt.

A CONDITIONAL clearance is the most common outcome. It means you can ship to a limited audience (typically under 100 users) without full compliance sign-off, as long as you meet the stated conditions. Conditions almost always include human-in-the-loop design and data minimization.

A BLOCKED status means the feature, as designed, cannot proceed. This is rare, but it happens. The most common cause is a feature that processes personal data without a legal basis under GDPR, or an AI feature that makes automated decisions affecting individuals without a right of appeal.
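The clearance rules above can be written down as a gate to check before widening exposure. This sketch uses the thresholds from the W4 example (under CONDITIONAL, experiments below 100 teams are permitted and wider rollout needs every condition met). The names are hypothetical, and treating exactly 100 teams as a production rollout is an assumption, since the example leaves that boundary open.

```python
from enum import Enum


class Clearance(Enum):
    GRANTED = "GRANTED"
    CONDITIONAL = "CONDITIONAL"
    BLOCKED = "BLOCKED"


# From the W4 example; exactly 100 teams is treated as production (assumption).
LIMITED_AUDIENCE_TEAMS = 100


def rollout_allowed(status: Clearance, teams: int, conditions_met: bool) -> bool:
    """Return True if exposing the feature to `teams` teams respects the clearance."""
    if status is Clearance.BLOCKED:
        return False  # redesign the feature and re-run the checkpoint
    if status is Clearance.GRANTED:
        return True
    # CONDITIONAL: a limited experiment is allowed; wider rollout needs all conditions met.
    return teams < LIMITED_AUDIENCE_TEAMS or conditions_met
```

Running this check in the same place that controls exposure makes "Treating CONDITIONAL as GRANTED" harder to do by accident.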

Common Mistakes

Skipping the fake-door test. DecideWell often recommends a fake-door test before committing to a full build. Teams skip it because it feels like a delay. It is not a delay. It is the cheapest possible signal on whether anyone actually wants the feature.

Not flagging that a feature uses AI. If you do not mention that a feature uses an LLM or ML model in your prompt, DataLayer will not run. You will ship an AI feature without an ethics review. This is a compliance and trust risk, not just a Lumen workflow issue.

Treating CONDITIONAL as GRANTED. A CONDITIONAL clearance has conditions attached. Those conditions are not optional. If you ship to production without meeting them, you are in violation of the clearance.

Running W4 without a specific feature description. "We want to improve activation" is not a feature. W4 is designed for a specific, scoped feature with a clear user and a clear job-to-be-done. Vague inputs produce vague experiment designs.