The Agentic AI Product Management Field Guide

Chapter 10: Lumen W4: Feature Validation

Command: /lumen:feature
Best for: PMs deciding whether to build, buy, or test a specific feature.
PostHog: Not required. Recommended for behavioral evidence on the target use case.

W4 validates a feature idea before you commit engineering resources. It runs a behavioral evidence check, an AI ethics checkpoint (for AI or ML features), customer interviews to test the assumption, a power-calculated experiment design, an interaction design spec, and a build/buy/test decision.

The output is a decision memo that tells you what to do and an experiment brief that tells you how to test it first.

When to Run It

Run W4 when:

You have a feature idea and want to validate the assumption before scoping it
You are deciding whether to build an AI or ML feature (ethics checkpoint required)
You need a build vs. buy analysis for a capability your product does not have
You want to run a fake-door test or concierge MVP before committing to a full build
A feature has already shipped, and you want to design a rigorous experiment before scaling it

What You Need Before Starting

Required:

A clear description of the feature: what it does, who it is for, and what problem it solves
The target segment

Strongly recommended:

Any existing behavioral evidence from PostHog (usage signals, activation patterns)
Whether the feature uses AI or ML (DataLayer triggers only for AI/ML features)

Optional:

Figma files or wireframes (UXLayer can read them directly if Figma is connected)
Prior interview data about the specific pain this feature addresses

Sample Prompt

/lumen:feature
Product: Helix, B2B SaaS PM platform
Feature: "AI Sprint Advisor": analyses the team's current sprint backlog and
          suggests reprioritization based on OKR alignment, engineering velocity,
          and historical completion rates.
Target users: Team plan users (product teams with 3–10 engineers)
Uses AI/ML: Yes, LLM-based suggestion engine reading backlog and OKR data

Current evidence: We know from W1 that roadmap-to-sprint handoff is the #1 opportunity. We have no behavioral data yet on how teams structure their sprint backlog in Helix.

Key question: Should we build this now, run a fake-door test first, or buy a third-party API? Flag any ethics requirements for the AI component.

Constraint: We have 4 engineering weeks available if we decide to build.

The Agent Sequence

SetupGuide → MCP check + context seed

EventIQ → [parallel] behavioral evidence check

SignalMonitor → [parallel] feature usage signals

DataLayer → AI ethics checkpoint [Level 3 gate], runs ONLY for AI/ML features

DiscoveryOS → 3–5 targeted interviews (skipped if evidence already validated)

HypothesisLab → experiment design, power calculation

UXLayer → interaction design spec [Level 1]

DecideWell → build/buy/test decision [Level 1]

DataLayer → experiment bias audit [Level 3, only if ethics cleared at Checkpoint 1]

Orchestrator → W4 Feature Validation Report
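
The conditional steps in this sequence can be sketched in a few lines of Python. This is only an illustration of the gating rules listed above, not Lumen's implementation: the function name plan_w4_steps and its three flags are hypothetical, and treating the bias audit as a single yes/no on "ethics cleared" is an assumption.

```python
def plan_w4_steps(uses_ai_ml: bool,
                  evidence_validated: bool = False,
                  ethics_cleared: bool = False) -> list[str]:
    """Return the ordered W4 agent steps for one run (illustrative only)."""
    # EventIQ and SignalMonitor run in parallel; they are listed in order here.
    steps = ["SetupGuide", "EventIQ", "SignalMonitor"]
    if uses_ai_ml:
        # The ethics checkpoint runs only for AI/ML features.
        steps.append("DataLayer: AI ethics checkpoint")
    if not evidence_validated:
        # Interviews are skipped when the evidence already validates the need.
        steps.append("DiscoveryOS")
    steps += ["HypothesisLab", "UXLayer", "DecideWell"]
    if uses_ai_ml and ethics_cleared:
        # The bias audit runs only if ethics cleared at Checkpoint 1.
        steps.append("DataLayer: experiment bias audit")
    steps.append("Orchestrator")
    return steps


if __name__ == "__main__":
    for step in plan_w4_steps(uses_ai_ml=True, ethics_cleared=True):
        print(step)
```

Calling plan_w4_steps(uses_ai_ml=False) drops both DataLayer steps, which is the same effect as leaving the AI component out of your prompt.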

Terminal Output Walkthrough

[LUMEN] EventIQ + SignalMonitor running in parallel...

✓ Behavioral evidence check complete

sprint_backlog_update events: 1,840 total · avg 3.2 events/user/week

No existing sprint prioritization events in schema

Recommendation: Add sprint_priority_change event before experiment launch

[LUMEN] DataLayer · AI Ethics Checkpoint 1...

⚠ AI/ML feature detected: ethics review required

Assessing: AI Sprint Advisor (LLM-based sprint reprioritization)

GDPR jurisdiction: EU data handling rules apply

Issues identified:

1. LLM reads OKR and backlog data; may process personal data (task ownership)

2. Automated reprioritization suggestions could affect employee workload

3. Model training data source not specified; bias risk unassessed

Once the named approver responds:

/lumen:approve
> W4-ethics-ai-sprint-advisor-20260313
> CONDITIONAL
[LUMEN] CONDITIONAL clearance recorded.
        Conditions must be met before production rollout (>100 teams).
        Experiment for <100 teams is permitted.
        Continuing W4 workflow.

[LUMEN] DiscoveryOS · 3 targeted interviews...

✓ 3 interviews conducted on sprint planning pain

Finding: All 3 participants reprioritize manually every Monday (avg 40 min)

Finding: Primary frustration is re-aligning sprint to last-minute OKR changes

Finding: 2/3 participants worried about AI "overriding their judgment"; human-in-the-loop design confirmed as critical

[LUMEN] HypothesisLab · Experiment design...

✓ Hypothesis: Showing AI-generated reprioritization suggestions (advisory only) will reduce Monday sprint planning time and improve OKR alignment

Recommended test: Fake-door + concierge MVP

Phase 1 (2 weeks): Show "Optimize Sprint" button to 50% of Team plan users.

Measure click-through rate. Success: ≥ 22% CTR.

Phase 2 (4 weeks): Manually generate suggestions for users who click.

Measure time-to-first-sprint-lock and OKR alignment score.

Sample size: n=480 Team plan users · Power: 80% · MDE: 8% improvement

[LUMEN] UXLayer · Interaction design spec...

✓ Spec generated

Pattern: Progressive disclosure; suggestions appear collapsed, user expands to review before applying

Accessibility: WCAG AA; keyboard navigation required for suggestion cards

⚠ Flag: Do not show individual engineer names in AI suggestions; preserves data minimization condition from DataLayer clearance

[LUMEN] DecideWell · Build/buy/test decision...

✓ Recommendation: RUN FAKE-DOOR TEST FIRST

Build: High confidence in user need; ethics conditions manageable; 4 engineering weeks is sufficient for MVP

Risk: Low click-through on fake-door would save 4 weeks of build time

Verdict: Run Phase 1 fake-door (2 weeks, ~0.5 engineering days to implement).

If CTR ≥ 22%: proceed to Phase 2 concierge, then build.

If CTR < 22%: do not build. Revisit in Q4.
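
The Phase 1 verdict rule is simple enough to check by hand, but a small Python sketch makes the threshold explicit. The function names are hypothetical and this is not how DecideWell computes its verdict. The Wilson interval is an addition that does not appear in the Lumen output; it is included on the assumption that you want to see how much uncertainty sits around the observed CTR.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(clicks: int, exposures: int,
                    confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a click-through rate."""
    if exposures <= 0 or not 0 <= clicks <= exposures:
        raise ValueError("need exposures > 0 and 0 <= clicks <= exposures")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = clicks / exposures
    denom = 1 + z * z / exposures
    centre = (p + z * z / (2 * exposures)) / denom
    half = z * sqrt(p * (1 - p) / exposures
                    + z * z / (4 * exposures ** 2)) / denom
    return centre - half, centre + half


def fake_door_verdict(clicks: int, exposures: int,
                      threshold: float = 0.22) -> str:
    """Apply the Phase 1 rule: CTR at or above the threshold proceeds."""
    wilson_interval(clicks, exposures)  # reuse the input validation
    ctr = clicks / exposures
    if ctr >= threshold:
        return "proceed to Phase 2 concierge"
    return "do not build"


if __name__ == "__main__":
    clicks, exposures = 60, 240  # hypothetical counts, not from the report
    print(fake_door_verdict(clicks, exposures))
    print(wilson_interval(clicks, exposures))
```

An observed CTR just above 22% can still have an interval whose lower bound falls below 22%, so a narrow pass is a weaker signal than a clear one.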

Reading the W4 Report

Ethics Clearance Status: If DataLayer ran, the clearance status (GRANTED / CONDITIONAL / BLOCKED) appears at the top of the report. CONDITIONAL means you can proceed with restrictions. BLOCKED means you cannot proceed until the feature is redesigned. Never skip this section.

Behavioral Evidence: What EventIQ and SignalMonitor found in your existing data. This tells you whether there is already behavioral signal supporting the feature need before a single interview.

Experiment Brief: The hypothesis, the phased test plan, the sample size, and the success criteria. The fake-door phase (if recommended) is the most important section: it tells you the cheapest possible way to validate demand before committing to a build.

Interaction Spec: UXLayer's output. It covers the interaction pattern, accessibility flags, and trust calibration for AI features. For AI features, the spec will always include a human-in-the-loop pattern. This is non-negotiable.

Decision Memo: Build / buy / test with rationale. The memo includes the outcome tracking ID for the 30-day follow-up.
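
To sanity-check the sample size in an Experiment Brief, a rough power calculation can be done with the Python standard library. The sketch below uses the usual normal-approximation formula for comparing two proportions. It assumes a binary success metric, a two-sided test at alpha 0.05, and an MDE expressed in absolute percentage points. The report does not say which formula, metric type, or baseline HypothesisLab uses, so the result will not necessarily match the n in your brief.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed in EACH arm to detect baseline -> baseline + mde_abs."""
    p1 = baseline
    p2 = baseline + mde_abs
    if not (0 < p1 < 1 and 0 < p2 < 1 and mde_abs > 0):
        raise ValueError("rates must lie in (0, 1) and mde_abs must be > 0")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: a 14% baseline rate and an 8-point MDE.
    per_arm = sample_size_per_arm(baseline=0.14, mde_abs=0.08)
    print(f"per arm: {per_arm}, total: {2 * per_arm}")
```

Halving the MDE roughly quadruples the required sample, which is why the brief's MDE matters as much as its power figure.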

The Ethics Checkpoint: A Practical Note

DataLayer runs for every feature you flag as using AI or ML. For those features, the checkpoint is not optional and cannot be skipped.

The checkpoint takes 72 hours when a named approver is required. Plan for this. If you are on a tight timeline, start the ethics checkpoint before the rest of the workflow by describing the AI component clearly in your prompt.

A CONDITIONAL clearance is the most common outcome. It means you can ship to a limited audience (typically under 100 users) without full compliance sign-off, as long as you meet the stated conditions. Conditions almost always include human-in-the-loop design and data minimization.

A BLOCKED status means the feature, as designed, cannot proceed. This is rare, but it happens. The most common cause is a feature that processes personal data without a legal basis under GDPR, or an AI feature that makes automated decisions affecting individuals without a right of appeal.
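
If you track clearances in your own release tooling, the three statuses can be turned into a simple rollout guard. The Python sketch below is hypothetical and is not part of Lumen. It follows the sample clearance message in the walkthrough, where an experiment below the cap is permitted and the conditions gate wider rollout. The cap of 100 and the names Clearance and rollout_allowed are assumptions; if your clearance attaches conditions to the limited audience as well, tighten the check accordingly.

```python
from enum import Enum


class Clearance(Enum):
    GRANTED = "GRANTED"
    CONDITIONAL = "CONDITIONAL"
    BLOCKED = "BLOCKED"


def rollout_allowed(status: Clearance, audience_size: int,
                    conditions_met: bool, limited_cap: int = 100) -> bool:
    """Return True if a rollout of this size is consistent with the clearance."""
    if status is Clearance.BLOCKED:
        # The feature must be redesigned before anything ships.
        return False
    if status is Clearance.GRANTED:
        return True
    # CONDITIONAL: small experiments may proceed; anything larger
    # requires the stated conditions to be met first.
    return conditions_met or audience_size < limited_cap


if __name__ == "__main__":
    print(rollout_allowed(Clearance.CONDITIONAL, audience_size=50,
                          conditions_met=False))
    print(rollout_allowed(Clearance.CONDITIONAL, audience_size=500,
                          conditions_met=False))
```

The second call returns False: a CONDITIONAL clearance is not a GRANTED one until the conditions are met.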

Common Mistakes

Skipping the fake-door test. DecideWell often recommends a fake door before committing to a full build. Teams skip this because it feels like a delay. It is not a delay. It is the cheapest possible signal on whether anyone actually wants the feature.

Not flagging that a feature uses AI. If you do not mention that a feature uses an LLM or ML model in your prompt, DataLayer will not run. You will ship an AI feature without an ethics review. This is a compliance and trust risk, not just a Lumen workflow issue.

Treating CONDITIONAL as GRANTED. A CONDITIONAL clearance has conditions attached. Those conditions are not optional. If you ship to production without meeting them, you are in violation of the clearance.

Running W4 without a specific feature description. "We want to improve activation" is not a feature. W4 is designed for a specific, scoped feature with a clear user and a clear job-to-be-done. Vague inputs produce vague experiment designs.