How to Build Products Where AI Is the Foundation, Not Just Decoration

Whether AI features in your product will fail or succeed is determined by whether you have bolted them on or built them in. Learn how to build AI-first products where intelligence flows through every interaction—from prompts as PRDs to autonomous agents and NLX design.

Ishwar Jha

When ChatGPT exploded in late 2022, every Product Manager was suddenly called into an emergency meeting with one agenda item: “What’s our AI strategy?” Investors, executives, and every other stakeholder started asking how competitors were shipping “AI-powered” products so quickly and why their own teams weren’t there yet.

The pressure was immense and immediate. Quarterly planning cycles that once focused on incremental improvements were suddenly centred on revolutionary AI capabilities. PMs faced an impossible choice: spend months rebuilding products around AI, or find ways to add AI features to existing roadmaps.

Most PMs chose the path of least resistance. They treated AI like any other third-party integration, something that could be plugged into existing architectures without fundamental changes. The reasoning seemed sound: we already have users, workflows, and data. Why not enhance what's working with some AI magic?

This approach felt safe and practical. Adding a chatbot to customer support pages required minimal engineering resources. Integrating machine learning into recommendation engines could happen alongside other feature development. Teams could ship "AI-powered" features within existing sprint cycles and satisfy stakeholder pressure without disrupting core product development.

The add-on approach also followed familiar patterns. Product managers knew how to evaluate, integrate, and measure traditional software features. AI add-ons fit into existing frameworks for vendor evaluation, user testing, and success metrics. It felt like evolution rather than revolution.

However, this approach often missed the point entirely and backfired spectacularly. The age of slapping AI onto existing products is over. 

Look at the graveyard of AI add-ons that users abandoned within months. Google+ introduced AI-powered photo tagging, which frequently misidentified people, resulting in embarrassment rather than value. 

Countless e-commerce sites added "AI-powered search" that returned worse results than basic keyword matching because the AI wasn't trained on their specific product catalogue.

The add-on approach is failing for predictable reasons. These features feel tacked on because they are tacked on. They don't integrate with existing user mental models or workflows. They require users to learn new interfaces for marginal benefits. Most critically, they can't access the context and data needed to be genuinely helpful.

Compare this to products built AI-first from the ground up. Midjourney didn't add AI to photo editing software; it reimagined creative workflows around AI generation. Linear didn't bolt AI onto project management; it designed AI-native issue tracking that understands engineering workflows. These products feel cohesive because intelligence is seamlessly integrated into every interaction.

From Edge Cases to Core Architecture

Traditional product development relegates AI to specific features. Customer service gets a chatbot that can't access order history. Search gets "intelligent" suggestions that fail to take user preferences and context into account. Analytics dashboards get predictive insights that contradict other system recommendations. Each AI implementation exists in isolation, fighting for resources and attention.

This fragmented approach creates jarring user experiences. When I sat through a demo of Salesforce's Einstein AI at an event, it felt bolted on because it lived in separate tabs from the core CRM workflows. Users had to context-switch between "normal Salesforce" and "AI Salesforce," disrupting the natural workflow. The AI couldn't see what users were doing in the main interface, so its suggestions often felt irrelevant or redundant.

Slack's early AI features suffered similar problems. Smart suggestions for channels appeared in one place, message summarisation in another, and automated responses in a third. Users couldn't build muscle memory because AI help appeared inconsistently across the product.

Consider how Notion evolved beyond the add-on trap. Early Notion AI felt like a writing assistant bolted onto a notes app. The Notion team was quick to recognise the mistake and recover ground by working out how AI should flow through every database query, every template suggestion, and every content organisation decision. If you believe your users should think about "using AI" as a distinct concept, that is a misconception. Users want to continue working as usual while the intelligence layer anticipates their needs and responds accordingly.

Compare GitHub Copilot's trajectory. Instead of adding AI suggestions to existing Visual Studio features, they integrated AI directly into the code editor, where developers actually work, allowing AI to work hand in hand with developers.

This architectural shift by Notion and GitHub signals a new way of thinking. Instead of asking "Where can we add AI?" the question becomes "How does intelligence flow through our entire system?" The difference determines whether your AI feels helpful or half-baked.

Welcome to building AI-first products.

Your New Role as a Capability Orchestrator

Your role as a PM just expanded beyond managing features. You're now orchestrating AI capabilities across your entire product ecosystem.

As a feature manager, your role centred on optimising individual components. You worried about button placement, user flows, and A/B test results. Now, as an AI capability orchestrator, you need to think in systems. You need to understand how different AI models interact, where human judgment remains essential, and how to design seamless handoffs between artificial and human intelligence.

At LinkedIn, Tomer Cohen learned this lesson early. He explains: "AI arguably is the biggest technological revolution in our lifetimes. When I say AI first, it's not about a tech, it's a mindset. It starts with strategy." He compares it to river rafting: "You have everybody on the sides holding the pedals, but there's the guide on the back holding those two pedals. Those two pedals are AI, and the guide better be you."

This means learning to speak machine learning. Not the mathematical details, but the practical constraints. You need to understand model training cycles, data requirements, and performance trade-offs sufficiently to make informed strategic decisions.

The best AI-first PMs become translators. They bridge the gap between engineering teams building complex systems and users who just want their problems solved. They ask questions traditional PMs delegated to engineering: What is the objective of our algorithm? What features have we added to the algorithm? What investment do we have in data collection and fine-tuning?

Cohen pushes his teams at LinkedIn to think beyond surface-level AI integration: "You can have massive lifts in your product outcomes if you properly enhance your infrastructure. How many product people talk about the infrastructure they have? Not many." AI-first PMs make infrastructure decisions because they understand these technical choices determine what experiences become possible.

Prompts are the New PRDs

Product requirements documents are getting a makeover. Instead of detailed specifications about buttons and workflows, you're writing prompts that define AI behaviour.

A traditional PRD might specify: "When a user clicks the search button, display results ranked by relevance with filters for date, category, and price." An AI-first PRD reads more like: "Help users find exactly what they're looking for by understanding their intent, even when they can't articulate it clearly. Surface the most useful results first, considering context from their previous searches and current project."

At Microsoft, this shift is explicit. Product leaders describe "prompt sets as the new PRDs." Prototyping with AI becomes essential for effective product development rather than optional exploration.

Writing effective prompts requires new skills. You need to be precise about desired outcomes while leaving room for the AI to adapt. Too specific, and you limit the system's intelligence. Too vague, and you get unpredictable results.

The best prompt writers develop intuition for how language models interpret instructions. At Glean, CPO Tamar Yehoshua built prompts that aggregate information across multiple tools: "I wrote a prompt to help me get the status of features. It looks at our Launch Cal, sees if there are any open JIRA tickets, what the Slack conversations are, and brings these together to tell me launch date and confidence level."

This approach treats prompts as conversation starters with your AI system. You're not commanding specific actions. You're establishing collaborative relationships where the AI understands your goals and helps achieve them.

Role prompting matters more than most PMs realise. Simply starting with "You are a product manager at Glean" before asking for analysis dramatically improves output quality. The AI understands context and constraints better when you define its perspective clearly.
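One way to picture a prompt doing the job of a PRD is to give it named sections, the way a requirements document has them. The sketch below is only an illustration under that assumption: `PromptSpec` and its fields are hypothetical names, not a format used by Glean, Microsoft, or any model provider.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """A prompt treated like a small requirements document (hypothetical structure)."""
    role: str                                              # who the model should act as
    goal: str                                              # the outcome the user cares about
    context: list[str] = field(default_factory=list)       # sources the model may draw on
    constraints: list[str] = field(default_factory=list)   # boundaries on behaviour

    def render(self) -> str:
        """Assemble the sections into a single prompt string."""
        lines = [f"You are {self.role}.", "", f"Goal: {self.goal}"]
        if self.context:
            lines += ["", "Context:"] + [f"- {c}" for c in self.context]
        if self.constraints:
            lines += ["", "Constraints:"] + [f"- {c}" for c in self.constraints]
        return "\n".join(lines)


spec = PromptSpec(
    role="a product manager at Glean",
    goal="Summarise the launch status of each feature with a confidence level.",
    context=["Launch calendar entries", "Open JIRA tickets", "Recent Slack threads"],
    constraints=["Say so explicitly when information is missing."],
)
prompt = spec.render()
```

Keeping the role, goal, and constraints as separate fields makes it easier to review and version a prompt the way a team would review a spec, while the goal itself stays open enough for the model to adapt.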

Think of prompts as ingredients rather than recipes. As Cohen explains: "There's a realization that you don't control the experience anymore, you control the ingredients. It's almost like being a chef at a restaurant. This new technology says, just give me the ingredients, give me the guidelines, and now I'll take care of it for you."

Go Beyond Traditional QA for Testing AI Features

Traditional testing assumes predictable outputs. Click button A, get result B. AI features don't work this way, and add-on AI implementations make testing even harder.

The fundamental problem with bolted-on AI features is that they can't be tested in context. When Zendesk added AI-powered ticket routing, it initially performed well in isolation. But when deployed alongside existing assignment rules, macro shortcuts, and team workflows, the AI made decisions that contradicted established processes. The testing framework was unable to detect these conflicts because it evaluated the AI feature independently of the integrated user experience.

Microsoft Word's early grammar suggestions suffered similar issues. The AI performed well on isolated sentences but struggled to understand the document's context, writing style, or user intent. It would suggest changing "gonna" to "going to" in creative writing, or flag passive voice in technical documentation where it had been chosen intentionally.

AI-first product design tests complete user journeys rather than isolated features. Building robust testing frameworks requires new approaches that account for context, user intent, and system integration.

Start with evaluation criteria that matter to users. Is the AI's suggestion actually valuable for this specific workflow? Does it understand the broader context of what the user is trying to accomplish? When it doesn't know something, does it admit uncertainty rather than confidently provide wrong information?

Create diverse test cases that accurately reflect real-world complexity and variability. Users input typos, incomplete information, and requests that are unclear or nonsensical. Your testing needs to cover edge cases that would never appear in traditional software, especially those created when AI features interact with existing product functionality.
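A small evaluation harness can make these ideas concrete. The sketch below is a toy under stated assumptions: `fake_assistant` is a stand-in for whatever AI feature you are testing, and the pass criteria (a keyword match, or the phrase "not sure" for unanswerable input) are deliberately crude placeholders for the human and automated judgments a real team would use.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    user_input: str          # deliberately messy, like real user input
    must_mention: str        # text a useful answer should contain
    answerable: bool = True  # False means the right behaviour is to admit uncertainty


def fake_assistant(text: str) -> str:
    """Stand-in for the real AI feature; replace with your own call."""
    lowered = text.lower()
    if "refund" in lowered or "refnd" in lowered:
        return "You can request a refund from the Orders page."
    return "I'm not sure. Could you tell me more?"


def passes(case: EvalCase, answer: str) -> bool:
    """Unanswerable cases pass only if the assistant admits uncertainty."""
    if not case.answerable:
        return "not sure" in answer.lower()
    return case.must_mention.lower() in answer.lower()


def run_eval(cases, assistant) -> float:
    """Return the share of cases that pass."""
    results = [passes(c, assistant(c.user_input)) for c in cases]
    return sum(results) / len(results)


cases = [
    EvalCase("how do i get a refnd", must_mention="refund"),     # typo
    EvalCase("REFUND??", must_mention="refund"),                  # terse
    EvalCase("asdf qwerty", must_mention="", answerable=False),   # nonsense
    EvalCase("cancel my thing", must_mention="cancel"),           # vague; the stub fails this
]
pass_rate = run_eval(cases, fake_assistant)
```

The last case fails on purpose: a vague but legitimate request is exactly the kind of input that traditional click-path tests never exercise.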

Implement continuous monitoring in production. AI behaviour drifts over time as models update or user patterns change. Add-on AI features are particularly vulnerable to drift because they can't adapt to changes in the core product. Set up systems to identify integration issues before users report them.
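As a rough illustration of what such monitoring might look like, the sketch below tracks a rolling acceptance rate for AI suggestions and raises a flag when it falls well below a launch baseline. The class name, window size, and tolerance are assumptions chosen for the example; a production system would likely track several signals and use more careful statistics.

```python
from collections import deque


class DriftMonitor:
    """Flags when a rolling acceptance rate falls well below a baseline."""

    def __init__(self, baseline: float, window: int = 100, tolerance: float = 0.10):
        self.baseline = baseline      # acceptance rate measured at launch
        self.tolerance = tolerance    # allowed absolute drop before alerting
        self.events = deque(maxlen=window)

    def record(self, accepted: bool) -> None:
        """Store whether the user accepted the latest suggestion."""
        self.events.append(accepted)

    def rate(self) -> float:
        """Current acceptance rate over the window (baseline if no data yet)."""
        return sum(self.events) / len(self.events) if self.events else self.baseline

    def drifted(self) -> bool:
        # Only judge once the window is full, to avoid noisy early alerts.
        full = len(self.events) == self.events.maxlen
        return full and self.rate() < self.baseline - self.tolerance


monitor = DriftMonitor(baseline=0.60, window=10)
for accepted in [True] * 4 + [False] * 6:
    monitor.record(accepted)
alert = monitor.drifted()
```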

The most sophisticated AI testing involves human evaluators working in tandem with automated systems. Humans provide nuanced judgment about quality and appropriateness that automated tests can't capture, especially for evaluating how AI features fit into complete user workflows.

Human-AI Workflow Design

The future isn't humans versus AI. It's humans and AI working together, each contributing their strengths.

Designing effective human-AI workflows starts with understanding what each party does best. Humans excel at creative problem-solving, emotional intelligence, and handling novel situations. AI excels at processing large amounts of information, finding patterns, and performing consistent analysis.

The magic happens in the handoffs. When should the AI surface a decision to human review? How do you design interfaces that make AI suggestions helpful rather than overwhelming? What information does a human need to trust or override AI recommendations?

Consider GitHub Copilot's approach. The AI suggests code, but developers decide what to accept, modify, or reject. The workflow feels collaborative rather than automated. Developers stay in control while the AI accelerates their work.

NotebookLM demonstrates this partnership brilliantly. The product doesn't just generate content; it creates conversational audio that helps users understand complex material. As product lead Raiza Martin explains: "Technology has been there. You have to shape it and bring it closer to people. What is the shape? If you keep going at it, you'll eventually land on something that when people look at it, they're like, 'Wow, I get it.'"

Good human-AI workflows create feedback loops. Human decisions should improve AI performance over time. AI insights should enhance human decision-making. The partnership gets stronger with use.

At Glean, this means designing chat interfaces that not only answer questions but also guide users toward more informed questions. "People understand search because they understand Google," Yehoshua explains. "But chat interfaces, people still don't really know how to use. We need to build guardrails to help suggest what could work and what won't work."

New Success Metrics for AI Products

Traditional product metrics fall short for AI-first features and completely mislead teams building AI add-ons.

The classic example is chatbot engagement metrics. Companies celebrated high message volumes and long conversation lengths, not realising these metrics often indicated user frustration rather than success. IBM Watson's customer service chatbots demonstrated impressive engagement statistics, but users complained about getting stuck in endless loops while trying to reach human agents.

Spotify's early AI DJ feature suffered from metric misalignment. Traditional engagement metrics indicated that users frequently skipped AI-generated playlists, which appeared to be a failure. But users were actually training the system by skipping songs they didn't like. The AI was learning, but the metrics made it look broken.

Add-on AI features create particularly deceptive metrics because they optimise for their specific function rather than overall user success. An AI-powered email sorting feature may achieve high accuracy rates, but it can also slow down users who struggle to find emails in unexpected folders.

AI-first products require fundamentally different success measurements. Track task completion efficiency rather than feature usage. How much time does your AI save users? How many steps does it eliminate from complex workflows? These metrics reflect real value creation rather than just interaction volume.

Measure accuracy and reliability over time, not just at launch. AI systems degrade as data changes or the number of edge cases increases. Monitor false positive and false negative rates for critical decisions, especially how these rates change as your product evolves.
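For a yes/no decision, the two rates can be computed from logged predictions and ground truth and then compared across time periods. The snippet below is a minimal sketch; the fraud-flag framing and the two snapshots are made-up data used only to show the calculation.

```python
def error_rates(predictions, actuals):
    """Return (false_positive_rate, false_negative_rate) for boolean decisions."""
    fp = sum(p and not a for p, a in zip(predictions, actuals))
    fn = sum(a and not p for p, a in zip(predictions, actuals))
    negatives = sum(not a for a in actuals)
    positives = sum(bool(a) for a in actuals)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


# Hypothetical snapshots of (AI decision, ground truth) for a fraud flag.
snapshots = {
    "launch":  ([True, False, False, True, False], [True, False, False, True, False]),
    "week_12": ([True, True, False, False, False], [True, False, False, True, False]),
}
trend = {name: error_rates(p, a) for name, (p, a) in snapshots.items()}
```

Tracking both rates separately matters because they usually carry different costs: a false positive annoys a legitimate user, while a false negative lets a real problem through.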

User trust becomes crucial. Do people act on AI recommendations? How often do they override suggestions? Trust builds slowly and can disappear quickly with AI systems, particularly when the AI feels disconnected from core workflows.
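One simple way to put numbers on these questions is to log what users do with each suggestion. The sketch below assumes three hypothetical action labels ("accepted", "edited", "dismissed"); your product's events will differ.

```python
from collections import Counter


def trust_metrics(actions):
    """Summarise how users responded to AI suggestions.

    `actions` is a list of "accepted", "edited" or "dismissed" (hypothetical labels).
    """
    counts = Counter(actions)
    total = len(actions)
    if total == 0:
        return {"acted_on": 0.0, "override": 0.0}
    return {
        # Accepted as-is or after edits: the user still acted on the suggestion.
        "acted_on": (counts["accepted"] + counts["edited"]) / total,
        # Edited or dismissed: the user overrode what the AI proposed.
        "override": (counts["edited"] + counts["dismissed"]) / total,
    }


log = ["accepted", "accepted", "edited", "dismissed", "accepted"]
metrics = trust_metrics(log)
```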

Consider AI-specific engagement patterns that might look concerning through traditional lenses. Users might interact less frequently but accomplish more per session. Lower session counts can indicate higher efficiency rather than reduced engagement.

Quality metrics matter more than quantity metrics. One excellent AI suggestion that solves a user's problem beats ten mediocre suggestions they ignore. This principle becomes particularly critical when evaluating whether AI features should be expanded or redesigned from scratch.

You have to accept that AI is not deterministic. Give it a chance to learn, and experience it for yourself; over time it becomes much better.

Responsible AI: Building Ethics Into Products

AI bias isn't just an engineering problem. It's a product design challenge that affects every user interaction.

Building responsible AI features starts with understanding your training data. What perspectives are represented? What groups might be underrepresented or misrepresented? These gaps will show up in your product's behaviour.

Design AI systems that fail gracefully. When the AI isn't confident about a recommendation, it should clearly communicate its uncertainty. Users need to understand the system's limitations to make good decisions.
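A sketch of graceful failure, assuming the system exposes some confidence score between 0 and 1 (many do not, and such scores are not always well calibrated). The thresholds and wording below are placeholders.

```python
def present(suggestion: str, confidence: float,
            show_at: float = 0.8, hedge_at: float = 0.5) -> str:
    """Choose how to surface a suggestion based on model confidence."""
    if confidence >= show_at:
        return suggestion
    if confidence >= hedge_at:
        # Still useful, but the user should know the system is unsure.
        return f"I'm not certain, but this might help: {suggestion}"
    # Below the lower threshold, say nothing rather than guess.
    return "I don't have enough information to recommend anything here."


high = present("Assign this ticket to Billing.", 0.93)
medium = present("Assign this ticket to Billing.", 0.62)
low = present("Assign this ticket to Billing.", 0.20)
```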

Implement fairness checks throughout development. Test your AI with diverse user groups and use cases. Look for disparate impacts on different demographics. Address problems before they reach production.
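One common starting point for spotting disparate impact is to compare positive-outcome rates between groups. The sketch below uses made-up data and flags any group whose rate is under 80% of a reference group's rate, a heuristic often called the four-fifths rule. It is a screening signal under those assumptions, not a complete fairness audit.

```python
def selection_rates(outcomes):
    """Map each group to the share of its members who got a positive outcome."""
    return {group: sum(results) / len(results) for group, results in outcomes.items()}


def disparate_impact(outcomes, reference_group):
    """Ratio of each group's selection rate to the reference group's rate."""
    rates = selection_rates(outcomes)
    ref = rates[reference_group]
    return {group: rate / ref for group, rate in rates.items()}


# Hypothetical approval outcomes (1 = approved) for two user groups.
outcomes = {
    "group_a": [1, 1, 1, 0, 1, 1, 0, 1, 1, 1],   # 8 of 10 approved
    "group_b": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0],   # 4 of 10 approved
}
ratios = disparate_impact(outcomes, reference_group="group_a")
flagged = [group for group, ratio in ratios.items() if ratio < 0.8]
```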

Create transparency in AI decision-making. Users should understand why the system made specific recommendations, especially for high-stakes decisions. Explainable AI isn't just nice to have; it's essential for user trust.

Build feedback mechanisms that let users correct AI mistakes. These corrections should improve the system for everyone, not just the individual user who provided feedback.

Strategic Product Architecture Decisions

AI-first architecture requires different trade-offs than traditional software architecture.

Decide early whether to build, buy, or partner for core AI capabilities. Large language models require massive infrastructure investments. Most companies should use existing models through APIs rather than training their own.

Plan for data requirements from day one. AI systems are hungry for high-quality training data. Your product architecture needs to capture, clean, and organise data for continuous model improvement.

Consider latency requirements carefully. Some AI features can tolerate delays while users wait for processing. Others need near-instant responses. Your architecture choices affect which experiences you can deliver.

Design for model versioning and updates. AI models improve frequently, but updates can change behaviour in unexpected ways. Build systems that can test new models safely before rolling them out to users.
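One possible approach to safe rollouts is a canary release: send a small, stable slice of users to the new model and compare results before widening the slice. The sketch below assumes hypothetical model names ("model-v1", "model-v2") and uses a hash of the user ID so that each user stays on the same version between sessions.

```python
import hashlib


def rollout_bucket(user_id: str) -> int:
    """Map a user to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def model_for(user_id: str, canary_percent: int,
              stable: str = "model-v1", candidate: str = "model-v2") -> str:
    """Send a fixed slice of users to the candidate model; the rest stay on stable."""
    return candidate if rollout_bucket(user_id) < canary_percent else stable


# Roughly 5% of users would see the candidate model.
assignments = {uid: model_for(uid, canary_percent=5) for uid in ["u-101", "u-102", "u-103"]}
```

Because the bucket depends only on the user ID, raising `canary_percent` from 5 to 20 keeps the original 5% on the candidate and adds new users, which keeps before-and-after comparisons clean.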

Think about the computing costs early. AI inference can be expensive, especially for complex models. Design features that strike a balance between capability and cost efficiency.
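A back-of-the-envelope estimate is often enough to start the conversation. The sketch below assumes token-based pricing; the prices and traffic figures are placeholders, not any provider's real rates.

```python
def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 price_in_per_1k: float, price_out_per_1k: float, days: int = 30) -> float:
    """Estimate monthly inference spend for one feature."""
    per_request = ((input_tokens / 1000) * price_in_per_1k
                   + (output_tokens / 1000) * price_out_per_1k)
    return per_request * requests_per_day * days


# Placeholder prices and volumes, for illustration only.
estimate = monthly_cost(requests_per_day=10_000, input_tokens=1_500, output_tokens=500,
                        price_in_per_1k=0.002, price_out_per_1k=0.006)
```

With these placeholder numbers the estimate comes to about 1,800 per month in whatever currency the prices are quoted in. Halving the prompt length or caching repeated requests changes that figure directly, which is why cost belongs in feature design rather than in a later optimisation pass.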

At Shopify, architecture decisions reflect long-term thinking. Product leader Archie Abrams explains: "The technical architecture determines strategy in a technology company even more than the what and who we're building for. If you build the right technical how, that is incredibly valuable over the long term."

Replit demonstrates this principle in practice. CEO Amjad Masad describes their infrastructure: "We expose all of that infrastructure to the AI. There's almost like a new discipline called AI Computer interfaces. LLMs need interfaces that are quite different than humans."

Building Products Around Autonomous AI Agents

The next frontier involves AI agents that can complete multi-step tasks without constant human guidance.

Autonomous AI agents work differently from traditional features. Instead of responding to specific user inputs, they pursue goals over time. They make decisions, take actions, and adapt based on results.

Microsoft's Aparna Chennapragada defines three key characteristics of AI agents: autonomy (delegation of tasks), complexity (handling multi-step challenges), and natural interaction (conversing beyond simple chat).

Building agent-based products requires new UX paradigms. Users need to set goals and constraints rather than clicking through predetermined workflows. They need visibility into what the agent is doing and the ability to intervene when necessary.

Design clear boundaries for agent autonomy. What decisions can the agent make independently? What requires human approval? How do you prevent agents from taking actions users wouldn't want?
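These questions can be written down as an explicit policy that the agent consults before acting. The sketch below is one hypothetical shape for such a policy: the action names, the refund limit, and the deny-by-default rule are all assumptions made for the example.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK_HUMAN = "ask_human"
    DENY = "deny"


# Hypothetical policy: action names and limits are illustrative only.
AUTONOMOUS = {"read_calendar", "draft_email"}
NEEDS_APPROVAL = {"send_email", "issue_refund"}
REFUND_LIMIT = 50.0


def authorise(action: str, amount: float = 0.0) -> Decision:
    """Decide whether the agent may act alone, must ask, or is blocked."""
    if action in AUTONOMOUS:
        return Decision.ALLOW
    # Small refunds are allowed without review; larger ones go to a person.
    if action == "issue_refund" and amount <= REFUND_LIMIT:
        return Decision.ALLOW
    if action in NEEDS_APPROVAL:
        return Decision.ASK_HUMAN
    # Anything not listed is denied by default.
    return Decision.DENY
```

Denying unlisted actions by default means a new agent capability has to be added to the policy deliberately, rather than becoming available by accident.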

The key question becomes: does it make sense to build an AI agent within your company, or should you make the foundation model companies' agentic layer amazingly seamless in your existing product? As one PM explains: "Your job is to make that experience so seamless that it doesn't even feel like AI in the first place."

Implement robust error handling. Agents encounter situations they can't handle. They need graceful ways to ask for help or escalate to human operators.

Create accountability mechanisms. Users should understand the actions the agent took and the reasons behind them. This transparency fosters trust and enables users to learn how to work effectively with agents.

Natural Language Experience (NLX) Principles

As AI becomes more conversational, traditional UI patterns need to be updated. Natural Language Experience design focuses on making AI interactions feel natural and productive.

NLX is the new UX, requiring deliberate design principles for conversational interfaces. Microsoft makes this shift explicit: natural language becomes the primary interface paradigm, rather than a supplementary feature.

Design conversations, not just interfaces. Users should feel like they're collaborating with an intelligent assistant rather than filling out forms or clicking through menus. The interaction model shifts from "operate the machine" to "communicate with a partner."

Handle ambiguity gracefully. Natural language is often imprecise. Good NLX design clarifies user intent rather than forcing users to be unnaturally specific. Ask follow-up questions when needed.

Maintain conversation context across interactions. Users shouldn't need to repeat information the AI already knows. Context management becomes crucial for natural-feeling experiences.

At Glean, Yehoshua tackles this challenge directly: "Chat interfaces, people still don't really know how to use. We need guardrails to help suggest what could work. How do you give people guardrails so they understand what is going to work and what isn't?"

Design for different communication styles. Some users prefer direct commands. Others like collaborative discussions. Your NLX should adapt to individual preferences and communication patterns.

Provide multiple interaction modes. Voice, text, and traditional UI elements can work together. Users should be able to switch between modes seamlessly based on their current context and preferences.

The interface for AI is not necessarily a chatbot. It may simply look like optimising or speeding up part of a process that humans currently perform.

Leveraging Proprietary Data for Competitive Advantage

Your unique data becomes your AI moat, but only if your product architecture can actually use it. While competitors can access the same foundation models, they can't replicate your proprietary dataset or the insights it enables.

Add-on AI features often can't access the data they need to be genuinely useful. When Adobe added AI-powered search to Creative Cloud, it initially could only analyse file names and metadata, not actual creative content. Users got frustrated searching for "blue logo designs" and finding files named "logo_final_v3.psd" that contained red graphics. The AI lived outside the creative workflow and couldn't see what users actually created.

Compare this to how Figma built AI into its design tool. Their AI can analyse actual design elements, understand component relationships, and suggest improvements based on real usage patterns across their platform. Because intelligence was built into the core architecture from the start, the AI has access to rich behavioural data that creates genuine competitive advantages.

Netflix learned this lesson the hard way. Their early recommendation add-ons utilised viewing history but couldn't comprehend the viewing context. The AI would recommend horror movies because someone watched one scary film, not understanding they were trying to get their toddler to sleep and accidentally clicked the wrong title. When Netflix redesigned recommendations as an AI-first system, it could factor in time of day, device type, who was watching, and content completion patterns.

Identify what data advantages you have or can build. Customer interaction histories, domain-specific content, workflow patterns, and behavioural sequences all provide training material for specialised AI features. But these advantages only matter if your product architecture can connect AI capabilities to data sources.

Design data collection into your product experience from day one. Every user interaction can generate valuable training data, but the collection needs to feel natural and valuable to users rather than extractive. Add-on AI features often feel extractive because they ask for additional data without providing immediate value in return.

Build systems for data quality management before you need them. AI models amplify data problems exponentially. Invest in cleaning, labelling, and organising your data before using it for training. This becomes especially critical when AI features need to work with data from existing product areas that weren't designed with AI in mind.

Consider data flywheel effects when designing product architecture. Better data leads to better AI features, which attract more users, who generate more data. Design products that create these self-reinforcing cycles rather than AI features that exist in isolation from your core value proposition.

Protect your data advantages through technical architecture and legal frameworks. Competitors will try to replicate your success, but they can't replicate the integrated data flows that AI-first products create. This architectural advantage often proves more defensible than the AI models themselves.

Your differentiator must be something that holds up as the LLMs improve, so that your entire product gets better as the models become more advanced.

Technical Decision-Making for PMs

AI-first product management requires deeper technical understanding than traditional PM roles. You don't need to code, but you need to understand technical constraints and trade-offs.

Learn model performance characteristics. Different AI models excel at different tasks. GPT models handle text well but struggle with precise calculations. Computer vision models work well for image recognition, but require a substantial amount of training data.

Understand infrastructure requirements. AI features often need different compute resources than traditional software. GPU availability, memory requirements, and processing time all affect what experiences you can deliver.

Grasp the fundamentals of model training and fine-tuning. You'll need to make decisions about when to retrain models, what data to use, and how to measure improvement. 

Learn about AI safety and alignment techniques. These technical approaches help ensure AI systems behave as intended. Understanding the options helps you make better product decisions.

Stay current with AI research developments. The field moves quickly, and new capabilities can unlock entirely new product possibilities. OpenAI's real-time API, for example, suddenly made voice-based AI experiences practical for many companies.

PMs are now writing more code than ever. One PM with 25 years of experience notes: "I write more code now than in the past 10 years. The three skills that matter most for PMs in 2025: curiosity, humility, and agency."

Tools like Cursor and Replit make prototyping accessible, and there are plenty of others. It's extremely beneficial for a product manager to arrive with a functional prototype: the buttons work, and it's wired up reasonably correctly to tell a story.

Building Safe AI Systems

AI safety isn't just about preventing dramatic failures. It's about building systems that behave predictably and maintain user trust.

Design AI systems with multiple layers of safety checks. No single safeguard is perfect, but a layered approach can catch problems that individual checks may miss.
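As a toy illustration of layering, the sketch below runs an AI response through several independent checks and collects every reason it should be held back. The three checks and their thresholds are invented for the example; real systems would combine rules like these with model-based classifiers and human review.

```python
import re


def no_email_addresses(text: str):
    """Layer 1: block responses that leak something shaped like an email address."""
    if re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", text):
        return "contains an email address"
    return None


def within_length(text: str, limit: int = 500):
    """Layer 2: block runaway responses."""
    return "response too long" if len(text) > limit else None


def no_blocked_terms(text: str, blocked=("guaranteed returns",)):
    """Layer 3: block phrases the product must never say (hypothetical list)."""
    lowered = text.lower()
    hits = [term for term in blocked if term in lowered]
    return f"blocked term: {hits[0]}" if hits else None


CHECKS = [no_email_addresses, within_length, no_blocked_terms]


def review(text: str) -> list[str]:
    """Run every layer and collect all reasons the text should be held back."""
    return [reason for check in CHECKS if (reason := check(text)) is not None]


problems = review("Email me at a@b.com for guaranteed returns")
```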

Implement monitoring systems that detect when AI behaviour drifts from expectations. Models can develop new failure modes as they encounter novel situations in production.

Create clear escalation paths when AI systems encounter situations they can't handle safely. Users should always have access to human assistance when needed.

Test AI systems adversarially. Try to deliberately break your AI features. Red team your own products to find vulnerabilities before users do.

Plan for worst-case scenarios. What happens if your AI system makes a serious mistake? Having incident response procedures ready can minimise damage and maintain user trust.

At LinkedIn, Cohen learned through failure: "We've failed a lot, but we learned so much along the way." This willingness to learn from mistakes becomes essential for building safe AI systems.

Managing AI Product Risks

Every AI feature introduces new categories of risk alongside traditional product risks.

Reputational risk from AI mistakes can spread quickly through social media. One viral example of biased or inappropriate AI behaviour can damage your brand for months.

Legal and regulatory risks are evolving rapidly. AI governance frameworks are emerging in different jurisdictions with different requirements. Stay informed about compliance obligations.

Technical debt accumulates differently in AI systems. Models become outdated, training data grows stale, and performance degrades over time. Plan for ongoing maintenance and updates.

Dependency risks increase when using third-party AI services. Model providers can change APIs, adjust pricing, or shut down services. Build fallback plans for critical AI dependencies.
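A fallback plan can be as simple as an ordered list of providers, ending with a response that needs no AI at all. In the sketch below every function is a stand-in: `primary_model` simulates an outage, and none of the names refer to a real provider's API.

```python
def primary_model(prompt: str) -> str:
    """Stand-in for a third-party API call; here it always fails."""
    raise TimeoutError("provider unavailable")


def backup_model(prompt: str) -> str:
    """Stand-in for a second provider or a smaller self-hosted model."""
    return f"[backup] summary of: {prompt}"


def canned_response(prompt: str) -> str:
    """Last resort that needs no AI at all."""
    return "Summaries are temporarily unavailable. Please try again later."


def complete(prompt: str, providers=(primary_model, backup_model, canned_response)) -> str:
    """Try each provider in order and return the first successful answer."""
    for provider in providers:
        try:
            return provider(prompt)
        except Exception:
            continue  # in production, log the failure before moving on
    raise RuntimeError("all providers failed")


answer = complete("Q3 roadmap")
```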

User expectation management becomes crucial. Overpromising AI capabilities leads to disappointment and negative feedback. Set appropriate expectations about what your AI can and cannot do.

Non-deterministic systems require different thinking. As Yehoshua explains: "Enterprise CIOs expect their software to be deterministic. How do you help educate users about that? You need to make sure your product gets better as the LLMs get better."

The Path Forward

AI-first product management isn't just about adopting new technologies; it's also about embracing a new mindset. It's about reimagining how products create value for users.

The companies that succeed in this transition will be those that embrace AI as a core capability rather than a peripheral feature. They'll invest in new skills, new processes, and new ways of measuring success.

This transformation won't happen overnight. Start by identifying one area where AI can meaningfully improve your product. Build competency gradually rather than trying to revolutionise everything at once.

Track how many prototypes teams build, rather than focusing only on business outcomes. Encouraging people to be hands-on removes the perception that this technology is inaccessible. You need a measure of how many shots you are taking in the first place.

Three flavours of AI product management are emerging. AI platform PMs build tools for AI engineers. AI product PMs work on products where AI is the core of the user experience. AI-powered PMs use AI to enhance existing products and workflows.

The counterintuitive truth: engineers are moving so fast with AI that PMs are now the bottleneck. Companies that recognise this shift and empower PMs to prototype, experiment, and iterate quickly will win.

The question isn't whether AI will transform product management; it's whether you'll lead that transformation or scramble to catch up. Start building your AI-first skills today. The future belongs to products that feel intelligent rather than just functional.