Chapter 21: Data and AI Products
You are a product manager at a growth-stage software company that has integrated an artificial intelligence chatbot into its core platform. The initial launch generated a massive spike in signups and press mentions. Your engineers used a popular large language model API to build the feature in three weeks. You felt like you were winning the race against your competitors.
Six months later, the situation has changed. Your churn rate is increasing because users find the AI responses generic. Your competitors have launched identical features because they use the same underlying model. Your executive team is asking how you will defend your market position when the cost of building software is approaching zero. You realize that you have built a thin wrapper around another company's intelligence. You have no unique advantage.
The tension lies between the speed of using off-the-shelf models and the necessity of building a durable moat. You face a world where every company has access to the same foundation models. You must either find a way to make your AI uniquely intelligent for your specific users or accept that your product is a commodity. To survive the next shift in the industry, you need to master the relationship between data and model performance.
CORE SKILL OR PRINCIPLE
The core principle of building successful AI products is that data is the primary differentiator and the only sustainable moat. Foundation models are becoming a commodity that every competitor can access for the same price. You build a competitive advantage not through the model you choose but through the proprietary data you use to tune it. Product-market fit in the AI era requires you to transition from a feature builder to a data architect who manages the context given to the machine. Success depends on your ability to facilitate a value exchange in which user interactions generate better data, which in turn generates better model performance. You must move from being a manager of software artifacts to a manager of model behavior through rigorous evaluation and fine-tuning. This requires a shift in the product development lifecycle from deterministic planning to continuous calibration and behavior adjustment.
WHY GREAT AI PRODUCTS ARE ALL ABOUT THE DATA
Great AI products depend on data because models can absorb a practically unlimited amount of information and improve with more context. A model knows only what it was trained on and what you provide to it at the moment of a query. In traditional software, you map out a decision engine or workflow in which every input has a predictable output. With AI, the system is nondeterministic and requires large amounts of data to calibrate its behavior. Your primary job as a product leader is to manage this data, because how smart an AI agent appears depends directly on the business-specific context it receives.
Evidence suggests that roughly 80 percent of the work done by AI engineers and product managers involves understanding workflows and data rather than building fancy models. You must be obsessed with your data and your users to create a unique experience. If you simply wire a pipeline to an LLM without well-structured and timely data, you will fail. Most people ignore the nondeterminism and the need for data quality, which leads to horrendous results in production.
Data is the new moat because it allows you to handle the fat tail of human behavior that general models cannot predict. For example, a document may contain the same words at two different companies yet carry a completely different meaning and importance based on internal context. You must capture this unique data from user interactions to build a system that understands these nuances. If you do not own the data, you do not own the intelligence.
HOW TO BUILD YOUR OWN AI MOAT WITH FINE-TUNING
You build an AI moat by using proprietary data to fine-tune models for specific use cases. Fine-tuning is the process of taking a foundation model and tailoring it with company-specific data so that it performs better on your unique tasks. This is critical because much of the world's most valuable knowledge and many of its most valuable processes are not public and thus are not in the training sets of general models.
To execute this, you must identify the 20 percent of features that drive 80 percent of the willingness to pay and focus your fine-tuning efforts there. You should aim to move from a general model to a specialized one that achieves a higher level of accuracy for your vertical. For example, a legal AI tool like Harvey will likely always be more capable in its domain than a general model because it is fine-tuned on legal data and workflows.
Moats are also built through data flywheels, in which you continue to generate and maintain proprietary data over time. You should use user feedback, such as thumbs-up or thumbs-down signals, to improve the next iteration of your prompt or fine-tuning set. This creates a system in which your product gets better as the models get better rather than being replaced by them. You must ensure your differentiator is something that remains even when the underlying foundation models become smarter.
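As a minimal sketch of the feedback step in such a flywheel, the following Python splits logged interactions by their thumbs signal. The `Interaction` record, its field names, and the sample log are assumptions made for illustration; a real product would read these from its own logging system.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One logged exchange. The field names are hypothetical."""
    prompt: str
    response: str
    feedback: str  # "up", "down", or "" when the user gave no signal


def split_by_feedback(interactions):
    """Separate rated interactions into fine-tuning candidates and a review queue."""
    candidates, review_queue = [], []
    for item in interactions:
        if item.feedback == "up":
            # Thumbs-up pairs become candidate examples for the next tuning set.
            candidates.append({"prompt": item.prompt, "completion": item.response})
        elif item.feedback == "down":
            # Thumbs-down pairs go to a human for error analysis.
            review_queue.append(item)
    return candidates, review_queue


# Made-up sample data for illustration only.
log = [
    Interaction("Summarize contract A", "Summary of A", "up"),
    Interaction("Summarize contract B", "Generic text", "down"),
    Interaction("Summarize contract C", "Summary of C", ""),
]
candidates, review_queue = split_by_feedback(log)
```

Unrated interactions are deliberately left out of both lists here; whether to sample them for review is a product decision the sketch does not make.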
WRITING EFFECTIVE EVALS AS A CRITICAL PM SKILL
Writing effective evaluations, or evals, is arguably the most important skill for AI product builders today. Evals are a systematic way to measure and improve an AI application by looking at data and creating metrics around model performance. If the model is the product, then the eval is the new product requirements document, or PRD. You use evals to articulate what success looks like in a way that is useful for training and correcting the machine.
Effective evals start with error analysis, which is the first step in conquering messy log data. You must put your product hat on and manually write notes on where the AI is failing. You should sample your data and look for traces where the model did something unexpected. Everyone who does this manual analysis learns more about their product than they could from any high-level dashboard.
Once you have notes, you can use an LLM to help you group them into failure modes, also called axial codes. These codes must be specific and actionable, such as "formatting error" or "conversational flow issue." You then build a rubric of what good looks like and use an LLM as a judge to grade your application automatically. This creates a feedback loop that lets you iterate on your product with confidence.
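The two steps described here, ranking failure modes and grading with a judge, can be sketched in a few lines of Python. Everything named below is an assumption for illustration: the trace data is made up, the rubric is an example, and `fake_judge` is a stand-in for a real LLM call, which would replace it in practice.

```python
from collections import Counter

# Notes from manual trace review, each already tagged with a failure mode.
# The trace ids and tags are made-up illustration data.
coded_traces = [
    {"trace_id": "t1", "failure_mode": "formatting error"},
    {"trace_id": "t2", "failure_mode": "conversational flow issue"},
    {"trace_id": "t3", "failure_mode": "formatting error"},
]


def rank_failure_modes(traces):
    """Count how often each failure mode appears, most frequent first."""
    return Counter(t["failure_mode"] for t in traces).most_common()


# A hypothetical rubric describing what good looks like.
RUBRIC = [
    "Answers the question that was asked",
    "Uses a bulleted list when the user asks for a list",
]


def build_judge_prompt(rubric, question, answer):
    """Assemble the grading prompt that is sent to the judge."""
    criteria = "\n".join(f"* {c}" for c in rubric)
    return (
        "Grade the answer against every criterion.\n"
        f"{criteria}\n"
        f"QUESTION: {question}\n"
        f"ANSWER: {answer}\n"
        "Reply with PASS or FAIL only."
    )


def pass_rate(examples, judge):
    """Fraction of (question, answer) pairs the judge marks as PASS."""
    verdicts = [
        judge(build_judge_prompt(RUBRIC, q, a)).strip().upper() == "PASS"
        for q, a in examples
    ]
    return sum(verdicts) / len(verdicts)


def fake_judge(prompt):
    # Stand-in for a real LLM call: passes only answers that start with a bullet.
    return "PASS" if "ANSWER: - " in prompt else "FAIL"


examples = [
    ("List the risks", "- Late delivery\n- Budget overrun"),
    ("List the risks", "There are some risks."),
]
ranked = rank_failure_modes(coded_traces)
rate = pass_rate(examples, fake_judge)
```

Passing the judge in as a plain callable keeps the harness independent of any particular model provider, and it lets you check the judge itself against a handful of human-graded examples before trusting its scores.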
WHEN TO USE VIBES INSTEAD OF EVALS
You should use vibes, or open-ended testing, when you are in the earliest stages of finding product-market fit for a novel experience. For a completely new product form factor, you may not yet know what good looks like or what the rubrics should be. In these cases, you must throw things against the wall and try different prompts to see how the model behaves in a broad sense.
Vibes are useful for exploring the solution space and rebuilding your intuition for what is possible with a new technology. You should focus on the "eyes light up" moments during user testing, when the AI does something magical that the user did not expect. This ad hoc style of testing helps you converge on a basic scaffold before you commit to the one-time cost of building a robust eval suite.
However, you must transition away from vibes as soon as you have identified the core use cases you want to work well. Relying only on vibes in the long term will make your systems unmaintainable and prevent you from measuring progress accurately. You can only hill-climb on performance when you have a robust metric to measure it against.
THE ROLE OF SYNTHETIC DATA IN MODEL DEVELOPMENT
Synthetic data is data constructed by models for other models to learn from. It is an active research area that allows for rapid iteration because it is cheaper and more scalable than collecting data from humans. You can use synthetic data to teach specific behaviors, such as how to comment on a document or how to reason through a multistep task.
Synthetic data is particularly useful when you have run out of internet text data or when you need to train on very specific edge cases. You can create a simulated jury, or a group of subagents, to rate different model responses and predict which ones will be most helpful to real users. This allows you to teach the model its own values recursively and align it with your company's core principles.
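A simulated jury can be sketched as a majority vote over several independent raters. In the Python below, the three juror functions are hypothetical rule-based stand-ins; in a real system each juror would be a model call with its own instructions.

```python
from collections import Counter


def majority_pick(responses, jurors):
    """Return the response most jurors prefer, plus the raw vote counts.

    Each juror is a callable that returns the index of its preferred response.
    Ties go to the response that received a vote first.
    """
    votes = Counter(juror(responses) for juror in jurors)
    winner_index, _ = votes.most_common(1)[0]
    return responses[winner_index], dict(votes)


# Hypothetical jurors, each encoding one preference.
def prefers_shortest(responses):
    return min(range(len(responses)), key=lambda i: len(responses[i]))


def prefers_cited(responses):
    for i, text in enumerate(responses):
        if "[source]" in text:
            return i
    return 0


def prefers_polite(responses):
    for i, text in enumerate(responses):
        if "please" in text.lower():
            return i
    return 0


responses = [
    "Reset it in settings.",
    "Please open Settings, then choose Reset. [source]",
]
winner, votes = majority_pick(
    responses, [prefers_shortest, prefers_cited, prefers_polite]
)
```

Using an odd number of jurors avoids most ties. The winning responses can then be stored as preferred examples, subject to the human expert review that high-risk domains still need.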
You should still combine synthetic data with human expert data for high risk or highly specialized domains. Experts like doctors or lawyers are still needed to provide ground truth for tasks that are difficult for current models to teach themselves. The goal is to build a flywheel where human data seeds the model and synthetic data scales its capabilities.
BUILDING TRUST BETWEEN USERS AND AI MODELS
Building trust is the primary challenge for AI agents because they often behave unreliably or produce results that feel alien to users. Trust is not a binary state but something that builds over time as the model learns your preferences and becomes more personalized. Success requires you to give the user a feeling of being in the driver's seat even as the system becomes more autonomous.
You must navigate the tradeoff between agency and control carefully. Every time you hand over decision-making capabilities to an AI, you relinquish some amount of control. You should start with high control and low agency and then slowly lean into more autonomy as the agent earns trust. For example, a customer support agent should initially suggest answers for a human to review before it is allowed to reply directly to a customer.
Behavior calibration is key to ensuring you do not ruin the end-user experience. You must minimize surprises by ensuring the AI behaves consistently with the patterns your users already expect. Providing familiar form factors, such as reminders or notifications, can help make general model capabilities feel more trustworthy and useful.
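One way to make "earning trust" concrete is a gate that keeps the agent in suggest-only mode until human reviewers have approved enough of its suggestions. The sketch below is an illustration under stated assumptions: the class name, the mode labels, and the thresholds of 50 reviews and 95 percent approval are all hypothetical and would need to be set for your own risk tolerance.

```python
from dataclasses import dataclass


@dataclass
class AgentTrustRecord:
    """Running tally of human reviews of the agent's suggested replies."""
    reviewed: int = 0
    approved: int = 0

    def record_review(self, approved: bool) -> None:
        self.reviewed += 1
        if approved:
            self.approved += 1

    def approval_rate(self) -> float:
        return self.approved / self.reviewed if self.reviewed else 0.0


def allowed_mode(record, min_reviews=50, min_approval=0.95):
    """Grant autonomy only after enough reviewed suggestions were approved.

    The thresholds are illustrative defaults, not recommendations.
    """
    if record.reviewed >= min_reviews and record.approval_rate() >= min_approval:
        return "auto_reply"
    return "suggest_only"


record = AgentTrustRecord()
mode_at_start = allowed_mode(record)

# Simulate 100 human reviews in which 4 suggestions are rejected.
for i in range(100):
    record.record_review(approved=(i % 25 != 0))
mode_after = allowed_mode(record)
```

Because the gate is recomputed from the running tally, a drop in approval rate moves the agent back to suggest-only mode, which keeps the human in the driver's seat.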
MODEL ENSEMBLES: USING SPECIALIZED MODELS TOGETHER
Successful AI products are often a society of models in which different models perform different tasks based on their specific strengths. You should not try to make every use case fit into a single popular large language model, because this often reduces the quality of the user experience. Instead, you must find the right model for the right use case across your entire workflow.
For example, you might use one model for coding tasks where it has a clear advantage and a different model for critiquing or summarizing text. You might use a smaller, cheaper model for simple classification and a more powerful reasoning model for high-level planning. This ensemble approach allows you to optimize for both performance and cost efficiency.
In a complex creation process like building a presentation, you might use 20 different models to handle each step, from generating an outline to selecting photorealistic images. You must coordinate these models through an orchestration layer that manages the state and control flow between them. This allows you to produce incredibly complex software quickly while ensuring the final output meets your high standards for quality.
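A minimal sketch of such an orchestration layer is a routing table plus a loop that carries state from step to step. The model names below are placeholders rather than real products, and `fake_call_model` stands in for whatever client your provider supplies.

```python
# Hypothetical routing table: task type -> model name (placeholders, not real models).
ROUTES = {
    "classify": "small-cheap-model",
    "code": "code-specialist-model",
    "summarize": "summarizer-model",
    "plan": "large-reasoning-model",
}


def route(task_type):
    """Pick a model for a task, falling back to the reasoning model."""
    return ROUTES.get(task_type, ROUTES["plan"])


def run_pipeline(steps, call_model, initial_input):
    """Run each step in order, feeding one model's output into the next.

    call_model is any callable taking (model_name, text) and returning text.
    """
    state = {"input": initial_input, "history": []}
    current = initial_input
    for task_type in steps:
        model = route(task_type)
        current = call_model(model, current)
        state["history"].append((task_type, model))
    state["output"] = current
    return state


def fake_call_model(model, text):
    # Stand-in for a real API call: it only records which model handled the text.
    return f"{text} -> {model}"


result = run_pipeline(["plan", "summarize"], fake_call_model, "deck brief")
```

Keeping the routing table as data makes it cheap to swap a model for one step without touching the control flow, and the recorded history gives you a trace to review when a step fails.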
SKILL APPLICATION
Apply these data and AI skills by integrating them into your weekly team rituals. Replace your standard status updates with demo Fridays where everyone shows working code or functional prototypes built with AI. This creates the space for your team to step out of their functional lanes and explore what is possible with new technology.
Operationalize your evaluation process by requiring a trace analysis in every product review. Do not accept high-level metrics without seeing the specific failure cases and the open codes associated with them. This forces the team to stay close to the raw data and build a visceral understanding of the user's pain.
Manage your AI roadmap using the seasons planning framework to adapt to rapid industry shifts. Ground your team in the current model capabilities and set loose quarterly goals that allow you to pivot when a new breakthrough occurs. This prevents you from being trapped in a long-term plan that becomes obsolete within months.
Adopt the associate product builder model by teaching every team member to use AI for prototyping and data analysis. This reduces the coordination tax of waiting for a developer or a data scientist to unblock a project. It empowers each individual to do more per minute and increases the overall metabolic speed of your organization.
PRACTICE SELECTIVE MICROMANAGEMENT
Practice selective micromanagement by diving deep into the prompt engineering and data collection strategy for your most critical features. You cannot delegate the understanding of model behavior to someone who does not share your product taste and vision. You must be the chief tastemaker who ensures that the AI's output aligns with the meticulous craft your brand represents.
ACTION CHECKLIST
- Ask yourself if you would fully fund your team if you were the CEO today and write down the data that supports your answer.
- Schedule a trace analysis session this week to manually review 50 user logs and write open codes for every error you find.
- Identify your most active users and run a Sean Ellis survey to measure your current product-market fit score.
- List the proprietary data sets you currently possess and identify one way to use them for fine-tuning a specialized model.
- Create a rubric for what good looks like for your AI and draft an LLM-as-a-judge prompt to automate your evaluations.
- Block out four hours on your calendar to play with three new AI tools you have never used before.
- Conduct a walk-the-store review of your AI feature's onboarding flow on a mobile device and log every point of friction.
- Identify one task on your team that is currently managed via a spreadsheet and build a simple AI agent to automate it.
- Audit your roadmap and label every item as table stakes or differentiation to ensure you are building a unique moat.
- Set a personal SLA to respond to all engineering and data blockers within four hours to maintain team velocity.
- Replace your next status meeting with an asynchronous update in a shared document that includes a sentiment table.
- Draft a single-sentence founding hypothesis for your next AI experiment using the target customer and urgent problem framework.
- Interview five customers who recently stopped using your product to find the struggling moment the AI failed to address.
- Establish a Slack channel for the team to share raw customer feedback and interesting AI use cases found in the wild.
- Create a two-by-two matrix for your pricing strategy based on attribution and autonomy.
- Ask your lead engineer to identify the most technically elegant part of your AI stack that users do not actually care about.
- Set an arbitrary deadline trap for an upcoming feature to force the team to cut scope and focus on the core value.
- Define who your product is not for to create a clear guardrail for your team's experimentation.
- Commit to dogfooding your own AI tools for at least two hours this week to calibrate your taste.
- Identify one piece of strategic technical debt you can take on today to ship a new data experiment faster.
- Commit to a six-week execution cycle for your next major AI initiative and use the circuit breaker principle if it drags.
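For the Sean Ellis survey item in the checklist, the score is the share of respondents who say they would be "very disappointed" if they could no longer use the product, with 40 percent as the commonly cited benchmark. The Python below is a small sketch of that arithmetic; the answer labels and the sample counts are made up for illustration.

```python
def sean_ellis_score(responses):
    """Fraction of respondents who answered 'very disappointed'."""
    if not responses:
        raise ValueError("at least one survey response is required")
    very = sum(1 for answer in responses if answer == "very disappointed")
    return very / len(responses)


# Made-up survey results: 18 + 15 + 7 = 40 respondents.
responses = (
    ["very disappointed"] * 18
    + ["somewhat disappointed"] * 15
    + ["not disappointed"] * 7
)
score = sean_ellis_score(responses)  # 18 of 40 respondents
meets_benchmark = score >= 0.40
```

With small samples the score moves a lot from one response to the next, so treat it as a directional signal rather than a precise measurement.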