Chapter 2: Business Problem Identification and Alignment
"Can we use AI to predict customer churn?" The question seems straightforward until you start digging. What exactly counts as churn? A customer who hasn't purchased in 30 days? 90 days? Someone who unsubscribed from emails but still uses the product? And what would you do with churn predictions anyway? Send discount offers? Improve customer service? Change the product experience?
This is where most data science projects go wrong, not in the technical implementation but in the problem definition. Many experienced practitioners learned this lesson the hard way early in their careers: they were so eager to build that they overlooked the business goals, context, and success metrics. The result was technically impressive models that created no business value.
The solution isn't to become a domain expert in every business function, though that helps. The solution is to develop a systematic approach to understanding business problems that data science can actually solve. This starts with proven frameworks for understanding intent and context, but it goes deeper into the unique challenges of aligning data science capabilities with business needs.
The first step is understanding the intent behind the request. This sounds obvious, but it's where most projects derail. Business stakeholders often come to you with solutions disguised as problems. "We need a recommendation engine" isn't a problem; it's a proposed solution. The actual problem might be that customers can't find relevant products, that average order values are declining, or that customer acquisition costs are rising.
Here are three critical questions I use to uncover the intent behind a request:
- What is the expected customer or business benefit?
- What's wrong with the way things are now?
- Why is solving this important now?
These questions force you to move from solutions back to problems, converting technical possibilities to business realities.
But as a data science product manager, you should also ask additional questions that a traditional PM might skip. You need to understand not just what the business wants to achieve, but what they're willing to trade off to get there. Data science solutions often involve trade-offs that aren't immediately apparent.
Let's revisit the case of an AI recommendation engine for sales intelligence. The business benefit seems clear: provide personalised recommendations to increase customer engagement and drive sales growth. But recommendation systems involve a fundamental trade-off between relevance and discovery. Optimising for highly targeted recommendations can produce overly narrow suggestions that create filter bubbles and reduce the discovery of new opportunities, while broader recommendations may feel less relevant and decrease engagement. Which way to lean depends on business context that goes beyond the initial problem statement: the complexity of the sales cycle, the diversity of customer segments, and whether the primary goal is immediate conversion or long-term relationship building. With a $2M budget and a 15% revenue growth target, understanding these trade-offs becomes critical for determining success metrics, resource allocation, and the recommendation algorithm's core design principles.
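To make that trade-off concrete, the relevance-versus-discovery tension can be expressed as a simple re-ranking rule. The sketch below is illustrative only: the `Candidate` structure, the `novelty` measure, and the blending weight are hypothetical placeholders, not part of the case described above.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    item_id: str
    relevance: float   # model's predicted relevance, 0..1 (hypothetical)
    novelty: float     # e.g. 1 - popularity percentile, 0..1 (hypothetical)

def blended_score(c: Candidate, discovery_weight: float = 0.3) -> float:
    """Blend relevance with novelty; a higher discovery_weight trades
    immediate relevance for exposure to less familiar items."""
    return (1 - discovery_weight) * c.relevance + discovery_weight * c.novelty

def rerank(candidates: list[Candidate], discovery_weight: float = 0.3) -> list[Candidate]:
    # The weight is a product decision, not a modelling detail: it encodes
    # how much discovery the business is willing to pay for in relevance.
    return sorted(candidates, key=lambda c: blended_score(c, discovery_weight), reverse=True)
```

Choosing the weight is exactly the kind of decision that depends on the business context above: a long sales cycle focused on relationship building justifies more discovery than a push for immediate conversion.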
The second step is understanding the context that shapes the solution space: building an AI integration into a sales intelligence tool requires a completely different approach from building a stopgap version of that tool yourself while you wait for the core engineering team's solution. The technical requirements might be similar, but the context determines whether you need an extensible, scalable solution or a quick, hacky workaround.
Context in data science product management includes several dimensions that a traditional PM might not consider.

Data context: What data do you have access to? How clean and complete is it? How much historical data is available? Are there privacy or regulatory constraints on how you can use it?
Technical context: What's your current infrastructure? Do you have real-time data pipelines or batch processing? Can you deploy models that require GPUs or are you limited to CPU-based solutions? What are your latency and throughput requirements?
Organisational context: Who are the stakeholders? How technically sophisticated are they? What's their tolerance for uncertainty and experimentation? How do they currently make decisions about the problem you're trying to solve?
Competitive context: Are you trying to match competitor capabilities or create new differentiation? How quickly do you need to move? What's the cost of being late versus the cost of being wrong?
This context analysis often reveals that the real problem is different from the stated problem. A request for "better personalization" might actually be a request for "help our merchandising team understand customer preferences" or "reduce the manual effort required to create targeted campaigns."
The third step is understanding what success looks like from multiple perspectives. Traditional PM often focuses on user metrics and business metrics. Data science PM also requires you to consider technical metrics and operational metrics.
User metrics for data science products are often more complex than traditional software metrics. A recommendation engine's success isn't just about click-through rates; it's about helping users discover products they wouldn't have found otherwise, balancing familiar items with novel suggestions, and adapting to changing preferences over time.
Business metrics need to account for the probabilistic nature of data science solutions. You can't promise that your churn prediction model will reduce churn by 15%. You can promise that it will identify customers at risk of churning with a certain accuracy, and that acting on those predictions has the potential to reduce churn by a certain amount.
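One way to keep that promise honest is to express the expected impact as a product of the model's quality and the intervention's effectiveness, rather than as a single guaranteed number. All the figures in the sketch below are hypothetical illustrations, not measurements from any real model.

```python
# Back-of-the-envelope expected impact of a churn model, under stated assumptions.
# All numbers are hypothetical illustrations, not real measurements.

at_risk_flagged_per_month = 1_000   # customers the model flags
precision = 0.60                    # fraction of flagged customers who would actually churn
intervention_success_rate = 0.25    # fraction of true churners the intervention retains
baseline_churners_per_month = 2_500 # churners without any intervention

true_churners_reached = at_risk_flagged_per_month * precision
expected_saves = true_churners_reached * intervention_success_rate
expected_churn_reduction = expected_saves / baseline_churners_per_month

print(f"Expected saves per month: {expected_saves:.0f}")
print(f"Expected relative churn reduction: {expected_churn_reduction:.1%}")  # ~6%, not 15%
```

Framing the promise this way makes the uncertainty visible: the model's precision and the intervention's success rate are separate assumptions that can each be tested and improved.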
Technical metrics matter because they affect long-term sustainability. A model that's 95% accurate but takes hours to retrain isn't sustainable if you need to adapt to changing conditions quickly. A solution that works well in development but requires manual intervention in production isn't really a solution.
Operational metrics matter because data science products often change how work gets done. Automating a manual process doesn't just save time; it changes who does what, what skills are required, and how errors are detected and corrected.
The fourth step is understanding the learning timeline and iteration strategy. Traditional PM often assumes you can build, measure, and learn quickly. Data science projects often have longer learning cycles that need to be planned up front.
You might need to collect data for months before you can train a meaningful model. You might need to run experiments for weeks to get statistically significant results. You might need to wait for seasonal patterns or business cycles to understand how your solution performs in different conditions.
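The "weeks to significance" point can be made concrete with a standard sample-size calculation. The sketch below uses statsmodels' power analysis for a two-proportion test; the baseline rate, the minimum lift worth detecting, and the daily traffic are all hypothetical assumptions.

```python
# Rough experiment-duration estimate for detecting a lift in a conversion-style metric.
# Rates and traffic volumes are hypothetical illustrations.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.040          # current conversion rate (assumed)
target_rate = 0.044            # smallest lift worth detecting (assumed)
effect_size = proportion_effectsize(target_rate, baseline_rate)

analysis = NormalIndPower()
n_per_arm = analysis.solve_power(effect_size=effect_size, alpha=0.05, power=0.8,
                                 alternative="two-sided")

eligible_users_per_day = 3_000  # users entering the experiment daily (assumed)
days_needed = (2 * n_per_arm) / eligible_users_per_day
print(f"~{n_per_arm:,.0f} users per arm, roughly {days_needed:.0f} days of traffic")
```

Even with these generous assumptions, detecting a 10% relative lift takes on the order of weeks, which is why the learning timeline has to be planned up front rather than discovered mid-project.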
This longer timeline affects how you communicate with stakeholders and how you structure the project. You can't promise quick wins in the same way, but you can promise learning milestones that build confidence and understanding over time.
The vulnerability principle becomes crucial here. Successful practitioners emphasise being "very open and honest with where you are" and focusing on learning rather than immediate revenue. In data science product management, this means acknowledging uncertainty upfront and setting expectations about what you'll learn and when.
This vulnerability isn't a weakness; it's strategic positioning. By being honest about uncertainty, you create space for experimentation and learning. By focusing on learning outcomes rather than just business outcomes, you build stakeholder confidence in your approach even when individual experiments don't work out.
The fifth step is identifying the minimum viable learning experiment. Traditional PM talks about minimum viable products. Data science PM often needs to think about minimum viable learning, the smallest experiment that can validate or invalidate your key assumptions.
This might be a simple analysis of existing data to understand whether the patterns you're looking for actually exist. It might be a manual process that simulates what an automated solution would do. It might be a simple rule-based system that establishes a baseline for more sophisticated approaches.
The goal isn't to build the final solution; it's to learn whether the final solution is worth building and what it would take to get there.
Consider the customer churn example. Before building a machine learning model, you might start by manually analysing customers who churned in the past six months. What patterns can you identify? How early could you have predicted the churn? What interventions might have prevented it? This analysis helps you understand whether churn prediction is technically feasible and valuable to the business before you invest in building automated solutions.
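A minimal version of that manual analysis can be sketched in a few lines of pandas, assuming a customer table with hypothetical columns such as `churned`, `tenure_months`, `support_tickets_90d`, and `logins_30d`; the file name and column names are illustrative, not a prescribed schema.

```python
# Exploratory look at past churners: do the patterns we hope to predict actually exist?
# File name and column names are hypothetical placeholders.
import pandas as pd

customers = pd.read_csv("customers_last_6_months.csv")

# Compare churned vs retained customers on a few candidate signals.
signals = ["tenure_months", "support_tickets_90d", "logins_30d"]
summary = customers.groupby("churned")[signals].agg(["mean", "median"])
print(summary)

# How early could churn have been spotted? Look at recent activity among churners.
churned = customers[customers["churned"] == 1]
print(churned["logins_30d"].describe())
```

If the churned and retained groups look indistinguishable on every signal you can construct, that is a cheap and early warning that an automated model is unlikely to succeed.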
The sixth step is aligning with constraints that will guide the solution. Experienced practitioners emphasise that "clearly defined constraints free us to do anything except breach those constraints, empowering us to innovate." In data science product management, constraints often come from business requirements, technical limitations, and regulatory considerations.
Business constraints might include budget limitations, timeline requirements, or performance thresholds. If you can only afford to review 100 flagged transactions per day manually, that constrains how sensitive your fraud detection model can be.
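That kind of review-capacity constraint translates directly into a model threshold: instead of choosing a sensitivity target first, you can choose the score cutoff that keeps the daily flag volume within what the team can review. The sketch below assumes a hypothetical file of historical transaction scores with hypothetical column names.

```python
# Choose a fraud-score threshold so that expected flags stay within review capacity.
# The scores file and column names are hypothetical placeholders.
import numpy as np
import pandas as pd

scores = pd.read_csv("historical_transaction_scores.csv")  # one row per transaction
daily_review_capacity = 100
days_of_history = scores["date"].nunique()

# Allowed flag rate given capacity and observed daily transaction volume.
transactions_per_day = len(scores) / days_of_history
allowed_flag_rate = daily_review_capacity / transactions_per_day

# Threshold = the score quantile that flags only the top allowed_flag_rate fraction.
threshold = np.quantile(scores["fraud_score"], 1 - allowed_flag_rate)
print(f"Flag transactions with score >= {threshold:.3f} "
      f"(~{allowed_flag_rate:.2%} of daily volume)")
```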
Technical constraints might include latency requirements, infrastructure limitations, or data availability. If you need to make recommendations in real time but don't have real-time data pipelines, that constrains your solution approach.
Regulatory constraints might include privacy requirements, fairness considerations, or explainability needs. If you need to explain why a loan application was rejected, that constrains which types of models you can use.
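One common response to an explainability constraint is to favour a model whose individual decisions can be traced to named factors. The sketch below uses a logistic regression on hypothetical features to produce simple "reason codes" for a rejection; it illustrates the constraint, not a compliant lending system.

```python
# Inherently interpretable scoring: per-applicant contribution of each feature.
# Features, training data, and applicant values are hypothetical illustrations.
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["debt_to_income", "missed_payments_12m", "credit_history_years"]
X_train = np.array([[0.2, 0, 10], [0.6, 3, 2], [0.4, 1, 5], [0.7, 4, 1]])
y_train = np.array([1, 0, 1, 0])  # 1 = approved, 0 = rejected

model = LogisticRegression().fit(X_train, y_train)

applicant = np.array([0.65, 2, 3])
contributions = model.coef_[0] * applicant  # per-feature contribution to the log-odds

# Reason codes: the features pushing hardest toward rejection (most negative contribution).
order = np.argsort(contributions)
reasons = [feature_names[i] for i in order[:2]]
print("Top factors against approval:", reasons)
```

A model like this is usually less accurate than an opaque ensemble, but it can answer "why was this application rejected?" directly, which is the constraint that matters here.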
These constraints aren't limitations; they're design parameters that help you focus on solutions that can actually be implemented and sustained.
The final step is establishing the feedback loop between learning and business value. Data science projects often involve significant upfront investment before you can deliver user-facing value. You need to structure the project so that learning compounds and builds confidence over time.
This might mean starting with internal tools that help business users understand the problem better before building customer-facing solutions. It might mean building simple automation that saves operational costs while you develop more sophisticated capabilities. It might mean creating dashboards that provide business insight even if they don't directly drive user actions.
The key is ensuring that every phase of the project delivers value that justifies continued investment, even if that value is learning rather than immediate business impact.
Business problem identification and alignment in data science product management isn't just about understanding what the business wants. It's about understanding what's possible, what's valuable, and what's sustainable. It's about turning vague business desires into specific learning experiments that build toward meaningful solutions.
When you get this right, you don't just solve the immediate problem; you build organisational capability to identify and solve similar problems in the future. You create a foundation for data-driven decision making that compounds over time.
The next chapter will show you how to turn this problem understanding into compelling product visions and strategic roadmaps that guide long-term development.