Chapter 4: Data as a Strategic Product Asset
Most companies treat data like exhaust from their business operations. Users click buttons, transactions get processed, events get logged, and data accumulates in databases like digital sediment. Then, when someone wants to build a data science product, they discover that their data is incomplete, inconsistent, and insufficient for the use cases they want to enable.
This backwards approach to data is why so many data science projects fail before they even begin. You can't build great data products on top of poor data foundations, no matter how sophisticated your algorithms or how talented your data scientists.
Successful data science product managers think about data differently. They treat data as a strategic product asset that needs to be designed, collected, and maintained with the same rigor as any other product component. They understand that data strategy isn't just about storage and processing; it's about creating the foundation for capabilities that don't exist yet.
This shift in thinking affects everything from how you design user experiences to how you structure your engineering teams. When you treat data as a product asset, you start asking different questions. Instead of "What data do we have?" you ask, "What data do we need to enable the experiences we want to create?" Instead of "How do we store this data?" you ask, "How do we structure this data to support the use cases we're planning?"
The first principle of treating data as a product asset is understanding that data collection is a product decision, not just a technical decision. Every piece of data you collect (or don't collect) affects what's possible in your data science products. Every schema decision, every tracking implementation, and every data retention policy shapes your future capabilities.
Consider user behaviour tracking. Most companies track basic events like page views, clicks, and purchases. But if you want to build sophisticated personalisation, you might need to track much more granular behaviour: how long users spend reading product descriptions, which images they zoom in on, how they navigate between categories, what they search for but don't find.
This granular tracking isn't just about collecting more data; it's about collecting the right data to support the experiences you want to create. But it also involves tradeoffs. More granular tracking means more complex data pipelines, higher storage costs, and more privacy considerations. As a product manager, you need to balance these tradeoffs based on your strategic priorities.
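To make the tradeoff concrete, here is a minimal sketch of what granular event tracking might look like. The event type names and fields (`description_dwell`, `dwell_seconds`, and so on) are purely illustrative assumptions, not a standard schema; the point is that each field is a deliberate product decision about what future capabilities to enable.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TrackingEvent:
    """A hypothetical granular tracking event.

    Captures behaviour beyond page views and clicks: dwell time on
    descriptions, image zooms, searches that returned no results.
    """
    user_id: str
    event_type: str  # e.g. "description_dwell", "image_zoom", "search_no_results"
    timestamp: datetime
    properties: dict = field(default_factory=dict)

def description_dwell(user_id: str, product_id: str, seconds: float) -> TrackingEvent:
    """Record how long a user spent reading a product description."""
    return TrackingEvent(
        user_id=user_id,
        event_type="description_dwell",
        timestamp=datetime.now(timezone.utc),
        properties={"product_id": product_id, "dwell_seconds": seconds},
    )

event = description_dwell("u123", "p456", 42.5)
print(event.event_type, event.properties["dwell_seconds"])
```

Every extra field in `properties` is more pipeline complexity, more storage, and potentially more personal data to govern, which is exactly the tradeoff described above.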
The second principle is that data quality is a product requirement, not just a technical requirement. Poor data quality doesn't just affect model accuracy; it affects user experience, business outcomes, and team productivity. A recommendation engine trained on incomplete purchase data will make poor recommendations. A fraud detection system trained on biased historical data will perpetuate that bias.
But data quality isn't binary. It's not about having perfect data; it's about having data that's good enough for your use cases. This requires understanding the relationship between data quality and product outcomes for your specific applications.
For some use cases, 90% data completeness might be sufficient. For others, you might need 99% completeness to achieve acceptable performance. For some applications, a slight bias in historical data might be acceptable if you can correct for it in your models. For others, bias might be completely unacceptable regardless of technical workarounds.
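A "good enough for this use case" check can be expressed directly in code. This is a minimal sketch: the 90% and 99% thresholds come from the examples above, and the use-case names are hypothetical placeholders, not recommended values.

```python
def completeness(records, required_fields):
    """Fraction of records with a non-null value for every required field."""
    if not records:
        return 0.0
    complete = sum(
        1 for r in records
        if all(r.get(f) is not None for f in required_fields)
    )
    return complete / len(records)

# Hypothetical per-use-case thresholds; the numbers are illustrative.
THRESHOLDS = {"recommendations": 0.90, "fraud_detection": 0.99}

def good_enough(records, required_fields, use_case):
    """Is this dataset complete enough for the given use case?"""
    return completeness(records, required_fields) >= THRESHOLDS[use_case]

purchases = [
    {"user_id": "u1", "product_id": "p1", "amount": 10.0},
    {"user_id": "u2", "product_id": None, "amount": 5.0},  # incomplete record
]
print(completeness(purchases, ["user_id", "product_id", "amount"]))  # 0.5
```

The useful part is not the arithmetic but the structure: the quality bar is defined per use case, so the same dataset can be ready for one product and not another.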
Industry experience bears this lesson out. Teams working on product categorisation have discovered that the same products were categorised differently by different sellers, making it impossible to train accurate classification models. Instead of trying to work around the inconsistent data, they invested in cleaning and standardising the category structure first. This foundational work enabled much more sophisticated product discovery and recommendation capabilities later.
This example illustrates a crucial point: sometimes, the best data science product management decision is not to build data science products yet. If your data isn't ready to support the use cases you want to enable, you might need to invest in data infrastructure and quality improvement before you can build user-facing features.
The third principle is understanding the different types of data assets and how they compound over time. Not all data is created equal. Some data becomes more valuable as you collect more of it. Some data becomes less valuable over time. Some data is valuable on its own, while other data is only valuable in combination with other datasets.
User behaviour data often becomes more valuable as you collect more of it. The more you understand about how individual users behave, the better you can personalise their experience. But this data also has a shelf life. User preferences change over time, so very old behaviour data might actually hurt personalisation accuracy.
Product catalogue data has different characteristics. It's valuable immediately and doesn't necessarily become more valuable over time. But it needs to be kept current and consistent to remain useful. Outdated product information can make recommendation engines suggest products that are no longer available.
Transactional data often becomes more valuable in combination with other datasets. Purchase history alone might tell you what users bought, but combining it with behaviour data, demographic data, and product catalogue data can reveal much richer insights about user preferences and market trends.
Understanding these different data characteristics helps you prioritise data collection and quality improvement efforts. You might invest heavily in behaviour tracking systems that compound over time while accepting lower quality for data that has immediate but limited value.
The fourth principle is that data storytelling is a core product management skill. Raw data doesn't motivate stakeholders or drive decision-making. Data becomes powerful when it's transformed into stories that help people understand problems, opportunities, and solutions.
This storytelling isn't just about creating dashboards and reports. It's about helping stakeholders understand what the data means for their decisions and actions. A chart showing declining user engagement is just information. A story about how declining engagement correlates with specific product changes and what that means for future development decisions is actionable insight.
Successful practitioners emphasise the importance of "dogfooding" your own data products to understand how customers experience them. This principle applies to data storytelling as well. You need to experience your data stories from the perspective of different stakeholders to understand whether they're actually useful for decision-making.
A data story that makes sense to you might be confusing to a business stakeholder who doesn't understand the underlying metrics. A technical explanation that satisfies data scientists might not provide the context that executives need to make strategic decisions. Great data science product managers develop the ability to tell the same data story in different ways for different audiences.
The fifth principle is understanding data as a competitive asset. In many industries, the companies with the best data have sustainable competitive advantages that are difficult to replicate. This isn't just about having more data; it's about having better data, more relevant data, and more actionable data.
Netflix's recommendation engine isn't just better because they have smart algorithms. It's better because they have unique data about how users actually watch content, not just what they say they like. Amazon's demand forecasting isn't just sophisticated; it's trained on transaction data that competitors can't access. Google's search results aren't just relevant; they're informed by search behaviour data that no other company can replicate.
This competitive dimension affects how you think about data collection, data sharing, and data partnerships. Some data might be worth sharing to enable ecosystem benefits. Other data might be so strategically valuable that you need to protect it carefully.
The sixth principle is that data infrastructure is product infrastructure. The systems you build to collect, store, and process data aren't just technical foundations; they're product foundations that enable or constrain what you can build.
Real-time data pipelines enable real-time personalisation but require significant infrastructure investment. Batch processing systems are cheaper and simpler but limit you to experiences that don't require immediate responsiveness. Data warehouses optimised for analytics might not support the low-latency queries required for user-facing features.
These infrastructure decisions affect user experience in ways that aren't always obvious. A recommendation engine that takes 500ms to generate suggestions might work fine on a website but feel slow in a mobile app. A personalisation system that can only update user profiles once per day might miss important behaviour changes that affect relevance.
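One pattern for managing this tradeoff is a latency budget with a batch fallback: try real-time scoring, and if it can't finish within the budget, serve precomputed results instead. The sketch below assumes hypothetical `realtime_scorer` and `cached_fallback` functions and an illustrative 200ms budget.

```python
import concurrent.futures
import time

def recommend_with_budget(realtime_scorer, cached_fallback, user_id,
                          budget_seconds=0.2):
    """Try real-time scoring within a latency budget; fall back to
    batch-precomputed results if it can't finish in time."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(realtime_scorer, user_id)
        try:
            return future.result(timeout=budget_seconds)
        except concurrent.futures.TimeoutError:
            future.cancel()
            return cached_fallback(user_id)

# A deliberately slow scorer and a precomputed fallback, for illustration.
def slow_scorer(user_id):
    time.sleep(1.0)
    return ["fresh-1", "fresh-2"]

def cached(user_id):
    return ["popular-1", "popular-2"]

print(recommend_with_budget(slow_scorer, cached, "u1"))  # falls back to cached list
```

The design choice here is explicit: the product defines what "fast enough" means, and the infrastructure degrades gracefully rather than making users wait.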
As a data science product manager, you need to understand these infrastructure tradeoffs well enough to make informed decisions about where to invest and what constraints to accept.
The seventh principle is that data governance is product governance. Decisions about data privacy, data retention, data access, and data usage aren't just compliance requirements; they're product decisions that affect what experiences you can create and how you can create them.
Privacy regulations like GDPR don't just affect what data you can collect; they affect how you can use that data for personalisation, how long you can retain user behaviour history, and what explanations you need to provide for algorithmic decisions. These constraints shape your product possibilities in fundamental ways.
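Retention limits, for instance, translate directly into pipeline logic. A minimal sketch of enforcing a retention window follows; the 365-day value is a placeholder, since the real window is a policy decision shaped by regulation and product needs.

```python
from datetime import datetime, timedelta, timezone

def apply_retention(events, now, retention_days=365):
    """Drop behaviour events older than the retention window.

    retention_days is an illustrative placeholder; the actual value
    is a governance decision, not an engineering default.
    """
    cutoff = now - timedelta(days=retention_days)
    return [e for e in events if e["timestamp"] >= cutoff]

now = datetime.now(timezone.utc)
events = [
    {"user_id": "u1", "timestamp": now - timedelta(days=10)},
    {"user_id": "u1", "timestamp": now - timedelta(days=400)},  # past retention
]
kept = apply_retention(events, now)
print(len(kept))  # 1
```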
But governance constraints can also drive innovation. When you can't rely on extensive data collection, you might develop more efficient algorithms that work with less data. When you need to provide explanations for algorithmic decisions, you might build more interpretable models that actually improve user trust and satisfaction.
The final principle is that data assets require ongoing product management. Data doesn't just accumulate and remain useful forever. It degrades over time, becomes inconsistent, and loses relevance to changing business needs. Managing data as a product asset means treating data quality, data freshness, and data relevance as ongoing product requirements.
This ongoing management includes monitoring data quality metrics, updating data collection as product requirements change, and deprecating data that's no longer useful. It also includes evolving data schemas and data pipelines as you learn more about what data is actually valuable for your use cases.
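Monitoring of this kind can start very simply. The sketch below flags a dataset as unhealthy if it is stale or suspiciously small; the thresholds are illustrative starting points, not recommendations, and real systems would track many more signals.

```python
from datetime import datetime, timedelta, timezone

def check_table_health(last_updated, row_count, now,
                       max_staleness_hours=24, min_rows=1000):
    """Return a list of quality issues for a dataset.

    Thresholds are illustrative; tune them per dataset and use case.
    """
    issues = []
    if now - last_updated > timedelta(hours=max_staleness_hours):
        issues.append("stale")
    if row_count < min_rows:
        issues.append("too_few_rows")
    return issues

now = datetime.now(timezone.utc)
print(check_table_health(now - timedelta(hours=2), 50_000, now))  # []
print(check_table_health(now - timedelta(days=3), 200, now))      # ['stale', 'too_few_rows']
```

Checks like these turn "data quality is an ongoing requirement" from a slogan into something a pipeline can alert on.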
Many companies make the mistake of treating data infrastructure as a one-time investment. They build data pipelines, set up data warehouses, and assume the foundation is complete. But data infrastructure needs to evolve with your product strategy, just like any other product component.
When you treat data as a strategic product asset, you don't just enable better data science products; you enable a different approach to product development. You can make decisions based on evidence rather than intuition. You can personalise experiences at scale. You can predict and prevent problems before they affect users.
Most importantly, you create a compounding advantage that gets stronger over time. Every user interaction generates data that improves your understanding. Every product improvement generates more data that enables better improvements. This virtuous cycle is what separates companies that use data science tactically from companies that use it strategically.
The next chapter will show you how to establish the governance and quality standards that make this strategic approach sustainable and compliant.