Why Digital Product Insights Need Automated High-Cardinality Cohort Analytics

Imagine you are the chief product officer of a popular retail app and have just launched a new version. Naturally, you are curious if, and how, the new version has impacted business-critical metrics such as conversion rates.

You probably go to your data dashboards and compare the conversion numbers for the old and new versions side by side. The new version shows the better outcome, so you make the call to double down on its rollout.

But two weeks later, you realize this was a massive mistake! The new version turned out to have worse conversions when fully rolled out.

Now you are left wondering what you missed. As the old saying goes, there are “lies, damned lies, and statistics,” and you were misled by your own data!

Sadly, you are not alone. You just fell prey to a classic statistical problem called Simpson’s paradox. Here is a brief definition from Wikipedia: Simpson’s paradox is a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined.

What really happened here? A shallow analysis that sliced users along a single dimension, the app version, gave us a misleading view of the world. (A cohort is the set of users that share the same values on some combination of metadata features such as app version, OS type, and geolocation.) This misleading view in turn led to bad business outcomes: two weeks of poor conversion numbers that could have been avoided!

Could you have averted this disaster with better tools?

If only you had been able to do a more “fine-grained” analysis looking at both the OS type and the app version, you might have noticed that a more nuanced phenomenon was at work. Conversions did not actually improve with the new version; the new version simply happened to be adopted more often by iOS users, who are also likely to have better conversion rates. Formally speaking, you fell victim to a “confounding factor” in your analysis: the OS type, which influenced both adoption of the new version and conversion rates, a hidden dependence that the original “shallow” analysis was not able to capture.
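To make the reversal concrete, here is a minimal sketch in Python. The user and conversion counts are hypothetical, chosen only so that the effect shows up; they are not taken from any real rollout. The helper `conversion_rates` is likewise an illustrative name, not part of any product API.

```python
from collections import defaultdict

# Hypothetical counts, chosen only to illustrate the effect:
# (app_version, os) -> (users, conversions)
COUNTS = {
    ("old", "iOS"): (200, 40),
    ("old", "Android"): (800, 80),
    ("new", "iOS"): (800, 144),
    ("new", "Android"): (200, 16),
}


def conversion_rates(counts, key_fn):
    """Sum users and conversions per key_fn(cohort), then return the rates."""
    totals = defaultdict(lambda: [0, 0])
    for cohort, (users, conversions) in counts.items():
        key = key_fn(cohort)
        totals[key][0] += users
        totals[key][1] += conversions
    return {key: conv / users for key, (users, conv) in totals.items()}


# Shallow view: one dimension (app version only).
by_version = conversion_rates(COUNTS, lambda cohort: cohort[0])

# Fine-grained view: app version crossed with OS type.
by_version_and_os = conversion_rates(COUNTS, lambda cohort: cohort)

for version, rate in by_version.items():
    print(f"{version}: {rate:.0%}")
for (version, os_type), rate in by_version_and_os.items():
    print(f"{version} on {os_type}: {rate:.0%}")
```

With these numbers the new version wins in the aggregate view yet loses on both iOS and Android, because most new-version users are on iOS, where conversion is higher regardless of version.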

Here is another 3D visualization (2D to 3D animation) illustrating the same issue. When viewed only through the 2D perspective of Conversion versus App Version, it seems clear that App Version B performs better than App Version A. However, when fitting a regression plane within a 3D space that includes the additional dimension “Platform,” the resulting plane reveals a completely opposite relationship—App Version B performs worse than App Version A across both Platforms (iOS and Android).

Why should you care?

Hopefully this simple example illustrates that shallow cohort analysis can have a disastrous impact on your business outcomes. In particular, missing out on fine-grained analysis can hurt your critical business workflows in two ways:

False sense of well-being/false negatives: The lack of fine-grained analysis as shown in the above example can result in incorrect control decisions based on an imprecise view of the available data.

False non-actionable leads in troubleshooting issues in the field: Often, product developers are charged with debugging performance and experience issues in the field. Having only shallow visibility into the data can send them “barking up the wrong tree” in identifying and debugging issues. For example, the figure shows an extended example looking at a combination of more fields: geolocation, CDN, device, and version. In this case, we may be led to falsely believe the problem is with CDN=CDN B when the real problem is for a specific combination of device, version, and demographic region that got the update!

Missing these nuanced patterns might result in losing customers from a segment that silently churns or failing to seize an untapped market opportunity. The team could be lulled into complacency, believing the overall metric to be stable, when targeted remediation or innovation is urgently needed. Essentially, false negatives hide the truth of emerging trends by smoothing over the details that truly matter.

When insights are not actionable, remediation becomes guesswork. As the example below shows, if we fail to localize the problem to a small, identifiable subgroup, we may instead escalate with the CDN. Deeper segmentation clarifies the “why” behind trends, pointing directly to actionable factors, whether that means updating device support, targeting specific customer segments, or adjusting regional strategies.
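The troubleshooting scenario above can be sketched in a few lines of Python. All session and failure counts are hypothetical, as are the names `ROWS`, `DIMENSIONS`, and `failure_rate_by`; the point is only to show how a single-dimension view can blame the CDN while a multi-dimensional view localizes the problem.

```python
from collections import defaultdict

DIMENSIONS = ("geo", "cdn", "device", "version")

# Hypothetical data: geo, cdn, device, version, sessions, failures
ROWS = [
    ("US", "CDN A", "phone", "v1", 1000, 20),
    ("US", "CDN A", "tv", "v2", 1000, 20),
    ("US", "CDN B", "phone", "v1", 1000, 20),
    ("EU", "CDN A", "phone", "v1", 1000, 20),
    ("EU", "CDN B", "phone", "v2", 1000, 20),
    ("EU", "CDN B", "tv", "v2", 1000, 300),  # the affected combination
    ("EU", "CDN A", "tv", "v2", 100, 30),    # same combination, other CDN
]


def failure_rate_by(rows, dims):
    """Group rows on the named dimensions and return failure rate per group."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key][0] += row[4]  # sessions
        totals[key][1] += row[5]  # failures
    return {key: fails / sessions for key, (sessions, fails) in totals.items()}


# Shallow view: CDN B looks far worse than CDN A.
by_cdn = failure_rate_by(ROWS, ["cdn"])

# Deeper view: the failures concentrate in one geo/device/version cohort.
by_combo = failure_rate_by(ROWS, ["geo", "device", "version"])
worst = max(by_combo, key=by_combo.get)

# Within that cohort, both CDNs fail at the same rate.
full = failure_rate_by(ROWS, DIMENSIONS)

print(by_cdn)
print(worst, by_combo[worst])
```

In this made-up data, CDN B only looks bad because it carries most of the traffic from the affected cohort; inside that cohort the two CDNs behave identically, so escalating with the CDN would not fix anything.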

The problem is much worse and existing solutions can’t help

As the above examples convey, we need a fine-grained multi-dimensional view into the client cohorts to get a full picture of the business impact and performance in the field. We presented a very simplified view with only a handful of client attributes and values. The real problem is much worse! In practice, there can be a million or more possible attribute-value combinations (“high cardinality”) that we need to evaluate to get a full picture.

Unfortunately, state-of-the-art solutions suffer from the curse of cardinality and can only support “shallow dimension” analysis. By “shallow dimension,” we refer to the use of a few or isolated dimensions in the analysis. This is a well-known and classic pain point in traditional product analytics and observability solutions! As you add more dimensions to your analysis, the number of potential subgroups grows exponentially. For instance, with 7 dimensions, each having 10 possible values, you could be looking at up to 10 million unique combinations. Manually exploring this vast landscape of cohorts isn’t just impractical—it’s nearly impossible without overwhelming analysts with pivot tables and static reports.
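The arithmetic behind the 10 million figure is easy to check. The sketch below uses hypothetical helper names (`count_full_cohorts`, `count_all_cohorts`) and also counts the larger space you get if each dimension may be left unconstrained, which is what a drill-down over subsets of dimensions has to cover.

```python
import math
from itertools import product


def count_full_cohorts(cardinalities):
    """Cohorts that pin every dimension to one specific value."""
    return math.prod(cardinalities)


def count_all_cohorts(cardinalities):
    """Cohorts where each dimension is either pinned to a value or left
    unconstrained (includes the single cohort with no constraints at all)."""
    return math.prod(c + 1 for c in cardinalities)


seven_by_ten = [10] * 7
print(count_full_cohorts(seven_by_ten))  # 10000000
print(count_all_cohorts(seven_by_ten))   # 19487171

# Brute-force check of the product rule on a small, hypothetical schema.
small = {
    "os": ["iOS", "Android"],
    "version": ["old", "new"],
    "geo": ["US", "EU", "APAC"],
}
enumerated = list(product(*small.values()))
assert len(enumerated) == count_full_cohorts([len(v) for v in small.values()])
```

Adding an eighth dimension with 10 values multiplies the first count by 10 again, which is why manual exploration stops scaling after only a few dimensions.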

In fact, the problem is much worse. So far, we only considered “static” client-side attributes. In practice, to get rich insights we need to consider “dynamic” or “on demand” client-side cohort definitions. For instance, we may want to define a custom pattern that checks whether a user experiences a sequence of two or more request timeouts before a conversion fails, and use this pattern as a feature to create dynamic cohorts based on stateful sequences.
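As a rough illustration of such a stateful pattern, the following Python sketch keeps a per-user counter over a time-ordered event stream. The event names, the `timeout_then_failure_cohort` function, and the choice to require the timeouts to be consecutive and to directly precede the failed conversion are all assumptions made for this example; it is not a description of how any particular platform implements dynamic cohorts.

```python
from collections import defaultdict

# Hypothetical event names for this sketch.
TIMEOUT = "request_timeout"
CONVERSION_FAILED = "conversion_failed"


def timeout_then_failure_cohort(events, min_timeouts=2):
    """Return the users whose failed conversion was directly preceded by at
    least `min_timeouts` consecutive request timeouts.

    `events` is an iterable of (user_id, event_type) pairs in time order.
    """
    streak = defaultdict(int)  # per-user count of consecutive timeouts
    cohort = set()
    for user_id, event_type in events:
        if event_type == TIMEOUT:
            streak[user_id] += 1
            continue
        if event_type == CONVERSION_FAILED and streak[user_id] >= min_timeouts:
            cohort.add(user_id)
        streak[user_id] = 0  # any non-timeout event breaks the streak
    return cohort


EVENTS = [
    ("u1", "page_view"),
    ("u2", "request_timeout"),
    ("u1", "request_timeout"),
    ("u1", "request_timeout"),
    ("u2", "page_view"),
    ("u1", "conversion_failed"),
    ("u2", "request_timeout"),
    ("u2", "conversion_failed"),
    ("u3", "conversion_succeeded"),
]

dynamic_cohort = timeout_then_failure_cohort(EVENTS)
print(dynamic_cohort)
```

Here only `u1` qualifies: `u2` also fails to convert, but its timeout streak was interrupted by a page view. Membership in this cohort depends on the order of events rather than on any static attribute, which is what makes it a stateful definition.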

Introducing Conviva’s next-generation ODP (Operational Data Platform)

At Conviva, our Operational Data Platform (ODP) is engineered from the ground up to support the complexities of modern digital environments. Designed to handle stateful computations and efficient dimensional analysis, the ODP is the backbone of our deep cohort analysis capabilities.

Micro-Cohort Computation: 

  • Massive Scale: The ODP efficiently manages millions of potential subgroups, ensuring that every intersection of dimensions—be it device type, user segment, geographic region, or content category—is continuously monitored.
  • Granularity and Precision: By automatically computing metrics across these fine-grained cohorts, the platform allows you to detect subtle shifts in behavior that might be completely invisible in aggregated reports.

Automated Insights: 

  • Subgroup Analysis Engine: Leveraging advanced indexing and storage architectures, our platform scans through the exponential space of possible subgroups, intelligently identifying anomalies and emerging trends.
  • Mathematical Rigor: The system effectively applies principles from subgroup analysis and statistical validation (such as mitigating Simpson’s paradox) to ensure that each insight is both statistically robust and operationally relevant.

Preventing Misinterpretation: 

  • Enhanced Clarity: With detailed breakdowns, you can quickly drill down into any aggregated trend to understand its underlying causes. This mitigates the risk of false positives by revealing how different segments behave differently.
  • Actionable Analytics: When an issue is detected, the platform provides a clear picture of the intersecting dimensions at play. This enables targeted interventions—whether it’s a software update for a particular device or a marketing strategy tailored to a specific demographic.

Actionable Results: 

  • Real-Time Adaptability: With the ability to process and analyze data in real time, Conviva’s ODP ensures that insights are both current and actionable, so your teams can respond swiftly to emerging challenges.

By combining stateful computation with automated deep-dive analytics, Conviva’s ODP transforms vast and complex datasets into clear, actionable insights. It empowers technical teams, developers, operations, and even product managers to make informed decisions that drive measurable business outcomes—ensuring that the depth of your analysis is never compromised by the breadth of your data.