The Case for Experience-Centric Observability


Your business needs Experience-Centric Observability

While Observability and Application Performance Monitoring (APM) tools have transformed how engineering and operations teams function over the last ten years, every VP of Ops and CTO still faces several challenges with no clear solution in sight:

    • Being surprised by customer issues on social media, from customer support calls, or through the CEO, even when their tools say everything is fine.
    • Needing an army of expert engineers to debug whenever a major incident happens.
    • Exploding costs of observability tools to monitor their growing infrastructure.

These challenges are all rooted in the same underlying problem. Existing Observability tools and the teams that use them are disconnected from what really matters to the business: user experience and engagement.

Disconnect Between Operational Tools and Customer Experience

Operations need to evolve from focusing on low-level system performance to focusing on higher-level user experience. Experience-Centric Observability is a new paradigm that will transform operations and engineering teams to be more efficient, more connected to the business, and more cost-effective. This post is the first in our three-part series on Experience-Centric Observability.

What is Experience-Centric Observability? 

Experience-Centric Observability is a shift in methodology where user experience is at the core of operations. System-level monitoring is of course needed, but it is not always kept in the context of user experience in real time. By user experience, we are not talking about page load times and crashes on a small sample of users. We are talking about monitoring every user flow across every user in the application in a quantified manner in real time. If a user is not able to log in, sign up, or find content within an expected period of time, we should be alerted to the issue. If the percentage of successful sign-ups suddenly drops from 98% to 94% for whatever reason, we should know about it immediately, and not because we read complaints on social media.
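To make the sign-up example concrete, here is a minimal sketch of a success-rate check, assuming each attempt at a flow is recorded with a success flag. The record shape (FlowAttempt), the function names, and the two-percentage-point tolerance are illustrative assumptions, not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass
class FlowAttempt:
    """One user's attempt at a flow such as sign-up (hypothetical record shape)."""
    user_id: str
    succeeded: bool


def success_rate(attempts: list[FlowAttempt]) -> float:
    """Fraction of attempts that succeeded; an empty window is treated as healthy."""
    if not attempts:
        return 1.0
    return sum(a.succeeded for a in attempts) / len(attempts)


def should_alert(baseline_rate: float, current_rate: float,
                 max_drop: float = 0.02) -> bool:
    """Alert when the current rate falls more than max_drop below the baseline."""
    return (baseline_rate - current_rate) > max_drop


# Illustrative data only: 98 of 100 attempts succeed in the baseline window,
# 94 of 100 succeed in the current window.
baseline = [FlowAttempt(f"u{i}", i >= 2) for i in range(100)]
current = [FlowAttempt(f"u{i}", i >= 6) for i in range(100)]

baseline_rate = success_rate(baseline)
current_rate = success_rate(current)
alert = should_alert(baseline_rate, current_rate)
```

A fixed tolerance is the simplest possible rule. A production system would more likely compare against a learned baseline that accounts for time of day and traffic volume.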

Experience-Centric Observability circumvents these problems by intrinsically connecting a comprehensive measurement of user experience with system and application performance, removing surprise escalations, democratizing diagnostics by natively connecting the dots, and reducing cost by focusing on the data that matters.

Here is a simple real-world example. 

In the picture below, we are looking at the Conviva UI for a live sports app, specifically tracking an experience metric called Login Processing Time in the lead-up to a big event. This metric measures the time taken to complete the login process after the user has entered their credentials and clicked login. It includes any SSO integration, third-party authentication checks, and other activities needed to complete the login process and bring up a working application. It's important to note that this is not as simple as measuring the response time of a single API call.
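The difference between this kind of metric and a single API timing can be sketched as follows. The example assumes the client emits timestamped events for a session; the event names and the function are hypothetical, chosen only to illustrate measuring from the login click to a working application.

```python
from typing import Optional

# Hypothetical event names; a real application would define its own.
LOGIN_SUBMITTED = "login_submitted"  # user clicked "login"
APP_READY = "app_ready"              # working application is on screen


def login_processing_time(events: list[tuple[str, float]]) -> Optional[float]:
    """Seconds from the first login click to the first app-ready event after it.

    `events` is a list of (name, timestamp_seconds) pairs for one session.
    Returns None if the flow never completed.
    """
    start = None
    for name, ts in sorted(events, key=lambda e: e[1]):
        if name == LOGIN_SUBMITTED and start is None:
            start = ts
        elif name == APP_READY and start is not None:
            return ts - start
    return None


# Illustrative session: several intermediate steps sit between the click
# and the usable application, and all of them count toward the metric.
session = [
    ("login_submitted", 10.0),
    ("sso_redirect_done", 10.8),
    ("third_party_auth_done", 12.9),
    ("profile_loaded", 13.4),
    ("app_ready", 13.5),
]
elapsed = login_processing_time(session)
```

Because the metric spans the whole flow, a slow step anywhere between the click and the usable application shows up in it, whether that step is in the client, the backend, or a third party.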

While monitoring the metric, we see a sudden spike just before the event is about to start. Because we are directly measuring Login Processing Time, we immediately know that users are impacted and how many are impacted. With just two clicks, in the next picture, we determine that only users on a specific iPhone 15 version and on two specific device models are experiencing the login problem.

If not addressed immediately, this would cause a major disruption for users attempting to watch the event, leading to churn and brand damage.

With two more clicks, in the picture below, we pinpoint a slow call to a third-party authentication service. In this case, no amount of meticulous backend monitoring would have helped us locate the root cause, as it is a third-party issue. With user-centric operational monitoring, however, we easily connected the dots from Login Processing Time to the specific device models to the specific network call, leaving us with a clear understanding of the issue and its impact, and a concrete action to resolve it.
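The drill-down described above amounts to slicing an experience metric by a dimension and looking for the groups that stand out. Here is a minimal sketch, assuming per-session records with a device model and a login time; the field names, the sample values, and the "twice the overall median" rule are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import median


def median_by(sessions: list[dict], dimension: str, metric: str) -> dict[str, float]:
    """Median of `metric` for each distinct value of `dimension`."""
    groups = defaultdict(list)
    for s in sessions:
        groups[s[dimension]].append(s[metric])
    return {value: median(samples) for value, samples in groups.items()}


def outlier_groups(sessions: list[dict], dimension: str, metric: str,
                   factor: float = 2.0) -> list[str]:
    """Dimension values whose median exceeds `factor` times the overall median."""
    overall = median(s[metric] for s in sessions)
    per_group = median_by(sessions, dimension, metric)
    return sorted(v for v, m in per_group.items() if m > factor * overall)


# Made-up sessions: login time in seconds per device model.
sessions = [
    {"device_model": "model_a", "login_time": 1.0},
    {"device_model": "model_a", "login_time": 1.2},
    {"device_model": "model_a", "login_time": 1.1},
    {"device_model": "model_b", "login_time": 1.0},
    {"device_model": "model_b", "login_time": 0.9},
    {"device_model": "model_b", "login_time": 1.3},
    {"device_model": "model_c", "login_time": 6.0},
    {"device_model": "model_c", "login_time": 7.0},
]

slow_models = outlier_groups(sessions, "device_model", "login_time")
```

The same function could be applied again to the impacted sessions with a different dimension, such as the network call that took longest, to narrow the search one step further.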

A new operational methodology

Experience-Centric Observability is a new way of thinking and does take some getting used to. As with all paradigm shifts, there must be a significant benefit. The figure below illustrates the difference between the two approaches and the benefit of making the shift. On the left is the current paradigm, with an infrastructure-centric approach. Monitoring is primarily from backend systems through logs, metrics, and traces. Ops teams monitor these regularly, but surprises still happen and many experience issues still go unnoticed. When an escalation comes from customers or social media, it triggers a frantic search for potential issues. There is no understanding of the magnitude or impact of the issue, so there is no clear guidance on priority. This means that in many cases the ops team and an army of expert engineers across multiple areas of the system are brought in to diagnose the issue. Eventually, when an issue is found and fixed, the team is not sure whether the customer problem is resolved, since there is no direct monitoring of customer experience. This approach leads to more customer impact and higher cost.

The right side of the figure shows how things work in an Experience-Centric approach. There is an equal emphasis on measurement of user experience and system performance. When user experience is impacted, the issue is immediately and automatically detected and then diagnosed by correlating with system performance, pinpointing the component or components causing the issue. This means the issue can be resolved quickly and with just a few team members. The team only has to look at the data that matters, which reduces cost. As the monitoring system learns patterns of system performance that impact user experience, it can start to predict experience-impacting issues before they happen and classify system performance issues as customer-impacting or not, to aid prioritization and investment.

A new paradigm of technology

Unfortunately, existing observability tools are not capable of solving the disconnect between experience and performance or enabling an experience-centric approach to operations. They cannot measure experience metrics like Login Processing Time continuously in real time and cannot connect them to performance causes. A new paradigm of technology is needed to unlock Experience-Centric Observability. This technology must support a flexible and easy way to compute experience metrics across all users in real time, automatically alert on anomalies in these metrics, and connect them to performance in the client and in the backend so that issues can be diagnosed quickly. Each of these capabilities is individually very valuable, but together they enable a transformation that dramatically improves user experience and reduces operational cost.

 
