In the ever-evolving landscape of digital services, the role of real-time monitoring and incident response has become critical to maintaining healthy application experiences. Companies need monitoring tools to keep their systems running smoothly. But monitoring is just one piece of the puzzle. How teams get alerted to incidents and how quickly they identify root cause can make or break their digital experiences.
Conviva has seen that challenge firsthand, and that’s why we’ve developed AI-powered alerting to enable teams to detect, diagnose, and solve any experience issue faster.
But why has it been so difficult for organizations to build alerting systems to help them optimize their digital experiences in real time?
The Setup Conundrum
Most monitoring solutions allow for custom configurations to fit the needs of any organization. While customization can be helpful for administrators, it often comes at a price: complexity. Setting up these tools can require extensive custom code and heavy administrator involvement in the setup process. This is especially true for system performance monitoring, which tends to be more backend-systems-based and less intuitive.
The False Positives Problem
Another issue that plagues traditional monitoring services is the prevalence of false positives. These tools often monitor backend systems that already have redundancies in place. Since they’re configured to alert on systems that already have automated failover mechanisms to self-resolve, they can trigger alerts for incidents that don’t actually impact the end-user experience. If your monitoring team wastes time and resources investigating an incident triggered by a false positive, how might that impact your overall MTTR (Mean Time to Resolution) for system errors? And what kind of impact might that have if even just 5% of the alerts your team receives are due to false positives?
For example, many streaming companies have multi-CDN strategies whereby, if one CDN fails, another CDN will be used instead. This is all managed in real time and in the backend. Receiving alerts on the failed CDN does not help NOC teams because they have no action to take: the multi-CDN redundancy algorithm has already selected a healthy CDN, so there is no customer impact.
This also often occurs with advertising. Many ad servers will trigger alerts for failed ads while trying to insert multiple ads in a row. Some of our customers experience an average of five ad server failures before an ad is successfully inserted and delivered. So alerting only on the ad server itself does not indicate whether an ad was successfully delivered. This is why successful ad delivery must be measured and alerted on at the user level, so teams are alerted only if the user doesn’t see an ad, not if there’s a reporting issue with a backend system.
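The difference between server-level and user-level alerting can be illustrated with a minimal Python sketch. It is not Conviva's implementation: the `AdAttempt` record, the `slot_id` field, and the sample data are hypothetical, and the sketch assumes an ad counts as delivered if any attempt for that user and ad slot succeeds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdAttempt:
    """One attempt to insert an ad (hypothetical record shape)."""
    user_id: str
    slot_id: str      # hypothetical identifier for one ad position
    succeeded: bool


def slots_without_ad(attempts):
    """Return (user_id, slot_id) pairs for which no attempt succeeded."""
    delivered = {}
    for attempt in attempts:
        key = (attempt.user_id, attempt.slot_id)
        # A slot counts as delivered once any attempt for it succeeds.
        delivered[key] = delivered.get(key, False) or attempt.succeeded
    return sorted(key for key, ok in delivered.items() if not ok)


# User u1: five failed attempts, then a successful one (no user impact).
# User u2: two failed attempts and no success (the user never saw an ad).
attempts = (
    [AdAttempt("u1", "pre-roll", False)] * 5
    + [AdAttempt("u1", "pre-roll", True)]
    + [AdAttempt("u2", "pre-roll", False)] * 2
)

server_failures = sum(not a.succeeded for a in attempts)  # server-level view
user_failures = slots_without_ad(attempts)                # user-level view

print(f"Ad server failures: {server_failures}")
print(f"Users who did not see an ad: {user_failures}")
```

In this made-up sample, a server-level rule would see seven failures, while the user-level view flags only the one user and slot where no ad was delivered.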
False alarms can lead to alert fatigue and distract teams from real issues. They decrease efficiency and keep teams from focusing on the problems impacting their user experience right now. Not every alert fired from a traditional monitoring system needs to be addressed instantly, because many of them surface an issue with a backend system that has no impact on the end user. If a backend latency issue isn’t impacting user experience, is it essential to use resources to address it in real time?
Conviva AI Alerts: A New Approach
Conviva AI Alerts take a fundamentally different approach to monitoring, focusing on user experience at the granular level. Unlike other services, Conviva measures the experience impact on every device and every viewer, ensuring that alerts are only triggered when users are genuinely affected.
Our approach ensures you’re only alerted to the issues relevant for your business, so you can take action before revenue is impacted. Conviva AI Alerts surface issues like a sudden increase in average network request duration, which can slow down login, sign up, or search, and directly impact customer experience and revenue.
Whether alerting on average network request duration, average screen load time, average page load time, or a sudden drop in active devices, the incident will escalate from Info, through Warning, to Critical level, depending on the scale and severity of the issue. This graduated escalation lets teams respond appropriately to any issue, whether it’s a complete outage that needs immediate attention or a small issue slowing down backend processes.
Conviva AI Alerts eliminate the noise – so teams can focus on protecting the moments that matter most in their apps.
Conviva has designed its AI Alerting capability to be fully transparent and interpretable, allowing users to understand and even control it. Our observable AI approach gives you full transparency into why an alert was triggered, because you can drill down to see and validate the reason for the alert. We think that when you can observe the decision-making process, you’re more likely to trust it. Observable AI gives you insight into AI behavior, enabling you to make more informed decisions about when and how to use our AI alerts across your operations teams.
One standout feature of Conviva AI Alerts is ease of use. Admins need to configure just three severity levels: Info, Warning, and Critical. These levels are determined based on the minimum number of affected users and the percentage of cumulative impacted users. Conviva’s algorithms systematically scan thousands of combinations of metadata in real time, so teams don’t need to manage alerts across endless data fields.
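To make the idea of severity levels concrete, here is a minimal Python sketch of a threshold-based classifier. It is an illustration only, not Conviva's algorithm or configuration format: the `SeverityRule` type, the `classify` function, and every threshold value are hypothetical, and the sketch assumes a level applies only when both the affected-user count and the impacted percentage meet that level's minimums.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SeverityRule:
    """Thresholds for one severity level (all values are illustrative)."""
    name: str
    min_affected_users: int    # minimum number of affected users
    min_impacted_pct: float    # minimum percentage of impacted users


# Ordered from most to least severe so the first match wins.
RULES = [
    SeverityRule("Critical", min_affected_users=1000, min_impacted_pct=5.0),
    SeverityRule("Warning", min_affected_users=200, min_impacted_pct=1.0),
    SeverityRule("Info", min_affected_users=50, min_impacted_pct=0.2),
]


def classify(affected_users: int, total_users: int) -> Optional[str]:
    """Return the highest severity whose thresholds are both met, else None."""
    if total_users <= 0:
        return None
    impacted_pct = 100.0 * affected_users / total_users
    for rule in RULES:
        if (affected_users >= rule.min_affected_users
                and impacted_pct >= rule.min_impacted_pct):
            return rule.name
    return None  # below every threshold: no alert


print(classify(60, 10_000))      # small but noticeable impact
print(classify(300, 10_000))     # larger share of users affected
print(classify(1_500, 10_000))   # widespread impact
```

Requiring both thresholds is one possible design choice: a count-only rule could fire on a tiny fraction of a very large audience, while a percentage-only rule could fire on a handful of users in a small segment.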
Conviva AI Alerts Drive Better Outcomes
Now, let’s get to the hard data. Conviva conducted a study of its AI Alerts and found that, on average, engagement retention was 12% lower among users affected by an incident or outage, even seven days after the alert. In other words, affected users were measurably less likely to return to the platform within a week. That’s why teams need to be able to react instantly to user-impacting issues. Any delay in response can result in a bad user experience that may turn users away from the service forever.
During that study, users unaffected by the error had a 7-day engagement retention rate of 74.6%. Among users affected by a start failure error, however, only 41.9% returned to the platform within seven days after the alert. That is a gap of 32.7 percentage points: users who might otherwise have been retained but were lost after the negative experience caused by the issue.
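The two retention rates quoted above can be compared in two ways: as an absolute gap in percentage points, and as a relative drop measured against the unaffected baseline. The short Python sketch below works through both using only the 74.6% and 41.9% figures from the study; the variable names are illustrative.

```python
# 7-day engagement retention rates quoted in the study (percent).
unaffected_retention = 74.6   # users not affected by the error
affected_retention = 41.9     # users affected by a start failure error

# Absolute gap, in percentage points.
gap_points = unaffected_retention - affected_retention

# Relative drop: the gap as a share of the unaffected baseline.
relative_drop_pct = gap_points / unaffected_retention * 100

print(f"Absolute gap: {gap_points:.1f} percentage points")
print(f"Relative drop: {relative_drop_pct:.1f}% of the baseline rate")
```

The absolute gap is 32.7 percentage points, which corresponds to a relative drop of roughly 43.8% against the unaffected baseline.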
The Challenge of Measuring User Churn
Understanding user churn due to bad experiences is complex. It’s rarely a single bad encounter that drives users away; it’s often a series of negative experiences that increase user frustration until they ultimately decide to quit.
That’s why Conviva emphasizes a top-down approach: monitoring and measuring user experience and engagement as it relates to system performance issues. Analyzing system performance issues on their own does not give teams a comprehensive view of what users are actually experiencing. Focusing on user experience and engagement ensures you measure the full impact of any issue and can prioritize the most important actions to protect the user.
Instant Actionability Reduces Churn And Protects Revenue
In a world where customer experience is king, Conviva AI Alerts emerge as a crucial tool for businesses. Users have an abundance of choices when it comes to their digital experiences, so operations teams need instant actionability to compete in a crowded industry. By measuring customer experience and reacting swiftly with AI Alerts, companies can reduce MTTR, get ahead of the next outage, and avoid preventable customer churn.