Big data is inherently difficult to manage because—as the name implies—there’s a lot of data. But when it comes to big data associated with video, it’s even more of a challenge because you’re dealing with non-uniform data.
Many people don’t think about how vast the streaming ecosystem is. Say an app is playing a video. That app could be running on 15 different types of devices, in what could be five different versions, across 20 or more streaming or social platforms. Every single stream could behave totally differently, so the volume of data needed to analyze streaming is massive.
From internet service provider (ISP) or ad load issues to device and operating system (OS) versions to user behavior like fast-forwarding or rewinding, every video session has a multitude of data associated with it. And at the scale Conviva operates, we measure millions of sessions and trillions of events each day across any device or platform where video can play.
There’s also a time component to data collection that further complicates managing video big data. Take ads, for example: if the timing of a pre-roll ad isn’t reported correctly, it could be reported as a mid-roll ad, which is not only inaccurate but could also directly impact advertising investment and revenue.
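To make the timing problem concrete, here is a minimal sketch of how an ad's position might be derived from the content play head time at the moment the ad starts. The function name, the one-second tolerance, and the rule itself are illustrative assumptions, not a description of Conviva's actual logic.

```python
def classify_ad_position(playhead_s: float,
                         content_duration_s: float,
                         tolerance_s: float = 1.0) -> str:
    """Label an ad by where the content play head was when the ad started.

    Hypothetical rule: near the start is pre-roll, near the end is
    post-roll, anything in between is mid-roll.
    """
    if playhead_s <= tolerance_s:
        return "pre-roll"
    if playhead_s >= content_duration_s - tolerance_s:
        return "post-roll"
    return "mid-roll"


# A pre-roll ad reported with the correct play head time (0 seconds).
correct = classify_ad_position(0.0, 600.0)

# The same ad reported with a wrong play head time of 12 seconds
# is labeled as a mid-roll ad instead.
misreported = classify_ad_position(12.0, 600.0)
```

Under these assumptions, a timing error of only a few seconds is enough to move an ad from one category to another, which is why accurate time reporting matters so much for ad data.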
Data quality, completeness, and timeliness are additional challenges, and for Conviva, the data must be not only accurate but also available in a matter of minutes, sometimes seconds. The difficulty, then, lies not just in managing big data but in the very complicated computation involved in collecting and processing it.
Conviva does this differently from others who compute video big data in that we monitor and record every single session. We don’t drop or skip strange occurrences in sessions. For instance, a single-CPU device can stream and run apps, but the memory is so low that the device will often buffer and create an anomaly in the data. While others may just drop or skip this anomaly, we don’t and that’s a big difference in how we manage the data.
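As a rough illustration of keeping anomalous sessions instead of discarding them, the sketch below tags sessions with unusually high buffering but returns every session it was given. The `Session` fields, the `high_buffering` flag name, and the 0.5 threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: str
    playing_ms: int
    buffering_ms: int
    flags: list = field(default_factory=list)


def flag_anomalies(sessions, max_buffering_ratio=0.5):
    """Tag sessions with heavy buffering; never remove any session."""
    for session in sessions:
        total_ms = session.playing_ms + session.buffering_ms
        ratio = session.buffering_ms / total_ms if total_ms else 0.0
        if ratio > max_buffering_ratio:
            session.flags.append("high_buffering")
    return sessions


sessions = flag_anomalies([
    Session("normal-device", playing_ms=9000, buffering_ms=1000),
    Session("low-memory-device", playing_ms=2000, buffering_ms=8000),
])
```

Both sessions remain in the result; the low-memory device is simply marked so that downstream analysis can include or isolate it as needed.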
We also use time series data. Our Stream Sensor collects heartbeats every 40 seconds from every device, and we do incremental computation every 30 seconds. It’s impossible to have all the data for a full stream at once, so every 30 seconds, we reingest, reprocess, and rereport all the data from the past 30 seconds.
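The incremental approach described above can be sketched roughly as follows: heartbeats are bucketed into 30-second windows, and each window updates running per-session totals rather than recomputing a whole stream from scratch. The `Heartbeat` fields, the class names, and the buffering-ratio metric are assumptions made for illustration and do not reflect the Stream Sensor's real schema.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_S = 30  # computation interval taken from the text


@dataclass
class Heartbeat:
    session_id: str
    timestamp_s: float
    playing_ms: int
    buffering_ms: int


@dataclass
class SessionTotals:
    playing_ms: int = 0
    buffering_ms: int = 0

    @property
    def buffering_ratio(self) -> float:
        total_ms = self.playing_ms + self.buffering_ms
        return self.buffering_ms / total_ms if total_ms else 0.0


def bucket_by_window(heartbeats):
    """Group heartbeats by the start time of their 30-second window."""
    windows = defaultdict(list)
    for hb in heartbeats:
        start = int(hb.timestamp_s // WINDOW_S) * WINDOW_S
        windows[start].append(hb)
    return dict(sorted(windows.items()))


class IncrementalAggregator:
    """Keeps running totals so each window only adds its own heartbeats."""

    def __init__(self):
        self.totals = defaultdict(SessionTotals)

    def process_window(self, heartbeats):
        touched = set()
        for hb in heartbeats:
            totals = self.totals[hb.session_id]
            totals.playing_ms += hb.playing_ms
            totals.buffering_ms += hb.buffering_ms
            touched.add(hb.session_id)
        # Report updated metrics only for sessions seen in this window.
        return {sid: self.totals[sid].buffering_ratio for sid in touched}


aggregator = IncrementalAggregator()
stream = [
    Heartbeat("s1", 10.0, playing_ms=9000, buffering_ms=1000),
    Heartbeat("s1", 50.0, playing_ms=10000, buffering_ms=0),
]
reports = [aggregator.process_window(hbs)
           for hbs in bucket_by_window(stream).values()]
```

Each entry in `reports` reflects the session's state as of that window, so the reported buffering ratio is refined as more of the stream arrives.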
But reporting is only as good as the data that is ingested, so we have multiple validation checks throughout the process. An integration validation team ensures that each integration is correct, and a data analytics team consumes the data so they can identify anomalies in real time and isolate them if necessary.
With this way of managing the data, we help our customers quickly isolate the right problem and the right people experiencing it. If there’s a quality of experience issue, for example, a publisher can drill down to the specific problem they need to fix in near real time and accurately target only the customers that were impacted.
The next phases of video big data quality management at Conviva include auto-collecting as much data and metadata as possible to lighten integrations and using AI to automatically detect if a device is reporting data incorrectly and adjust it on the fly.
As streaming increases globally, device types continue to fragment, and streaming platforms step up content creation and advertising, managing video big data quality will no doubt evolve, but we are dedicated to making it better every day.
Learn about another engineering challenge in our Analyzing Streaming Advertising Impacts using Play Head Time blog.