Overdependence on Data Warehouses


Love them or hate them — or maybe both — there’s an industry-wide dependence on data warehouses, and most streaming enterprises lean on them too heavily when it comes to analytics.

Data warehouses are essentially an ops-to-executive data-infrastructure technology that dates back to the late 1980s. Technology was a little different back then: in 1989, a 45 Mbps T3 line was backbone-level throughput, and Tim Berners-Lee was still pitching the World Wide Web at CERN.

Fast forward to an era in which 50 Mbps is more or less standard household bandwidth. It might be time for some new analytics solutions.

Key takeaways:

    • It’s not worth it to relentlessly hack the data warehouse when there’s a purpose-built solution.
    • The different underlying logic of data warehouses and streaming analytics makes a huge difference in efficiency.
    • Data warehouses remain essential — they’re just not the holy grail.
    • Contemporary analytics let organizations dig down into census-level QoE metrics with speed, scalability, and flexibility.

What Do You Expect of Your Data Warehouse?

There’s this concept floating around that data warehouses are some kind of cure-all for data security, analytics, and decision support. That’s the kind of thinking that leads to some of these staggeringly large budget allocations for big-data solutions.

Data warehouse architecture has its place in nearly every corporation. But where exactly is that place?

The answer: Data warehouses belong in the decision-making toolbox along with a variety of other specialized analytics tools. That way, executives will be able to make use of the most efficient tool for the job.

Streaming enterprises need a way to understand audience experience and inform business decisions. In a climate where the entire industry is cutting costs, organizations need multiple tools — and the right tools — to succeed.

The Data Warehouse: A Decades-Old Solution

Data warehouses are solutions from around 40 years ago. The technology has improved since then, but the fact remains that the basic underlying logic was not built to satisfy all of the core requirements of actionable analytics for a major VoD enterprise:

    • Real-time (or at least very low latency) observation
    • Complex operations
    • Census-level measurements
    • Continuous reporting
    • Contextual metrics

Data warehouse technology was undeniably built to support decision-making at the executive level. It continues to be popular because of its effectiveness in that role.

However, it doesn’t perform as well as some other options when answering certain key questions. It’s far from the holy grail of decision support — especially in the streaming and publishing arenas.

What’s Going on Behind the Scenes

To understand why organizations need a diverse approach to data and analytics, it’s important to understand what’s going on behind the scenes. The first thing to remember is that data warehouses were intended to have two primary functions:

    • Storing data
    • Supporting executive decisions

In order to become useful as a decision-making tool, the data has to pass through various logical operators that select relevant items for consideration. In other words, the logic needs to abstract underlying data for human use.

Those vast repositories of operators typically need extensive engineering resources both to build and to maintain. The more complex the questions that decision-makers ask, the bigger the library becomes. That, of course, leads to higher costs.

There’s also the question of abstraction. To get from raw data to useful insight, legacy architectures need to do a lot of work, and they need to do it at query time: every question triggers fresh computation over the stored raw data.

Services like those Conviva offers operate differently: they’re purpose-built to analyze big-data streams, updating metrics continuously as events arrive. That difference significantly reduces computation costs and increases speed.
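
To make the contrast concrete, here is a toy sketch in Python. It is not any vendor’s implementation (the event shape and the metric are made up), but it shows the difference between recomputing an answer from raw history at query time and maintaining a running aggregate as events arrive.

    # Toy contrast only; the event shape here is illustrative, not a real schema.
    events = [{"title": "t1"}, {"title": "t2"}, {"title": "t1"}]

    # Query-time model: every question re-scans the stored raw events, so the
    # cost grows with the size of the history each time the question is asked.
    def plays_per_title_at_query_time(stored_events):
        counts = {}
        for e in stored_events:
            counts[e["title"]] = counts.get(e["title"], 0) + 1
        return counts

    # Streaming model: the aggregate is updated once per event on arrival, so
    # answering the question later is a cheap lookup rather than a full scan.
    running_counts = {}

    def observe(event):
        running_counts[event["title"]] = running_counts.get(event["title"], 0) + 1

    for e in events:
        observe(e)

    # Both models agree on the answer; they differ in when the work happens.
    assert plays_per_title_at_query_time(events) == running_counts == {"t1": 2, "t2": 1}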

Streaming-Specific Questions

There are certain complex video QoE questions that data warehouses struggle to answer efficiently. A common example involves buffering.

Clients experience buffering events for various reasons. Unfortunately, it’s usually not efficient to take information from a data warehouse and dig down into the types of metrics that could help uncover root causes.

How can an operation with millions of concurrent viewers determine how much time their audience spends buffering due to network issues on a single CDN? Executives need to ask complex questions like this to make informed decisions.
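
As a rough illustration of the metric itself, here is a minimal Python sketch. The session records and their field names (cdn, cause, buffering_ms) are hypothetical rather than any particular vendor’s schema; the point is simply what “buffering time attributable to network issues, per CDN” means as a computation.

    from collections import defaultdict

    # Hypothetical session records; every field name here is illustrative.
    sessions = [
        {"cdn": "cdn-a", "cause": "network", "buffering_ms": 1200},
        {"cdn": "cdn-a", "cause": "player", "buffering_ms": 300},
        {"cdn": "cdn-b", "cause": "network", "buffering_ms": 800},
    ]

    def network_buffering_by_cdn(records):
        """Total buffering time attributed to network issues, grouped by CDN."""
        totals = defaultdict(int)
        for r in records:
            if r["cause"] == "network":
                totals[r["cdn"]] += r["buffering_ms"]
        return dict(totals)

    print(network_buffering_by_cdn(sessions))  # {'cdn-a': 1200, 'cdn-b': 800}

The computation is trivial at a few rows; the challenge is running it continuously, at census scale, over millions of concurrent sessions.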

The exact question might change from one day to the next, which means issues pile up quickly. Addressing them in real time keeps you ahead on QoE optimization — and ahead of your competition.

Developing Answers

Organizations are operating at a disadvantage when they attempt to tease apart data into usable metrics using a data-warehouse architecture on its own. To calculate a single complex streaming QoE metric:

    • There needs to be visibility into potentially hundreds of questions
    • Engineering teams need to build and maintain huge libraries of operators
    • Additional work needs to go into connecting the information to the appropriate endpoints

The logic integral to a traditional data warehouse was not made to operate at scale, with low latency, and with that type of abstraction. The most efficient way to perform this type of task is a system that rolls up the raw data as it arrives, performs the calculations, and then filters down to the relevant segment.

There are other options, such as Apache Spark. Many even allow programming in familiar languages while offering an underlying structure built specifically for streaming and analytics. Leading publishers and third-party services depend heavily on these types of architectures for their complex, low-latency video streaming analytics.
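
As one concrete (and purely illustrative) example, here is a minimal Spark Structured Streaming sketch in Python that follows the roll-up-then-filter pattern described above: it ingests player heartbeats, keeps only network-attributed buffering, and aggregates it per CDN in one-minute windows. The Kafka topic, server address, and event schema are placeholders, not a real deployment.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json, window
    from pyspark.sql.types import (LongType, StringType, StructField,
                                   StructType, TimestampType)

    spark = SparkSession.builder.appName("qoe-buffering-sketch").getOrCreate()

    # Illustrative heartbeat schema: one record per player heartbeat.
    schema = StructType([
        StructField("event_time", TimestampType()),
        StructField("cdn", StringType()),
        StructField("cause", StringType()),       # e.g. "network", "player"
        StructField("buffering_ms", LongType()),
    ])

    # Requires the spark-sql-kafka connector; topic and servers are placeholders.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "player-heartbeats")
           .load())

    events = (raw
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Roll up as data arrives: network-attributed buffering per CDN, per
    # one-minute event-time window, tolerating two minutes of late data.
    per_cdn = (events
               .filter(col("cause") == "network")
               .withWatermark("event_time", "2 minutes")
               .groupBy(window(col("event_time"), "1 minute"), col("cdn"))
               .sum("buffering_ms"))

    query = (per_cdn.writeStream
             .outputMode("update")
             .format("console")
             .start())
    query.awaitTermination()

Because the aggregation is incremental and windowed, each new heartbeat updates a small piece of running state instead of triggering a scan over history, which is exactly the efficiency difference described above.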

Getting to Actionability

The goal should never be to know everything. It’s to get the right insight at the right time — the insight that lets an organization outperform the competition.

The fact is that data warehouses can answer many complex questions. After all, a properly configured system would have virtually everything there is to know about all streaming operations.

So what’s the problem? The issue is that, to be truly competitive, a streaming analytics system needs three things at once:

    • Speed: Any insight available with sub-60-second latency
    • Flexibility: Unlimited ability to provide key analytics
    • Scale: Oversight of every session simultaneously

That’s just not practical with a data warehouse. Luckily, there’s a no-compromises option available in the form of purpose-built streaming analytics.

Taking that First Step

The first step toward integrating any new technology is trying it out in person. Check out our tech page or click here for our live events and webinars.