Using Spark and Hive to process BigData at Conviva

Dilip Joseph

Conviva monitors and optimizes tens of millions of online video streams daily for premium video brands. Through Conviva Pulse, our online video dashboard, customers analyze how their online video is being consumed. For example, customers can identify in real time the most popular videos being watched and adjust their advertising strategy. As another example, customer ops teams can detect in real time problems degrading the experience of users watching a live basketball game (say, high buffering due to an overloaded CDN) and quickly take corrective action before the game ends. In addition to live monitoring, our customers can also analyze historical video trends: what were the most popular videos last week, and how long were they watched on average?

Our customers also ask questions that require deep and often ad hoc analysis, such as "Something seems to be wrong with my video delivery last week. Any idea what is going on?" The video analysis team at Conviva digs through terabytes of data to provide detailed responses to such questions. This post describes how we use Hive and Spark to make this happen.
