Netflix needs no introduction. It has over 200 million subscribers, who last year collectively watched 6 billion hours of content per month. It is no surprise that the company is estimated to spend over $13 billion on content alone this year. All of this means Netflix has a mind-boggling amount of data to deal with every hour, writing more than 700 billion events every day. How do they do it? How does the company ensure it is delivering high-quality big data analytics to support critical strategic decisions?
In this video, Michelle Ufford shares how the data engineering and big data analytics teams do exactly that. She explains Netflix’s analytics environment, the challenges it poses, and how data is used across different roles.
To begin with, the company has built its data ecosystem entirely on open-source big data technologies, all in the AWS Cloud. It uses tools such as Kafka and Spark for data processing and Tableau for data visualization. It has a large team of data engineers, analytics engineers, visualization engineers, business analysts, research analysts, and machine learning scientists working around the clock to ensure an error-free data streaming and delivery process.
Despite these incessant efforts, there are days when the company faces insufficient or bad data, and there is more than one point of failure in the data flow, which may ultimately lead to adverse business impacts. Ufford explains that the company’s approach to handling bad data is to detect and respond to it as quickly as possible instead of simply trying to prevent it. The aim is to avoid making wrong business decisions based on bad data. Therefore, stale data is preferred over bad data.
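The "stale over bad" preference can be sketched in plain Python. This is a minimal illustration of the general pattern, not Netflix's actual tooling; the field names and the `Pipeline` class are hypothetical:

```python
def is_valid(batch):
    """Basic quality checks: the batch must be non-empty and every
    record must carry the fields downstream consumers depend on.
    (The required fields here are hypothetical examples.)"""
    if not batch:
        return False
    required = {"user_id", "event_type", "timestamp"}
    return all(required <= record.keys() for record in batch)

class Pipeline:
    """Detect bad data and respond by serving the last-known-good
    (stale) snapshot rather than propagating the bad batch."""

    def __init__(self):
        self.last_good = []  # stale but trustworthy snapshot

    def ingest(self, batch):
        if is_valid(batch):
            self.last_good = batch  # promote the fresh batch
            return batch
        # Bad data detected: in a real system this would also fire an
        # alert; here we simply fall back to the stale snapshot.
        return self.last_good

# Usage: a failed batch does not replace the last good one.
p = Pipeline()
good = [{"user_id": 1, "event_type": "play", "timestamp": 0}]
p.ingest(good)
bad = [{"user_id": 2}]  # missing required fields
result = p.ingest(bad)  # returns the stale-but-good batch
```

The key design choice this illustrates is that validation sits between ingestion and publication, so a bad batch can be caught and answered with stale data instead of silently reaching downstream consumers.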