AWS: Netflix Case Study
What is Cloud Computing?
Cloud computing is the practice of delivering resources such as data storage, servers, databases, networking, and software over the internet, rather than running them on a local server or personal computer.
Why Cloud Computing?
As users, we don’t always have the minimum hardware resources, such as RAM, CPU, and disk space, needed to run our programs. Cloud computing addresses this problem.
Consider an example. Suppose a startup has built a social media web application and hosts it on a single server with a 100 GB hard disk and 8 GB of RAM. After some days the application suddenly goes viral and millions of users start hitting the site, but the server cannot handle that much traffic. The site goes down, which damages its reputation among users. Another constraint is that a startup doesn’t always have the money to buy physical hardware.
To handle this kind of situation, we can rent resources such as RAM, CPU, and disk storage from a cloud provider. We pay only for the time we use those resources.
One of the best-known clouds in today’s market is AWS, provided by Amazon.
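The pay-for-what-you-use idea can be made concrete with a little arithmetic. The sketch below compares renting servers by the hour with buying enough hardware to survive the busiest hour. Every number in it (the hourly rate, the server price, the traffic pattern) is an invented illustration value, not a real AWS price.

```python
# All prices below are made-up illustration values, not real AWS rates.
HOURLY_RATE_USD = 0.10         # hypothetical on-demand price per server-hour
OWNED_SERVER_COST_USD = 3000   # hypothetical up-front price of one physical server


def pay_as_you_go_cost(servers_per_hour):
    """Total rental cost, given the number of servers needed in each hour."""
    return sum(servers_per_hour) * HOURLY_RATE_USD


def owned_hardware_cost(servers_per_hour):
    """Up-front cost of buying enough servers to cover the busiest hour."""
    return max(servers_per_hour) * OWNED_SERVER_COST_USD


# A 720-hour month: 672 quiet hours on 1 server, then a 48-hour viral
# spike that needs 50 servers.
demand = [1] * 672 + [50] * 48

print(f"rented: ${pay_as_you_go_cost(demand):,.2f}")
print(f"owned:  ${owned_hardware_cost(demand):,.2f}")
```

Under these assumed numbers the rented capacity costs far less, because the 50 servers are paid for only during the 48 hours in which they are needed.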
AWS (Amazon Web Services) is a comprehensive, evolving cloud computing platform provided by Amazon that includes a mixture of infrastructure as a service (IaaS), platform as a service (PaaS), and packaged software as a service (SaaS) offerings. AWS services can offer an organization tools such as compute power, database storage, and content delivery services.
AWS launched in 2006, growing out of the internal infrastructure that Amazon.com built to handle its online retail operations. It was one of the first providers to introduce a pay-as-you-go cloud computing model that scales to give users compute, storage, and other resources as needed.
One company that has benefited from AWS services to handle its traffic and expand its business is Netflix.
Let us look at its case study.
Netflix is the world’s leading internet television network, with more than 100 million members in more than 190 countries enjoying 125 million hours of TV shows and movies each day, including original series, documentaries, and feature films. Members can watch as much as they want, anytime, anywhere, on nearly any internet-connected screen.
Application Monitoring on a Massive Scale
Netflix uses Amazon Web Services (AWS) for nearly all its computing and storage needs, including databases, analytics, recommendation engines, video transcoding, and more — hundreds of functions that in total use more than 100,000 server instances on AWS.
This results in an extremely complex and dynamic networking environment where applications are constantly communicating inside AWS and across the Internet. Monitoring and optimizing its network is critical for Netflix to continue improving customer experience, increasing efficiency, and reducing costs. In particular, Netflix needed a solution for ingesting, augmenting, and analyzing the multiple terabytes of data its network generates daily in the form of virtual private cloud (VPC) flow logs. This would enable Netflix to identify performance-improvement opportunities, such as identifying apps that are communicating across regions and collocating them. The company would also be able to increase uptime by quickly detecting and mitigating application downtime.
Each log record carries information about the communications between two IP addresses. However, in a dynamic environment like the one at Netflix, where an IP address can float between applications from day to day or even minute to minute, IP addresses alone don’t have much meaning. “The data sources we had before we took on this initiative were one sided,” says John Bennett, senior software engineer at Netflix. “We’d know an application was connecting to others, but we didn’t know both sides of the conversation and how to optimize those communications or the placement of the applications on the network.”
Netflix set out to establish a new data source that could give it more insight into communication among applications and regions by combining VPC flow logs with application metadata.
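A minimal sketch of what combining flow logs with application metadata could look like is shown below. It is not Netflix's implementation. It parses one record in the default VPC flow log format and labels both endpoints using a hypothetical ownership table that records which application held an IP address during which time window. The `Ownership` class, the application names, and the sample addresses are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ownership:
    """Hypothetical metadata row: which app held an IP during [start, end)."""
    ip: str
    app: str
    region: str
    start: int  # Unix seconds, inclusive
    end: int    # Unix seconds, exclusive


def parse_flow_log(line: str) -> dict:
    """Parse one record in the default (version 2) VPC flow log format.

    Field order: version account-id interface-id srcaddr dstaddr srcport
    dstport protocol packets bytes start end action log-status.
    Assumes the record carries data (no "-" placeholders).
    """
    f = line.split()
    return {
        "srcaddr": f[3],
        "dstaddr": f[4],
        "bytes": int(f[9]),
        "start": int(f[10]),
        "end": int(f[11]),
    }


def lookup(ip: str, ts: int, table: List[Ownership]) -> Optional[Ownership]:
    """Return the owner of `ip` at time `ts`, or None if it is unknown."""
    for row in table:
        if row.ip == ip and row.start <= ts < row.end:
            return row
    return None


def enrich(record: dict, table: List[Ownership]) -> dict:
    """Attach app and region names for both sides of the conversation."""
    out = dict(record)
    for side, addr in (("src", record["srcaddr"]), ("dst", record["dstaddr"])):
        owner = lookup(addr, record["start"], table)
        out[f"{side}_app"] = owner.app if owner else "unknown"
        out[f"{side}_region"] = owner.region if owner else "unknown"
    return out


# The same IP belongs to different (made-up) apps at different times.
table = [
    Ownership("10.0.1.5", "api-gateway", "us-east-1", 1418500000, 1418600000),
    Ownership("10.0.1.5", "billing", "us-east-1", 1418600000, 1418700000),
    Ownership("10.1.2.9", "recommendations", "us-west-2", 1418500000, 1418700000),
]
line = ("2 123456789010 eni-1235b8ca123456789 10.0.1.5 10.1.2.9 "
        "20641 443 6 20 4249 1418530010 1418530070 ACCEPT OK")

enriched = enrich(parse_flow_log(line), table)
print(enriched)
```

Because the lookup is keyed on both the address and the record's start time, an IP address that floats between applications resolves to whichever application held it when the traffic occurred.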
Improving Customer Experience with Real-Time Network Monitoring
Netflix’s Amazon Kinesis Data Streams-based solution has proven to be highly scalable, each day processing billions of traffic flows. Typically, about 1,000 Amazon Kinesis shards work in parallel to process the data stream. “Amazon Kinesis Data Streams processes multiple terabytes of log data each day, yet events show up in our analytics in seconds,” says Bennett. “We can discover and respond to issues in real time, ensuring high availability and a great customer experience.”
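To see how many shards can share one stream, it helps to know that Kinesis Data Streams hashes each record's partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below imitates that routing under a simplifying assumption: the hash space is split into equal-width ranges, one per shard. Real streams may have uneven ranges after shards are split or merged, and the partition keys used here are invented.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose range holds the key, assuming equal-width ranges."""
    return hash_key(partition_key) * shard_count // HASH_SPACE


# Spread 10,000 made-up partition keys over 1,000 shards.
counts = Counter(shard_for(f"eni-{i:05d}", 1000) for i in range(10_000))
print("shards that received records:", len(counts))
print("busiest shard holds:", max(counts.values()), "records")
```

A partition key with many distinct values spreads records across shards, which is what lets the shards process the stream in parallel.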
Netflix is now able to identify new ways to optimize its applications, whether that means moving an application from one region to another or changing to a more appropriate network protocol for a specific type of traffic. “Our solution built on Amazon Kinesis enables us to identify ways to increase efficiency, reduce costs, and improve resiliency for the best customer experience,” says Bennett.
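One way such optimization candidates might be surfaced is by totaling the bytes exchanged between application pairs that sit in different regions. The sketch below assumes flow records that already carry application and region names for both endpoints. The field names and the sample records are hypothetical.

```python
from collections import Counter


def cross_region_bytes(flows):
    """Total bytes per (source app, destination app) pair that crosses regions.

    Returns the pairs sorted from heaviest to lightest traffic.
    """
    totals = Counter()
    for flow in flows:
        if flow["src_region"] != flow["dst_region"]:
            totals[(flow["src_app"], flow["dst_app"])] += flow["bytes"]
    return totals.most_common()


# Made-up sample records.
flows = [
    {"src_app": "api-gateway", "src_region": "us-east-1",
     "dst_app": "recommendations", "dst_region": "us-west-2", "bytes": 4000},
    {"src_app": "api-gateway", "src_region": "us-east-1",
     "dst_app": "recommendations", "dst_region": "us-west-2", "bytes": 6000},
    {"src_app": "api-gateway", "src_region": "us-east-1",
     "dst_app": "billing", "dst_region": "us-east-1", "bytes": 9000},
    {"src_app": "billing", "src_region": "us-east-1",
     "dst_app": "ledger", "dst_region": "eu-west-1", "bytes": 500},
]

for (src, dst), total in cross_region_bytes(flows):
    print(f"{src} -> {dst}: {total} bytes across regions")
```

Pairs at the top of the list are the ones where moving one application closer to the other would remove the most cross-region traffic.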
Although a streaming data solution is not new to the IT industry, it is an innovation in the networking space. “Netflix is heavily invested in AWS in part because it abstracts the underlying network, so we don’t have to deal with switches and routers,” says Bennett. “We’re monitoring, analyzing, and optimizing at a higher level of the stack — in ways we would never even consider if we were running our own data centers.”
Benefits of AWS
- Processes and enriches multiple terabytes each day, representing billions of events, with sub-second response times for analytics queries
- Highly cost-efficient compared to competing solutions
- Freedom to experiment with system architecture to arrive at the most effective solution
- Data ingestion initiated with just a few simple API calls
- Highly elastic solution with close to 1,000 Amazon Kinesis shards working in parallel
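The point about ingestion needing only a few API calls can be illustrated with the Kinesis `PutRecord` operation, which takes a stream name, a data blob, and a partition key. The sketch below is written against any client object that exposes a boto3-style `put_record` method, so it runs offline with a small stand-in client. The stream name `vpc-flow-logs`, the record fields, and the choice of the interface ID as partition key are assumptions, not details from the case study.

```python
import json


def send_flow_record(client, stream_name: str, record: dict) -> dict:
    """Put one record onto a Kinesis data stream via the PutRecord API."""
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(record).encode("utf-8"),
        # Assumption: partition by network interface so load spreads over shards.
        PartitionKey=record["interface_id"],
    )


class FakeKinesisClient:
    """Offline stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.sent = []

    def put_record(self, **kwargs):
        self.sent.append(kwargs)
        return {"ShardId": "shardId-000000000000",
                "SequenceNumber": str(len(self.sent))}


# With boto3 installed and AWS credentials configured, the real client
# would come from: boto3.client("kinesis")
client = FakeKinesisClient()
record = {"interface_id": "eni-1235b8ca123456789",
          "srcaddr": "10.0.1.5", "dstaddr": "10.1.2.9", "bytes": 4249}

response = send_flow_record(client, "vpc-flow-logs", record)
print(response["ShardId"], response["SequenceNumber"])
```

For higher volumes the batch operation `PutRecords` is usually the better fit, since it sends many records in one request.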