Enhancing Netflix Reliability with Service-Level Prioritized Load Shedding

Table of Contents
- Introduction
- The Problem
- Our Solution
- Real-World Application and Results
- Service-Level Prioritized Load Shedding
- Generic CPU-Based Load Shedding
- Experiments with CPU-Based Load Shedding
Introduction
Netflix implemented service-level prioritized load shedding in its PlayAPI backend service to prioritize user-initiated requests over prefetch requests during peak traffic. By shedding non-critical traffic first, this approach improved both availability and the user experience.
The Problem
PlayAPI faced two recurring problems: spikes in prefetch traffic reduced availability for user-initiated requests, and increased backend latency degraded both request types equally. The concurrency limiter in use at the time throttled requests without distinguishing between the two types, which hurt overall availability.
Our Solution
Netflix implemented a priority-aware concurrency limiter within PlayAPI that favors user-initiated requests over prefetch requests without physically sharding the two types of traffic. By categorizing requests into priority buckets, the limiter sheds non-critical traffic first, so user-initiated requests continue to be served even when the service is under load.
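The exact limiter implementation isn't given in this summary, but the idea can be sketched as a single concurrency budget in which low-priority traffic is capped well below the total limit, so prefetch requests are rejected long before user-initiated ones. The class and method names below (PriorityLimiter, tryAcquire, release) and the prefetch share are illustrative assumptions, not Netflix's actual API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative priority-aware concurrency limiter; names and limits are hypothetical.
public class PriorityLimiter {

    public enum Priority { USER_INITIATED, PREFETCH }

    private final int maxInFlight;      // total concurrency budget for the service
    private final int prefetchCeiling;  // prefetch may only use a fraction of that budget
    private final AtomicInteger inFlight = new AtomicInteger();

    public PriorityLimiter(int maxInFlight, double prefetchShare) {
        this.maxInFlight = maxInFlight;
        this.prefetchCeiling = (int) (maxInFlight * prefetchShare);
    }

    // Returns true if the request may proceed; false means it should be shed.
    public boolean tryAcquire(Priority priority) {
        int limit = (priority == Priority.USER_INITIATED) ? maxInFlight : prefetchCeiling;
        while (true) {
            int current = inFlight.get();
            if (current >= limit) {
                return false; // prefetch hits its ceiling long before user-initiated traffic does
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Call when the request completes, whether it succeeded or failed.
    public void release() {
        inFlight.decrementAndGet();
    }
}
```

Because both request types share one deployment and one counter, no physical sharding is needed; the priority only changes the threshold at which a request is rejected.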
Real-World Application and Results
In production, the prioritized load-shedding mechanism proved effective: during peak traffic, availability for user-initiated requests stayed above 99.4% even as prefetch availability dropped to 20%. This preserved overall system reliability and the user experience.
Service-Level Prioritized Load Shedding
Services can map each request to a priority bucket based on its impact on the user (e.g., DEGRADED for requests whose failure degrades but does not break the experience, and BEST_EFFORT for non-critical requests such as prefetches). Prioritized load shedding then maintains availability by dropping the lowest-priority traffic first, so the most critical requests continue to be served during high-load scenarios.
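As a concrete illustration, a service might classify each incoming request with a small mapping function like the one below. The bucket names (CRITICAL, DEGRADED, BEST_EFFORT, BULK) and the request attributes are assumptions made for this sketch, not a published Netflix API.

```java
// Illustrative request-to-bucket mapping; bucket names and attributes are assumptions.
public final class RequestPriorityMapper {

    public enum PriorityBucket { CRITICAL, DEGRADED, BEST_EFFORT, BULK }

    // A tiny stand-in for the attributes a real service would inspect.
    public record Request(boolean userInitiated, boolean prefetch, boolean backgroundJob) {}

    // Map a request to a bucket based on its impact on the user experience.
    public static PriorityBucket classify(Request request) {
        if (request.backgroundJob()) {
            return PriorityBucket.BULK;        // offline or batch work: shed first
        }
        if (request.prefetch()) {
            return PriorityBucket.BEST_EFFORT; // dropping it has no visible impact
        }
        if (request.userInitiated()) {
            return PriorityBucket.CRITICAL;    // a user is actively waiting on this
        }
        return PriorityBucket.DEGRADED;        // failure degrades, but does not break, the experience
    }

    public static void main(String[] args) {
        System.out.println(classify(new Request(false, true, false)));  // BEST_EFFORT
        System.out.println(classify(new Request(true, false, false)));  // CRITICAL
    }
}
```

Shedding then proceeds from the bottom of this ordering upward: BULK and BEST_EFFORT traffic is dropped first, and CRITICAL traffic only as a last resort.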
Generic CPU-Based Load Shedding
Netflix services already autoscale on CPU utilization, which makes it a natural signal for load shedding as well. By defining a CPU utilization threshold above which shedding begins, a service can progressively shed non-critical traffic as load increases, preserving performance and the user experience for the traffic that remains.
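One way to realize this is sketched below: CPU utilization is turned into a shedding probability that ramps up between a per-priority threshold and full utilization. The thresholds and the linear ramp are illustrative choices for this sketch, not Netflix's published algorithm.

```java
// Illustrative CPU-based shedding curve; thresholds and the linear ramp are assumptions.
public final class CpuLoadShedder {

    private final double nonCriticalStart; // begin shedding non-critical traffic here (e.g., 0.60)
    private final double criticalStart;    // begin shedding critical traffic here (e.g., 0.80)

    public CpuLoadShedder(double nonCriticalStart, double criticalStart) {
        this.nonCriticalStart = nonCriticalStart;
        this.criticalStart = criticalStart;
    }

    // Probability of shedding a request at the given CPU utilization (0.0 to 1.0).
    public double shedProbability(double cpuUtilization, boolean critical) {
        double start = critical ? criticalStart : nonCriticalStart;
        if (cpuUtilization <= start) {
            return 0.0; // below the threshold, nothing is shed
        }
        // Ramp linearly from 0% shed at the threshold to 100% shed at full CPU.
        return Math.min(1.0, (cpuUtilization - start) / (1.0 - start));
    }

    public boolean shouldShed(double cpuUtilization, boolean critical) {
        return Math.random() < shedProbability(cpuUtilization, critical);
    }

    public static void main(String[] args) {
        CpuLoadShedder shedder = new CpuLoadShedder(0.60, 0.80);
        System.out.printf("non-critical @ 70%% CPU: shed %.0f%%%n", 100 * shedder.shedProbability(0.70, false)); // 25%
        System.out.printf("critical     @ 70%% CPU: shed %.0f%%%n", 100 * shedder.shedProbability(0.70, true));  // 0%
        System.out.printf("critical     @ 90%% CPU: shed %.0f%%%n", 100 * shedder.shedProbability(0.90, true));  // 50%
    }
}
```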
Experiments with CPU-Based Load Shedding
In experiments, a high volume of requests was sent to a service that autoscales on a 45% CPU target but was configured to begin shedding traffic above 60% CPU utilization. The service shed non-critical traffic first and began shedding critical traffic only above 80% CPU utilization, demonstrating the effectiveness of CPU-based load shedding.
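For intuition, here is a back-of-the-envelope sweep of that experiment's thresholds. The 60% and 80% shedding thresholds come from the description above; the linear shed curve and the step size are assumptions made only for this sketch.

```java
// Rough simulation of served-traffic fractions across a CPU sweep; the linear curve is an assumption.
public final class ShedSweep {

    // Fraction of requests still served at a given CPU utilization and shedding threshold.
    static double servedFraction(double cpu, double shedStart) {
        if (cpu <= shedStart) {
            return 1.0; // nothing is shed below the threshold
        }
        return Math.max(0.0, 1.0 - (cpu - shedStart) / (1.0 - shedStart));
    }

    public static void main(String[] args) {
        System.out.println("CPU    non-critical served    critical served");
        for (int pct = 45; pct <= 100; pct += 5) {
            double cpu = pct / 100.0;
            System.out.printf("%3d%%        %4.0f%%                %4.0f%%%n",
                    pct,
                    100 * servedFraction(cpu, 0.60),  // non-critical shed past 60% CPU
                    100 * servedFraction(cpu, 0.80)); // critical shed past 80% CPU
        }
    }
}
```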