Enhancing Netflix Reliability with Service-Level Prioritized Load Shedding

Table of Contents
- Introduction
- The Problem
- Our Solution
- Real-World Application and Results
- Service-Level Prioritized Load Shedding
- Generic CPU-Based Load Shedding
- Experiments with CPU-Based Load Shedding
Introduction
Netflix implemented service-level prioritized load shedding in its PlayAPI backend service to prioritize user-initiated requests over prefetch requests during peak traffic. By shedding non-critical traffic first, this approach improved both availability and the user experience.
The Problem
PlayAPI faced two recurring problems: spikes in prefetch traffic reduced availability for user-initiated requests, and increased backend latency degraded both request types equally. The concurrency limiter in use at the time throttled requests without distinguishing between the two types, which hurt overall availability.
Our Solution
Netflix implemented a priority-aware concurrency limiter within PlayAPI that favors user-initiated requests over prefetch requests without physically sharding the two types of traffic. By categorizing requests into priority buckets, the limiter sheds non-critical traffic first, so user-initiated requests continue to be served even when the service is under load.
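The exact limiter implementation isn't given in this summary, but the idea can be sketched as a single concurrency budget in which low-priority traffic is capped well below the total limit, so prefetch requests are rejected long before user-initiated ones. The class and method names below (PriorityLimiter, tryAcquire, release) and the prefetch share are illustrative assumptions, not Netflix's actual API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative priority-aware concurrency limiter; names and limits are hypothetical.
public class PriorityLimiter {

    public enum Priority { USER_INITIATED, PREFETCH }

    private final int maxInFlight;      // total concurrency budget for the service
    private final int prefetchCeiling;  // prefetch may only use a fraction of that budget
    private final AtomicInteger inFlight = new AtomicInteger();

    public PriorityLimiter(int maxInFlight, double prefetchShare) {
        this.maxInFlight = maxInFlight;
        this.prefetchCeiling = (int) (maxInFlight * prefetchShare);
    }

    // Returns true if the request may proceed; false means it should be shed.
    public boolean tryAcquire(Priority priority) {
        int limit = (priority == Priority.USER_INITIATED) ? maxInFlight : prefetchCeiling;
        while (true) {
            int current = inFlight.get();
            if (current >= limit) {
                return false; // prefetch hits its ceiling long before user-initiated traffic does
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Call when the request completes, whether it succeeded or failed.
    public void release() {
        inFlight.decrementAndGet();
    }
}
```

Because both request types share one deployment and one counter, no physical sharding is needed; the priority only changes the threshold at which a request is rejected.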
Real-World Application and Results
In production, the prioritized load-shedding mechanism proved effective: during peak traffic, availability for user-initiated requests stayed above 99.4% even as prefetch availability dropped to 20%. This preserved overall system reliability and the user experience.
Service-Level Prioritized Load Shedding
Services can map each request to a priority bucket based on its impact on the user (e.g., DEGRADED for requests whose failure degrades but does not break the experience, and BEST_EFFORT for non-critical requests such as prefetches). Prioritized load shedding then maintains availability by dropping the lowest-priority traffic first, so the most critical requests continue to be served during high-load scenarios.
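As a concrete illustration, a service might classify each incoming request with a small mapping function like the one below. The bucket names (CRITICAL, DEGRADED, BEST_EFFORT, BULK) and the request attributes are assumptions made for this sketch, not a published Netflix API.

```java
// Illustrative request-to-bucket mapping; bucket names and attributes are assumptions.
public final class RequestPriorityMapper {

    public enum PriorityBucket { CRITICAL, DEGRADED, BEST_EFFORT, BULK }

    // A tiny stand-in for the attributes a real service would inspect.
    public record Request(boolean userInitiated, boolean prefetch, boolean backgroundJob) {}

    // Map a request to a bucket based on its impact on the user experience.
    public static PriorityBucket classify(Request request) {
        if (request.backgroundJob()) {
            return PriorityBucket.BULK;        // offline or batch work: shed first
        }
        if (request.prefetch()) {
            return PriorityBucket.BEST_EFFORT; // dropping it has no visible impact
        }
        if (request.userInitiated()) {
            return PriorityBucket.CRITICAL;    // a user is actively waiting on this
        }
        return PriorityBucket.DEGRADED;        // failure degrades, but does not break, the experience
    }

    public static void main(String[] args) {
        System.out.println(classify(new Request(false, true, false)));  // BEST_EFFORT
        System.out.println(classify(new Request(true, false, false)));  // CRITICAL
    }
}
```

Shedding then proceeds from the bottom of this ordering upward: BULK and BEST_EFFORT traffic is dropped first, and CRITICAL traffic only as a last resort.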
Generic CPU-Based Load Shedding
Netflix services already autoscale on CPU utilization, which makes it a natural signal for load shedding as well. By defining a CPU utilization threshold above which shedding begins, a service can progressively shed non-critical traffic as load increases, preserving performance and the user experience for the traffic that remains.
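One way to realize this is sketched below: CPU utilization is turned into a shedding probability that ramps up between a per-priority threshold and full utilization. The thresholds and the linear ramp are illustrative choices for this sketch, not Netflix's published algorithm.

```java
// Illustrative CPU-based shedding curve; thresholds and the linear ramp are assumptions.
public final class CpuLoadShedder {

    private final double nonCriticalStart; // begin shedding non-critical traffic here (e.g., 0.60)
    private final double criticalStart;    // begin shedding critical traffic here (e.g., 0.80)

    public CpuLoadShedder(double nonCriticalStart, double criticalStart) {
        this.nonCriticalStart = nonCriticalStart;
        this.criticalStart = criticalStart;
    }

    // Probability of shedding a request at the given CPU utilization (0.0 to 1.0).
    public double shedProbability(double cpuUtilization, boolean critical) {
        double start = critical ? criticalStart : nonCriticalStart;
        if (cpuUtilization <= start) {
            return 0.0; // below the threshold, nothing is shed
        }
        // Ramp linearly from 0% shed at the threshold to 100% shed at full CPU.
        return Math.min(1.0, (cpuUtilization - start) / (1.0 - start));
    }

    public boolean shouldShed(double cpuUtilization, boolean critical) {
        return Math.random() < shedProbability(cpuUtilization, critical);
    }

    public static void main(String[] args) {
        CpuLoadShedder shedder = new CpuLoadShedder(0.60, 0.80);
        System.out.printf("non-critical @ 70%% CPU: shed %.0f%%%n", 100 * shedder.shedProbability(0.70, false)); // 25%
        System.out.printf("critical     @ 70%% CPU: shed %.0f%%%n", 100 * shedder.shedProbability(0.70, true));  // 0%
        System.out.printf("critical     @ 90%% CPU: shed %.0f%%%n", 100 * shedder.shedProbability(0.90, true));  // 50%
    }
}
```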
Experiments with CPU-Based Load Shedding
In experiments, a high volume of requests was sent to a service that autoscales on a 45% CPU target but was configured to begin shedding traffic above 60% CPU utilization. The service shed non-critical traffic first and began shedding critical traffic only above 80% CPU utilization, demonstrating the effectiveness of CPU-based load shedding.
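For intuition, here is a back-of-the-envelope sweep of that experiment's thresholds. The 60% and 80% shedding thresholds come from the description above; the linear shed curve and the step size are assumptions made only for this sketch.

```java
// Rough simulation of served-traffic fractions across a CPU sweep; the linear curve is an assumption.
public final class ShedSweep {

    // Fraction of requests still served at a given CPU utilization and shedding threshold.
    static double servedFraction(double cpu, double shedStart) {
        if (cpu <= shedStart) {
            return 1.0; // nothing is shed below the threshold
        }
        return Math.max(0.0, 1.0 - (cpu - shedStart) / (1.0 - shedStart));
    }

    public static void main(String[] args) {
        System.out.println("CPU    non-critical served    critical served");
        for (int pct = 45; pct <= 100; pct += 5) {
            double cpu = pct / 100.0;
            System.out.printf("%3d%%        %4.0f%%                %4.0f%%%n",
                    pct,
                    100 * servedFraction(cpu, 0.60),  // non-critical shed past 60% CPU
                    100 * servedFraction(cpu, 0.80)); // critical shed past 80% CPU
        }
    }
}
```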