Netflix Tech Blog

Recommending for Long-Term Member Satisfaction at Netflix

Table of Contents

  1. Introduction
  2. Retention as Reward? Proxy Rewards
  3. Click-through Rate (CTR)
  4. Reward Engineering
  5. Challenge: Delayed Feedback
  6. Solution: Predict Missing Feedback
  7. Bandit Policy Models
  8. Summary and Open Questions

Introduction

At Netflix, our goal is to entertain the world. Our recommender systems aim to surface content that not only engages members in the moment but also enhances their long-term satisfaction, which in turn increases the value members get from the service and contributes to retention.

Retention as Reward? Proxy Rewards

Retention might seem like the natural reward signal, but it is too coarse and too delayed to attribute to any single recommendation: a decision to renew a subscription reflects months of accumulated experience. Instead, we optimize a proxy reward, a function of observable feedback designed so that recommendations scoring well on the proxy also improve long-term member satisfaction. This keeps our recommendation strategy aligned with overall satisfaction while remaining sensitive to individual recommendations.
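As an illustration, a proxy reward might combine several short-term signals into a single scalar. The signals and weights below are hypothetical assumptions, meant only to show the shape of such a function, not Netflix's actual reward:

```python
# Hypothetical sketch of a proxy reward. Signal names and weights are
# illustrative assumptions, not Netflix's actual reward function.
from dataclasses import dataclass

@dataclass
class Feedback:
    clicked: bool            # member played the recommended title
    minutes_watched: float   # depth of engagement
    thumbs_up: bool          # explicit satisfaction signal

def proxy_reward(fb: Feedback) -> float:
    """Combine short-term signals into one scalar intended to
    correlate with long-term satisfaction."""
    if not fb.clicked:
        return 0.0
    # Weight deeper and explicitly positive engagement above the bare click.
    return 0.2 + 0.5 * min(fb.minutes_watched / 60.0, 1.0) + 0.3 * float(fb.thumbs_up)
```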

Click-through Rate (CTR)

Click-through rate (CTR) is the simplest proxy reward: a recommendation earns a reward of 1 if the member interacts with it (e.g., plays the recommended show) and 0 otherwise. CTR is a reasonable baseline indicator of engagement, but on its own it can over-reward clickbait, titles that attract plays without delivering lasting satisfaction.
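In code, the binary click reward and the CTR it induces are straightforward; the field names here are purely illustrative:

```python
# Illustrative only: the binary click reward and the CTR computed over
# logged impressions.
def click_reward(interacted: bool) -> int:
    """1 if the member interacted with the recommendation
    (e.g., played the show), 0 otherwise."""
    return 1 if interacted else 0

def ctr(impressions: list[bool]) -> float:
    """CTR is the mean binary click reward across impressions."""
    return sum(click_reward(i) for i in impressions) / max(len(impressions), 1)

# Example: 3 clicks out of 5 impressions -> CTR = 0.6
print(ctr([True, False, True, True, False]))
```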

Reward Engineering

Reward engineering is the iterative process of designing, evaluating, and refining the proxy reward function so that it aligns more closely with long-term member satisfaction. Each iteration typically augments or reweights the signals in the reward, for example to discount engagement that does not reflect genuine satisfaction.
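One plausible iteration, sketched below with thresholds and signals that are our own illustrative assumptions rather than Netflix's actual function, is to stop fully rewarding clicks that are quickly abandoned:

```python
# Hedged sketch of one reward-engineering iteration: a click only counts
# fully if the member kept watching. Thresholds are assumed for illustration.
def reward_v1(clicked: bool) -> float:
    """Baseline: plain CTR-style binary reward."""
    return float(clicked)

def reward_v2(clicked: bool, minutes_watched: float) -> float:
    """Refinement: discount clicks followed by quick abandonment,
    discouraging clickbait-style recommendations."""
    if not clicked:
        return 0.0
    return 1.0 if minutes_watched >= 5.0 else 0.1
```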

Challenge: Delayed Feedback

One challenge is that much of the feedback we care about is delayed: signals such as finishing a season or rating a title can arrive days or weeks after the recommendation was made. At training time, recent recommendations therefore have missing labels, and we cannot tell whether the feedback will never arrive or simply has not arrived yet. Ignoring this censoring biases the reward signal and, with it, the recommendations.
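The sketch below illustrates the censoring this creates at training time, assuming a hypothetical 14-day maturity window that is not from the post:

```python
# Illustrative sketch of how delayed feedback censors training labels:
# feedback arriving after the training cutoff is simply missing.
from datetime import datetime, timedelta
from typing import Optional

def observed_label(recommended_at: datetime,
                   feedback_at: Optional[datetime],
                   train_cutoff: datetime) -> Optional[int]:
    """Return a label only when it can be trusted at the cutoff."""
    if feedback_at is not None and feedback_at <= train_cutoff:
        return 1   # positive feedback already arrived
    if recommended_at <= train_cutoff - timedelta(days=14):
        return 0   # assumed maturity window elapsed: treat as a non-event
    return None    # censored: too early to tell
```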

Solution: Predict Missing Feedback

To address delayed feedback, we predict the missing feedback from the data and features available at training time. When the delayed feedback has been observed, we use it directly; when it has not, we substitute the model's prediction. Incorporating these predictions into the reward lets us train recommendation models on fresh data without waiting for every label to mature.
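A minimal sketch of this imputation idea, using a logistic-regression stand-in for the real feedback-prediction model and synthetic data:

```python
# Sketch of imputing delayed feedback with a predictive model. The
# scikit-learn classifier and synthetic data are assumptions standing in
# for the real feedback-prediction model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# X: features available at recommendation time; y: eventual feedback for
# older ("mature") examples where it has already arrived.
X_mature = np.random.rand(1000, 8)
y_mature = (X_mature[:, 0] + X_mature[:, 1] > 1.0).astype(int)

feedback_model = LogisticRegression().fit(X_mature, y_mature)

def training_reward(observed: float | None, features: np.ndarray) -> float:
    """Use observed delayed feedback when available; otherwise fall back
    to the model's predicted probability of positive feedback."""
    if observed is not None:
        return observed
    return float(feedback_model.predict_proba(features.reshape(1, -1))[0, 1])
```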

Bandit Policy Models

Bandit policy models then make real-time recommendations by selecting, for each member context, the candidate expected to maximize the engineered proxy reward. Offline reward models are useful for predicting member behavior, but offline metrics do not always agree with online results, which is why we keep refining our reward engineering in tandem with the policies that consume it.
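As a toy example of such a policy, here is an epsilon-greedy rule over a predicted proxy reward; the post does not specify Netflix's actual policy class, so this is purely illustrative:

```python
# Minimal epsilon-greedy contextual bandit sketch. The reward model and
# candidate feature matrix are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

def select(candidates: np.ndarray, reward_model, epsilon: float = 0.1) -> int:
    """Pick a candidate index: explore uniformly with probability epsilon,
    otherwise exploit the item with the highest predicted proxy reward."""
    if rng.random() < epsilon:
        return int(rng.integers(len(candidates)))
    predicted = reward_model.predict_proba(candidates)[:, 1]
    return int(np.argmax(predicted))
```

Exploration here serves the same purpose as in any bandit setup: it keeps collecting feedback on candidates the current reward model would otherwise never show.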

Summary and Open Questions

In summary, aligning Netflix recommendations with long-term member satisfaction comes down to engineering proxy rewards that reflect genuine member preferences, handling the delayed feedback those rewards depend on, and deploying bandit policies that optimize them online. Open questions remain, above all how to balance readily available short-term feedback against the long-term satisfaction we ultimately care about.