Uber’s Journey to Ray on Kubernetes: Ray Setup

Table of Contents

  1. Introduction
  2. Ray and Kubernetes Integration
  3. Architecture Overview
  4. Setting Up Ray on Kubernetes
  5. Customizing Ray Cluster
  6. Scaling Ray on Kubernetes
  7. Monitoring and Debugging Ray Clusters
  8. Conclusion

1. Introduction

In this section, we will provide an overview of Uber's journey to implementing Ray on Kubernetes for orchestrating machine learning pipelines.

2. Ray and Kubernetes Integration

An exploration of how Ray and Kubernetes integrate and complement each other to create a scalable, efficient system for orchestrating machine learning pipelines.

3. Architecture Overview

An in-depth look at the architecture designed by Uber for leveraging Ray on Kubernetes, including the components, workflow, and interactions within the system.

4. Setting Up Ray on Kubernetes

A step-by-step guide to how Uber set up Ray on Kubernetes, covering installation, configuration, and deployment considerations.

5. Customizing Ray Cluster

Details on how Uber customized the Ray cluster on Kubernetes to meet specific requirements and optimize performance for machine learning tasks.

6. Scaling Ray on Kubernetes

Insights into how Uber efficiently scaled the Ray clusters on Kubernetes to handle increasing workloads and demand for machine learning processing.

7. Monitoring and Debugging Ray Clusters

Learn about the tools and strategies Uber employed to monitor and debug Ray clusters running on Kubernetes, ensuring smooth operation and performance optimization.

8. Conclusion

A summary of the key takeaways from Uber's journey to Ray on Kubernetes, highlighting the benefits, challenges, and future prospects of using this architecture for machine learning pipeline orchestration.