Meta Engineering

OCP Summit 2024: The open future of networking hardware for AI

Table of Contents

  1. Introduction
  2. DSF: Scheduled fabric that is disaggregated and open
  3. Switches for next-generation 400G fabrics
  4. Evolving FBOSS and SAI for DSF
  5. FBNIC: A multi-host foundational NIC designed by Meta

Introduction

At the Open Compute Project (OCP) Summit 2024, Meta is sharing details about its next-generation network fabric for AI training clusters. With a focus on openness and collaboration, Meta has expanded its network hardware portfolio and is contributing new disaggregated network fabrics and a NIC to OCP.

DSF: Scheduled fabric that is disaggregated and open

Meta has developed a Disaggregated Scheduled Fabric (DSF) for its AI clusters. DSF promotes open, vendor-agnostic systems assembled from interchangeable building blocks. It extends disaggregation to VoQ-based (virtual output queue) switched systems powered by the OCP-SAI standard and FBOSS, with the goal of improving network performance for high-bandwidth AI clusters.
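To make the VoQ idea concrete, here is a minimal toy sketch of virtual output queueing. It is not DSF's or FBOSS's actual implementation. The class `VoqIngress`, the function `schedule_round`, and the port names are hypothetical, and the credit-grant scheme is an assumption borrowed from how scheduled fabrics are commonly described: packets wait at the ingress in one queue per destination, and they are released only when the destination grants credits.

```python
from collections import deque


class VoqIngress:
    """Toy ingress device with one virtual output queue (VoQ) per egress port.

    Hypothetical model for illustration only; not an FBOSS or SAI API.
    """

    def __init__(self, egress_ports):
        self.voqs = {port: deque() for port in egress_ports}

    def enqueue(self, egress_port, packet):
        # Packets are held at the ingress, separated by destination,
        # so a congested destination does not block traffic to others.
        self.voqs[egress_port].append(packet)

    def send(self, egress_port, credits):
        # Release at most `credits` packets toward the given egress port.
        sent = []
        queue = self.voqs[egress_port]
        while credits > 0 and queue:
            sent.append(queue.popleft())
            credits -= 1
        return sent


def schedule_round(ingress, egress_credits):
    """One scheduling round: each egress port grants credits (assumed scheme)."""
    return {
        port: ingress.send(port, credits)
        for port, credits in egress_credits.items()
    }


ingress = VoqIngress(egress_ports=["gpu0", "gpu1"])
for i in range(3):
    ingress.enqueue("gpu0", f"a{i}")
ingress.enqueue("gpu1", "b0")

# gpu0 is busy and grants a single credit; gpu1 grants two.
round1 = schedule_round(ingress, {"gpu0": 1, "gpu1": 2})
print(round1)
```

In this round the packet for `gpu1` is delivered even though `gpu0` has a backlog, which is the head-of-line-blocking behavior that per-destination queues are meant to avoid.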

Arista 7700R4C-38PE: DSF Leaf Switch

  • Broadcom Jericho3-AI based
  • 18 x 800GE OSFP800 host ports
  • 20 x 800Gbps fabric ports
  • 14.4Tbps wirespeed performance

Arista 7720R4-128PE: DSF Spine Switch

  • Broadcom Ramon3 based
  • 128 x 800Gbps fabric ports
  • 102.4Tbps wirespeed performance
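The wirespeed figures in the two lists above follow directly from the port counts. A short arithmetic check, using only the numbers quoted in the lists (the helper `aggregate_tbps` and the variable names are illustrative, and the ratio at the end is a derived figure rather than one stated in the source):

```python
def aggregate_tbps(port_count, port_gbps):
    """Aggregate bandwidth in Tbps for a group of identical ports."""
    return port_count * port_gbps / 1000


# Leaf switch: 18 x 800GE host ports, 20 x 800Gbps fabric ports.
leaf_host_tbps = aggregate_tbps(18, 800)
leaf_fabric_tbps = aggregate_tbps(20, 800)

# Spine switch: 128 x 800Gbps fabric ports.
spine_fabric_tbps = aggregate_tbps(128, 800)

# Fabric-side capacity relative to host-side capacity on the leaf.
leaf_fabric_to_host_ratio = leaf_fabric_tbps / leaf_host_tbps

print(f"leaf host side:   {leaf_host_tbps} Tbps")
print(f"leaf fabric side: {leaf_fabric_tbps} Tbps")
print(f"spine fabric:     {spine_fabric_tbps} Tbps")
print(f"leaf fabric/host: {leaf_fabric_to_host_ratio:.2f}")
```

The leaf's 14.4 Tbps matches its 18 host ports, and the spine's 102.4 Tbps matches its 128 fabric ports. The leaf's 20 fabric ports add up to 16 Tbps, so on these numbers the fabric side has roughly 11% more capacity than the host side.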
  • Accelerated compute optimized pipeline

Switches for next-generation 400G fabrics

Meta will deploy two next-generation 400G fabric switches, Minipack3 and Cisco 8501. Both are backward compatible with earlier-generation switches, support upgrades to 400G and 800G, and consume less power per bit.

Evolving FBOSS and SAI for DSF

Meta continues to use OCP-SAI to integrate new network fabrics, switch hardware, and optical transceivers into FBOSS, its network operating system, fostering collaboration and innovation within the industry.

FBNIC: A multi-host foundational NIC designed by Meta

Meta has designed the ASIC for FBNIC, a foundational NIC that supports up to four hosts with complete datapath isolation for each host. The FBNIC driver has been upstreamed to the Linux kernel as of v6.11, in line with Meta's vision of scalable, open, and collaborative AI hardware systems.
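As a rough way to reason about the v6.11 cutoff, the sketch below checks whether a kernel release string is at least 6.11. The helper `kernel_at_least` is hypothetical, and the sample release strings are made up for illustration. A new enough version only means the driver source is in that kernel tree; whether the driver is actually built and available depends on the kernel's build configuration.

```python
import re


def kernel_at_least(release, major, minor):
    """Return True if a release string such as '6.11.0-rc1' is >= major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    found = (int(match.group(1)), int(match.group(2)))
    # Tuple comparison orders by major version first, then minor.
    return found >= (major, minor)


# Illustrative release strings, not taken from any real system.
for release in ["6.8.0-45-generic", "6.11.0", "6.12.3"]:
    new_enough = kernel_at_least(release, 6, 11)
    print(f"{release}: kernel tree includes v6.11 changes: {new_enough}")
```

On a Linux machine, the release string could be obtained with `platform.release()` from the Python standard library and passed to the same helper.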