NVIDIA Collaborates with Hugging Face to Simplify Generative AI Model Deployments

NVIDIA is collaborating with Hugging Face to simplify generative AI model deployments. By optimizing foundation models with NVIDIA NIM, enterprises can generate tokens faster, reduce costs, and improve the end-user experience.

NVIDIA NIM Features

NVIDIA NIM leverages the TensorRT-LLM inference optimization engine, industry-standard APIs, and prebuilt containers to provide low-latency, high-throughput AI inference that scales with demand. It delivers superior throughput, enabling enterprises to generate tokens up to 5x faster.
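Because NIM exposes industry-standard, OpenAI-compatible APIs, existing client code can be pointed at a NIM deployment with little more than a base-URL change. The sketch below uses the OpenAI Python SDK against a hypothetical NIM instance on localhost serving Llama 3 8B; the base URL, API key, and model name are placeholders for your own deployment.

```python
from openai import OpenAI

# Point the standard OpenAI client at a NIM deployment.
# base_url and model are placeholders for a hypothetical local
# NIM instance; substitute the values for your deployment.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed-for-local-nim",  # hosted endpoints require a real key
)

completion = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[{"role": "user", "content": "Summarize NVIDIA NIM in one sentence."}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```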

Token Processing for Higher Revenue

Token throughput is a key performance metric for generative AI applications: for services billed per token, higher throughput translates directly into higher revenue. NIM helps achieve near-100% utilization when serving multiple concurrent requests, enabling up to 3x faster text generation.
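To make the metric concrete, the sketch below times a streamed response and reports approximate tokens per second. It reuses the hypothetical endpoint from the previous example and approximates the token count by the number of streamed chunks, which is close enough for a rough throughput check.

```python
import time
from openai import OpenAI

# Hypothetical NIM deployment; substitute your own endpoint URL.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-for-local-nim")

start = time.perf_counter()
n_chunks = 0

# Stream the response so output can be counted as it arrives.
stream = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[{"role": "user", "content": "Write a short paragraph about GPUs."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        n_chunks += 1  # each streamed chunk carries roughly one token

elapsed = time.perf_counter() - start
print(f"~{n_chunks / elapsed:.1f} tokens/sec")
```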

Seamless Deployment with Hugging Face

NIM can be deployed on Hugging Face with just a few clicks, starting with the Llama 3 8B and Llama 3 70B models. The dedicated NIM endpoint on Hugging Face automates deployment on your preferred cloud service provider, making it quick and easy to start running inference.

How to Deploy

  1. Navigate to the Llama 3 8B or 70B model page on Hugging Face.
  2. Click the 'Deploy' drop-down and select 'NVIDIA NIM Endpoints'.
  3. Choose your preferred CSP instance type and start the deployment; once the endpoint is live, you can query it as sketched below.
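When the endpoint reports as running, Hugging Face displays its URL. Since the NIM behind it speaks the same OpenAI-compatible API, querying it is a small variation on the earlier sketch; the endpoint URL below is a placeholder to copy from your endpoint's page, and the access token is assumed to be in an HF_TOKEN environment variable.

```python
import os
from openai import OpenAI

# Placeholder URL; copy the real one from your endpoint's page on Hugging Face.
ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud/v1"

client = OpenAI(
    base_url=ENDPOINT_URL,
    api_key=os.environ["HF_TOKEN"],  # a Hugging Face access token with endpoint access
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[{"role": "user", "content": "Hello from a NIM endpoint on Hugging Face!"}],
)
print(response.choices[0].message.content)
```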

Get Started Today

Deploy Llama 3 8B and 70B NIMs from Hugging Face to accelerate generative AI solutions, increase revenue, and reduce inference costs. Explore over 40 multimodal NIMs at ai.nvidia.com and prototype applications with free NVIDIA cloud credits.