A Simple Guide to Deploying Generative AI with NVIDIA NIM

NVIDIA NIM lets you deploy generative AI models in under 5 minutes using a single optimized container on NVIDIA-accelerated GPU systems. You can also prototype applications against NIM APIs hosted in the NVIDIA API catalog before deploying anything yourself, and NIM supports models fine-tuned with techniques such as LoRA.
Requirements and Deployment Process
Before getting started, make sure your system meets the prerequisites outlined in the NIM documentation. An NVIDIA AI Enterprise license is required for self-hosted deployments, and sample NVIDIA-hosted deployments are available on the NVIDIA API catalog. A full deployment guide is available in the NIM documentation; the sketch below shows the overall shape of a launch.
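The following is a minimal, illustrative sketch of a self-hosted launch, assuming Docker with the NVIDIA Container Toolkit and an NGC API key. The container image, tag, port, and cache path here are assumptions for illustration; take the exact values for your model from the NIM documentation and the NGC catalog.

```bash
# Authenticate to the NVIDIA container registry with your NGC API key.
export NGC_API_KEY=<your-ngc-api-key>
docker login nvcr.io --username '$oauthtoken' --password "$NGC_API_KEY"

# Cache downloaded model weights between runs.
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

# Launch a NIM container (image name and tag are illustrative).
docker run -it --rm --gpus all \
  -e NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama3-8b-instruct:latest
```

Once the container reports it is ready, it exposes an OpenAI-compatible API on the mapped port.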
Integrating NIM with Your Applications
Once NIM is deployed, or while testing against the NVIDIA-hosted API endpoints in the API catalog, you can integrate it with your applications. Because NIM follows the OpenAI API specification, a simple way to start is with a completions curl request, then move to Python using the OpenAI library, as shown below.
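Here is a hedged example of a chat completions request against a locally deployed NIM. The model name (meta/llama3-8b-instruct) and the localhost:8000 endpoint are assumptions; substitute the model and base URL for your own deployment, or for the NVIDIA-hosted endpoint from the API catalog.

```bash
# OpenAI-spec chat completions request against a local NIM (endpoint assumed).
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta/llama3-8b-instruct",
        "messages": [{"role": "user", "content": "Write a limerick about GPUs."}],
        "max_tokens": 64
      }'
```

The same endpoint works from Python with the OpenAI library, since NIM follows the OpenAI spec:

```python
from openai import OpenAI

# Point the OpenAI client at the NIM endpoint. For NVIDIA-hosted models,
# use the API catalog base URL and your API key instead of a local address.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

completion = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # illustrative model name
    messages=[{"role": "user", "content": "Write a limerick about GPUs."}],
    max_tokens=64,
)
print(completion.choices[0].message.content)
```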
Using NIM with Application Frameworks
NIM is integrated with popular generative AI application frameworks such as Haystack, LangChain, and LlamaIndex. Developers can call NIM microservice APIs from within these frameworks, and each framework publishes notebooks showing how to use NIM effectively; a short sketch follows.
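For example, LangChain provides the langchain-nvidia-ai-endpoints connector. A minimal sketch, assuming a locally deployed Llama 3 8B Instruct NIM; the base_url and model name are illustrative, and the same class also works against NVIDIA-hosted API catalog endpoints:

```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA

# ChatNVIDIA can target NVIDIA-hosted API catalog endpoints or a
# self-hosted NIM microservice; here it points at a local deployment.
llm = ChatNVIDIA(
    base_url="http://localhost:8000/v1",  # assumed local NIM endpoint
    model="meta/llama3-8b-instruct",      # illustrative model name
)

response = llm.invoke("Summarize what NVIDIA NIM provides in one sentence.")
print(response.content)
```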
Enhancing NIM Usage
With NVIDIA NIM's fast, reliable, and simple model deployment, you can focus on building performant and innovative generative AI workflows. To go further, learn how to use NIM microservices with LoRA adapters for customized LLMs, and watch the NVIDIA API catalog for the latest available microservices.