Run Hugging Face Models Instantly with Day-0 Support from NVIDIA NeMo Framework


Table of Contents

  1. Introduction
  2. Introducing AutoModel in NVIDIA NeMo Framework
  3. How to use AutoModel
  4. Adding a new AutoModel class in NeMo
  5. Conclusion

Introduction

In the quest to maximize generative AI investments, access to the latest model developments is crucial. The NVIDIA NeMo Framework leverages the Megatron-Core and Transformer Engine backends to achieve high throughput and Model FLOPs Utilization (MFU) on NVIDIA GPUs. To provide Day-0 support for the latest models, the NeMo Framework introduces the Automatic Model (AutoModel) feature.

Introducing AutoModel in NVIDIA NeMo Framework

AutoModel is a high-level interface that simplifies support for pretrained models within the NeMo Framework. It enables seamless fine-tuning of any Hugging Face model for quick experimentation, with support for model parallelism, PyTorch JIT compilation, and an optimized Megatron-Core path for select models.

How to use AutoModel

To use AutoModel in the NeMo Framework for LoRA and Supervised Fine-Tuning (SFT), follow these steps:

  1. Instantiate a Hugging Face model through the AutoModel interface.
  2. Specify the LoRA target modules (for LoRA fine-tuning).
  3. For optimized performance in training and post-training, switch to the Megatron-Core path by swapping the Model class, Optimizer module, and Trainer strategy in place of their AutoModel counterparts; the rest of the recipe stays the same. A sketch of steps 1 and 2 follows this list.
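The snippet below is a minimal sketch of steps 1 and 2 using the NeMo 2.0 Python API. The class names HFAutoModelForCausalLM, HFDatasetDataModule, and llm.peft.LoRA follow the NeMo collections, but the exact module paths, checkpoint, dataset, target-module pattern, and trainer settings shown here are illustrative assumptions; check the NeMo documentation for the recipe that matches your release.

```python
from nemo import lightning as nl
from nemo.collections import llm

# Step 1: instantiate a Hugging Face model through the AutoModel interface.
# The checkpoint name is an example; any causal-LM checkpoint on the Hub should work.
model = llm.HFAutoModelForCausalLM(model_name="meta-llama/Llama-3.2-1B")

# Step 2: specify the LoRA target modules.
# The wildcard pattern is illustrative; match it to your model's module names.
lora = llm.peft.LoRA(target_modules=["*_proj"], dim=16)

# Fine-tune with LoRA; drop the peft argument for full-parameter SFT.
llm.api.finetune(
    model=model,
    data=llm.HFDatasetDataModule("rajpurkar/squad", split="train"),
    trainer=nl.Trainer(accelerator="gpu", devices=1, max_steps=100),
    peft=lora,
)
```

Switching to the Megatron-Core path (step 3) keeps this structure; only the model class, optimizer module, and trainer strategy change.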

Adding a new AutoModel class in NeMo

NeMo AutoModel currently provides a text-generation class, HFAutoModelForCausalLM. To add support for other tasks, create a similar subclass, adapting the initializer, model configuration, training/validation steps, and save/load methods to your specific use case.
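As an illustration, here is a hypothetical subclass for sequence classification. The class name, base class, and hook names below are assumptions modeled on the description above and on standard PyTorch Lightning conventions, not the verbatim NeMo implementation; compare against HFAutoModelForCausalLM in the NeMo source when writing a real subclass.

```python
import lightning.pytorch as pl
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer


class HFAutoModelForSequenceClassification(pl.LightningModule):
    """Hypothetical AutoModel-style class for sequence classification.

    Mirrors the structure described above: an initializer that records the
    checkpoint name, a model-configuration hook that builds the Hugging Face
    model lazily, task-specific training/validation steps, and a save helper
    that keeps the checkpoint in native Hugging Face format.
    """

    def __init__(self, model_name: str, num_labels: int = 2):
        super().__init__()
        self.model_name = model_name
        self.num_labels = num_labels
        self.model = None
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)

    def configure_model(self):
        # Build the HF model lazily so parallelism strategies can shard it.
        if self.model is None:
            self.model = AutoModelForSequenceClassification.from_pretrained(
                self.model_name, num_labels=self.num_labels
            )

    def training_step(self, batch, batch_idx):
        # Classification loss instead of the causal-LM loss used by the
        # text-generation class.
        outputs = self.model(**batch)
        self.log("train_loss", outputs.loss)
        return outputs.loss

    def validation_step(self, batch, batch_idx):
        outputs = self.model(**batch)
        self.log("val_loss", outputs.loss, prog_bar=True)

    def configure_optimizers(self):
        # In NeMo the optimizer module is usually supplied externally;
        # a plain AdamW keeps this sketch self-contained.
        return torch.optim.AdamW(self.parameters(), lr=2e-5)

    def save_pretrained(self, path: str):
        # Persist in Hugging Face format so no checkpoint conversion is needed.
        self.model.save_pretrained(path)
        self.tokenizer.save_pretrained(path)
```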

Conclusion

The AutoModel feature in the NeMo Framework enables rapid experimentation with Hugging Face models, with performant implementations and no need for checkpoint conversions. It integrates seamlessly with the high-performance Megatron-Core path, letting users switch to optimized training with minimal code changes. AutoModel was introduced in the NeMo Framework 25.02 release.