Introducing the Nemotron-H Reasoning Model Family: Throughput Gains Without Compromise

  • Introduction: The Nemotron-H Reasoning Model Family introduces hybrid architectures that can be post-trained effectively, offering significant advantages in throughput and context length compared to pure Transformer models.

  • Mastering math, code, and science reasoning: The first phase of fine-tuning focused on the math, science, and coding domains, using a 5:1 ratio of reasoning to non-reasoning samples in the training data. This compact dataset improved generalization across tasks and gave better control over switching between reasoning and non-reasoning modes.
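The 5:1 mixing step can be sketched as a simple subsampling of the non-reasoning pool. This is a minimal illustration, not the actual data pipeline; the function name `build_mixture` and the toy record format are assumptions for the example.

```python
import random

def build_mixture(reasoning, non_reasoning, ratio=5, seed=0):
    """Subsample the non-reasoning pool so the final mix holds
    roughly `ratio` reasoning samples per non-reasoning sample.
    (Hypothetical helper; the real pipeline is not public.)"""
    rng = random.Random(seed)
    n_non = max(1, len(reasoning) // ratio)
    mix = reasoning + rng.sample(non_reasoning, min(n_non, len(non_reasoning)))
    rng.shuffle(mix)  # interleave the two sample types
    return mix

# Toy pools; real entries would be chat-formatted SFT examples.
reasoning_pool = [{"mode": "reasoning", "id": i} for i in range(100)]
direct_pool = [{"mode": "direct", "id": i} for i in range(500)]
mix = build_mixture(reasoning_pool, direct_pool, ratio=5)
```

With 100 reasoning samples and a 5:1 target, the mix keeps only 20 non-reasoning samples, which matches the "compact dataset" framing above.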

  • Training for long contexts: The model was trained on synthetic sequences up to 256K tokens to support 128K-token contexts, achieving an 84% RULER score in non-reasoning mode compared to 46% for Llama-Nemotron, indicating substantial gains in handling long contexts.
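One common way to produce synthetic long sequences is to pack shorter tokenized documents into fixed-length training examples. The sketch below shows that idea under stated assumptions; `pack_long_sequences` and its greedy packing strategy are illustrative, not the documented Nemotron-H procedure.

```python
def pack_long_sequences(docs, target_len=256_000):
    """Greedily concatenate tokenized documents into synthetic
    sequences of up to `target_len` tokens (hypothetical sketch;
    overflow past the target is simply truncated here)."""
    sequences, current, length = [], [], 0
    for doc in docs:
        current.extend(doc)
        length += len(doc)
        if length >= target_len:
            sequences.append(current[:target_len])
            current, length = [], 0
    if current:  # keep the final, shorter remainder
        sequences.append(current)
    return sequences
```

Training on packed sequences up to 256K tokens leaves headroom above the 128K context the model is meant to serve.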

  • Task-specific skill training: Specific skills were targeted by creating task-specific datasets with automatic verifiers, followed by broader fine-tuning using a general reward model. The final stage of GRPO training led to noticeable gains in output quality, especially in tool use and instruction adherence.
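The verifier-plus-GRPO loop can be illustrated with two small pieces: an automatic check that scores a completion against a reference answer, and the group-relative advantage normalization that gives GRPO its name. Both functions (`verify_answer`, `grpo_advantages`) are hypothetical sketches, not the production reward stack.

```python
import statistics

def verify_answer(completion, reference):
    """Hypothetical automatic verifier: reward 1.0 if the final
    token of the completion matches the reference answer."""
    stripped = completion.strip()
    answer = stripped.split()[-1] if stripped else ""
    return 1.0 if answer == reference else 0.0

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled
    completion's reward by the group's mean and std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid divide-by-zero
    return [(r - mean) / std for r in rewards]

# Score a group of sampled completions for one prompt.
group = ["The answer is 42", "I think 41", "maybe 40", "so, 42"]
rewards = [verify_answer(c, "42") for c in group]
advantages = grpo_advantages(rewards)
```

Completions the verifier accepts get positive advantage and are reinforced; the rest are pushed down relative to their group, with no learned reward model needed for these verifiable tasks.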

  • Results: The Nemotron-H-47B-Reasoning-128K model delivers accuracy comparable to or better than Llama-Nemotron Super 49B V1.0 and outperforms Qwen3 32B across math, coding, science, tool use, and dialogue benchmarks.