LLM Benchmarking: Fundamental Concepts

Introduction
- NVIDIA provides GenAI-Perf, an open-source benchmarking tool for evaluating LLM performance.
- Load testing and performance benchmarking are distinct but complementary methods for assessing an LLM deployment: load testing stresses the system with many concurrent requests, while performance benchmarking measures latency and throughput under controlled conditions.
Stages of an LLM Application
- Prompt: the user composes a query and the client sends the request.
- Queuing: the request waits in a queue until the inference engine is ready to process it.
- Prefill: the model processes the prompt tokens and produces the first output token.
- Generation: the model emits output tokens one at a time until a stop condition is met (modeled in the sketch after this list).
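
As a rough illustration, the four stages can be modeled as per-request durations, from which time to first token and end-to-end latency fall out. This is a minimal sketch with made-up numbers; the class and field names are hypothetical, and real stage boundaries come from the serving stack's instrumentation.

```python
# Minimal sketch: an LLM request's lifecycle as stage durations.
# All names and numbers are illustrative.
from dataclasses import dataclass

@dataclass
class RequestStages:
    queue_s: float     # waiting until the inference engine picks up the request
    prefill_s: float   # processing the prompt tokens, up to the first output token
    generate_s: float  # emitting the remaining output tokens one by one

    @property
    def ttft(self) -> float:
        # Server-side time to first token: queuing plus prefill
        # (a client-side measurement would add network latency).
        return self.queue_s + self.prefill_s

    @property
    def total_latency(self) -> float:
        return self.queue_s + self.prefill_s + self.generate_s

stages = RequestStages(queue_s=0.05, prefill_s=0.35, generate_s=1.60)
print(f"TTFT = {stages.ttft:.2f}s, end-to-end = {stages.total_latency:.2f}s")
```
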
AI Token and Context Length
- The token is a concept specific to LLMs: it is the basic unit of text the model processes, and most inference performance metrics are expressed in tokens.
- Context length is the number of tokens the model uses at each generation step, covering both the input prompt and the output generated so far (see the tokenization sketch below).
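
For a concrete sense of what a token is, the sketch below counts tokens with the open-source tiktoken tokenizer. This assumes tiktoken is installed; model families ship different tokenizers, so counts vary across models.

```python
# Minimal sketch: counting tokens with a tokenizer.
# Assumes the `tiktoken` package (pip install tiktoken); other models
# use other tokenizers, so token counts differ.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

prompt = "Explain the difference between prefill and generation."
prompt_tokens = enc.encode(prompt)
print(f"prompt is {len(prompt_tokens)} tokens")

# Context length at a given generation step = prompt tokens plus all
# output tokens generated so far; it must stay within the model limit.
```
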
LLM Inference Metrics
- Common metrics include time to first token (TTFT) and intertoken latency (ITL).
- Measured from the client, TTFT includes queuing time, prefill time, and network latency.
- Reported values vary across benchmarking tools and methodologies, so numbers should only be compared when the underlying definitions match (see the sketch after this list).
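
The sketch below derives TTFT and ITL from client-side timestamps of a streamed response; the timestamp values are made up for illustration.

```python
# Minimal sketch: TTFT and intertoken latency (ITL) from client-side
# timestamps of a streamed response. Values are illustrative.

request_sent = 0.000                               # request sent (s)
token_times = [0.410, 0.455, 0.498, 0.540, 0.585]  # each output token's arrival (s)

# TTFT: request send to first output token; measured at the client this
# spans queuing, prefill, and network latency.
ttft = token_times[0] - request_sent

# ITL: mean gap between consecutive output tokens during generation.
gaps = [b - a for a, b in zip(token_times, token_times[1:])]
itl = sum(gaps) / len(gaps)

print(f"TTFT = {ttft * 1000:.0f} ms, ITL = {itl * 1000:.1f} ms")
```
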
Events and Definitions
- Key events must be defined precisely: when each request starts and ends (its end-to-end latency) and when the benchmark as a whole starts and ends.
- GenAI-Perf and LLMPerf calculate TPS differently, so the same deployment can report different numbers depending on the tool (see the event-record sketch below).
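
One way to make those event definitions concrete is to record, per request, when it was sent, when its first token arrived, and when it completed; the benchmark's start and end then fall out of the per-request records. The sketch below is illustrative and does not reproduce GenAI-Perf's or LLMPerf's actual schemas.

```python
# Minimal sketch: per-request timeline events a benchmark records.
# Field names and values are illustrative, not any tool's schema.
from dataclasses import dataclass

@dataclass
class RequestEvents:
    sent_at: float         # client sends the request
    first_token_at: float  # first output token arrives
    done_at: float         # last output token arrives

    @property
    def latency(self) -> float:
        # End-to-end request latency: request sent to last token.
        return self.done_at - self.sent_at

reqs = [RequestEvents(0.0, 0.4, 2.0), RequestEvents(0.5, 1.1, 2.8)]

# Benchmark start/end: first request sent, last response completed.
duration = max(r.done_at for r in reqs) - min(r.sent_at for r in reqs)
print(f"latencies = {[r.latency for r in reqs]}, benchmark duration = {duration:.1f}s")
```
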
TPS and Requests per Second
- Total (system) TPS is the total number of output tokens produced during the benchmark divided by the benchmark's end-to-end duration.
- TPS per user is a single request's output tokens divided by that request's end-to-end latency, i.e., the throughput an individual user experiences.
- RPS is the average number of successfully completed requests per second over the benchmark (computed in the sketch after this list).
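
Putting the definitions together, the sketch below computes total TPS, TPS per user, and RPS from completed request records. The benchmark window used here (first request sent to last response received) is one common convention, not necessarily what every tool uses; all numbers are made up.

```python
# Minimal sketch: total TPS, per-user TPS, and RPS from completed
# request records. The benchmark window (first request sent to last
# response received) is an assumed convention; tools bound it differently.
records = [
    {"sent_at": 0.0, "done_at": 2.0, "output_tokens": 64},
    {"sent_at": 0.5, "done_at": 2.8, "output_tokens": 80},
]

duration = max(r["done_at"] for r in records) - min(r["sent_at"] for r in records)

# Total (system) TPS: all output tokens over the whole benchmark window.
total_tps = sum(r["output_tokens"] for r in records) / duration

# TPS per user: each request's output tokens over its own latency.
tps_per_user = [r["output_tokens"] / (r["done_at"] - r["sent_at"]) for r in records]

# RPS: successfully completed requests per second (all completed here).
rps = len(records) / duration

print(f"total TPS = {total_tps:.1f}")
print(f"mean TPS per user = {sum(tps_per_user) / len(tps_per_user):.1f}")
print(f"RPS = {rps:.2f}")
```
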