NVIDIA Technical Blog

Dynamic Loading in the CUDA Runtime

1. Introduction
   - New CUDA runtime APIs enable dynamic loading of GPU device code, providing a context-independent way to select and load that code at run time.
   
2. Benefits of dynamic GPU device code loading
   - Applications gain explicit control over which GPU device code is loaded and when. This is especially useful when the device code is built or modified separately from the compilation unit that loads it.
   
3. Static loading in the CUDA runtime
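   - With static loading, the CUDA runtime sets up and maintains the state of GPU device code modules during initialization, based on the device code that the compilation tools (such as nvcc) compiled and linked into the application. Which code is available is therefore fixed at build time.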
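
For contrast with the dynamic approaches, here is a minimal sketch of static loading. The kernel is compiled and linked into the executable by nvcc, and the program never asks for it to be loaded, because the runtime already registered it during initialization. The kernel `scale` and the helper `scaleOnesStatic` are hypothetical names used only for illustration.

```cuda
// Build (assumption): nvcc -o static_demo static_demo.cu
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Compiled into this executable; the CUDA runtime registers it implicitly,
// so there is no explicit load call anywhere in user code.
__global__ void scale(float* x, float a, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= a;
}

// Hypothetical helper: multiplies n ones by a on the GPU and checks the result.
bool scaleOnesStatic(int n, float a) {
  std::vector<float> h(n, 1.0f);
  const size_t bytes = n * sizeof(float);
  float* d = nullptr;
  if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
  cudaError_t err = cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);
  if (err == cudaSuccess) {
    // Launch by symbol: the runtime resolves `scale` from its registered modules.
    scale<<<(n + 255) / 256, 256>>>(d, a, n);
    err = cudaGetLastError();
  }
  if (err == cudaSuccess) {
    // Blocking copy on the default stream also waits for the kernel to finish.
    err = cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
  }
  cudaFree(d);
  if (err != cudaSuccess) return false;
  for (float v : h)
    if (v != a) return false;
  return true;
}

int main() {
  std::printf("static launch: %s\n", scaleOnesStatic(1 << 20, 2.0f) ? "ok" : "failed");
  return 0;
}
```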

4. Dynamic loading in the CUDA driver
   - The CUDA driver API can already load GPU device code dynamically, but it requires the application to make more API calls and to manage more state itself, such as CUDA contexts.
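
The extra state the driver API asks for is visible in the following sketch. It assumes a hypothetical device-code file `scale.fatbin` containing a kernel declared `extern "C" __global__ void scale(float* x, float a, int n)` (so its name is not mangled), built for example with `nvcc -fatbin -o scale.fatbin scale_kernel.cu`. Before anything can be loaded, the program initializes the driver, picks a device, and retains and binds a context (here the device's primary context); it also owns the teardown. `scaleOnesDriver` is a hypothetical helper name.

```cuda
// Build (assumption): nvcc -o driver_demo driver_demo.cu -lcuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda.h>

#define CU_CHECK(call)                                                   \
  do {                                                                   \
    CUresult r_ = (call);                                                \
    if (r_ != CUDA_SUCCESS) {                                            \
      const char* msg_ = nullptr;                                        \
      cuGetErrorString(r_, &msg_);                                       \
      std::fprintf(stderr, "%s failed: %s\n", #call, msg_ ? msg_ : "?"); \
      std::exit(EXIT_FAILURE);                                           \
    }                                                                    \
  } while (0)

// Hypothetical helper: loads `scale` from `path` and applies it to n ones.
bool scaleOnesDriver(const char* path, int n, float a) {
  // Context management that the application must do itself.
  CUdevice dev;
  CUcontext ctx;
  CU_CHECK(cuInit(0));
  CU_CHECK(cuDeviceGet(&dev, 0));
  CU_CHECK(cuDevicePrimaryCtxRetain(&ctx, dev));
  CU_CHECK(cuCtxSetCurrent(ctx));

  // The module is loaded into the current context only.
  CUmodule mod;
  CUfunction fn;
  CU_CHECK(cuModuleLoad(&mod, path));
  CU_CHECK(cuModuleGetFunction(&fn, mod, "scale"));

  std::vector<float> h(n, 1.0f);
  const size_t bytes = n * sizeof(float);
  CUdeviceptr d;
  CU_CHECK(cuMemAlloc(&d, bytes));
  CU_CHECK(cuMemcpyHtoD(d, h.data(), bytes));

  // Kernel arguments are passed as an array of pointers to each argument.
  void* args[] = {&d, &a, &n};
  unsigned int blocks = (n + 255) / 256;
  CU_CHECK(cuLaunchKernel(fn, blocks, 1, 1, 256, 1, 1, 0, nullptr, args, nullptr));
  CU_CHECK(cuMemcpyDtoH(h.data(), d, bytes));  // synchronous: waits for the kernel

  // Teardown is also the application's job.
  CU_CHECK(cuMemFree(d));
  CU_CHECK(cuModuleUnload(mod));
  CU_CHECK(cuCtxSetCurrent(nullptr));
  CU_CHECK(cuDevicePrimaryCtxRelease(dev));

  for (float v : h)
    if (v != a) return false;
  return true;
}

int main() {
  std::printf("driver launch: %s\n",
              scaleOnesDriver("scale.fatbin", 1 << 20, 2.0f) ? "ok" : "failed");
  return 0;
}
```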
   
5. Dynamic loading in the CUDA runtime
   - Recent CUDA releases add dynamic loading to the runtime API itself, so applications that use only the runtime get the same flexibility to load GPU device code on demand.
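
A sketch of runtime-side loading, assuming a toolkit recent enough to expose `cudaLibraryLoadData` and `cudaLibraryGetKernel` in the runtime API (CUDA 12.5 or later, to our understanding). The per-architecture naming scheme `scale_sm<major><minor>.cubin` and the helper `loadScaleForDevice` are hypothetical; they show device code being chosen at run time. The image is read into memory first to show that the code can come from any source, not only a file path, and no context is created or bound anywhere. The kernel is assumed to be declared `extern "C"` and named `scale`.

```cuda
// Build (assumption): nvcc -o runtime_select runtime_select.cu
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>
#include <cuda_runtime.h>

// Reads a cubin/fatbin image into memory (PTX text would also need a trailing '\0').
static std::vector<char> readImage(const std::string& path) {
  std::ifstream f(path, std::ios::binary);
  return std::vector<char>(std::istreambuf_iterator<char>(f),
                           std::istreambuf_iterator<char>());
}

// Hypothetical helper: picks an image for device 0's architecture, loads it with
// the runtime API, and looks up `scale`. The caller keeps `image` alive while the
// library is loaded (a conservative choice; we do not rely on the runtime copying it).
bool loadScaleForDevice(std::vector<char>& image, cudaLibrary_t* lib, cudaKernel_t* kernel) {
  int major = 0, minor = 0;
  if (cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, 0) != cudaSuccess ||
      cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, 0) != cudaSuccess)
    return false;
  std::string path = "scale_sm" + std::to_string(major) + std::to_string(minor) + ".cubin";
  image = readImage(path);
  if (image.empty()) {
    std::fprintf(stderr, "cannot read %s\n", path.c_str());
    return false;
  }
  // No JIT options and no library options: null arrays with zero counts.
  if (cudaLibraryLoadData(lib, image.data(), nullptr, nullptr, 0,
                          nullptr, nullptr, 0) != cudaSuccess)
    return false;
  // Lookup by (unmangled) name.
  if (cudaLibraryGetKernel(kernel, *lib, "scale") != cudaSuccess) {
    cudaLibraryUnload(*lib);
    return false;
  }
  return true;
}

int main() {
  std::vector<char> image;
  cudaLibrary_t lib;
  cudaKernel_t kernel;
  if (!loadScaleForDevice(image, &lib, &kernel)) {
    std::printf("load failed\n");
    return 1;
  }
  std::printf("loaded kernel handle %p\n", (void*)kernel);
  cudaLibraryUnload(lib);
  return 0;
}
```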
   
6. Benefits of runtime dynamic loading
   - Applications can stay purely on the CUDA runtime API; the new runtime types are interchangeable with their CUDA driver counterparts; and handles can be shared between separate runtime instances.
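
To illustrate the type interchangeability, the sketch below loads a kernel through the runtime and then queries it with the driver API function `cuKernelGetAttribute`. It assumes that `cudaKernel_t` and the driver's `CUkernel` refer to the same kind of handle; the explicit cast only documents the hand-off. The file `scale.fatbin` (containing an `extern "C"` kernel named `scale`) and the helper `maxThreadsViaDriver` are hypothetical.

```cuda
// Build (assumption): nvcc -o interop_demo interop_demo.cu -lcuda
#include <cstdio>
#include <cuda.h>
#include <cuda_runtime.h>

// Hypothetical helper: returns the max threads per block for `scale`, or -1.
int maxThreadsViaDriver(const char* path) {
  cudaLibrary_t lib;
  cudaKernel_t kernel;
  // Load and look up using the runtime API only.
  if (cudaLibraryLoadFromFile(&lib, path, nullptr, nullptr, 0,
                              nullptr, nullptr, 0) != cudaSuccess)
    return -1;
  if (cudaLibraryGetKernel(&kernel, lib, "scale") != cudaSuccess) {
    cudaLibraryUnload(lib);
    return -1;
  }

  // Hand the runtime handle to the driver API as a CUkernel (assumed interchangeable).
  CUkernel cuKernel = (CUkernel)kernel;
  CUdevice dev;
  int maxThreads = -1;
  if (cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS ||
      cuKernelGetAttribute(&maxThreads, CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK,
                           cuKernel, dev) != CUDA_SUCCESS)
    maxThreads = -1;

  cudaLibraryUnload(lib);
  return maxThreads;
}

int main() {
  std::printf("max threads per block: %d\n", maxThreadsViaDriver("scale.fatbin"));
  return 0;
}
```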
   
7. Sharing of CUDA kernel handles
   - A kernel loaded by one library can be used by another: the loading library passes the kernel handle to the other library, which can then launch it.
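
The hand-off might look like this sketch. In a real deployment `producer` and `consumer` would be separate libraries (possibly each with its own copy of the CUDA runtime); here they are namespaces in one file so the example is self-contained, and every name in them is hypothetical, as is `scale.fatbin` with its `extern "C"` kernel `scale`. Only the `cudaKernel_t` handle crosses the boundary, and the launch assumes that `cudaLaunchKernel` accepts that handle cast to `const void*`. The producer keeps its `cudaLibrary_t` loaded for as long as the consumer may launch the kernel.

```cuda
// Build (assumption): nvcc -o share_demo share_demo.cu
#include <cstdio>
#include <cuda_runtime.h>

namespace producer {  // stands in for "library A"
cudaLibrary_t g_lib = nullptr;

// Loads the device code and exports a kernel handle (nullptr on failure).
cudaKernel_t exportScaleKernel(const char* path) {
  cudaKernel_t k = nullptr;
  if (cudaLibraryLoadFromFile(&g_lib, path, nullptr, nullptr, 0,
                              nullptr, nullptr, 0) != cudaSuccess) {
    g_lib = nullptr;
    return nullptr;
  }
  if (cudaLibraryGetKernel(&k, g_lib, "scale") != cudaSuccess) return nullptr;
  return k;
}

// Unloads the library; exported handles become invalid afterwards.
void shutdown() {
  if (g_lib) cudaLibraryUnload(g_lib);
  g_lib = nullptr;
}
}  // namespace producer

namespace consumer {  // stands in for "library B"
// Launches a kernel it did not load, knowing only its handle and signature.
cudaError_t launchScale(cudaKernel_t k, float* d, float a, int n, cudaStream_t s) {
  void* args[] = {&d, &a, &n};
  dim3 grid((n + 255) / 256), block(256);
  return cudaLaunchKernel((const void*)k, grid, block, args, 0, s);
}
}  // namespace consumer

int main() {
  const int n = 1 << 20;
  cudaKernel_t k = producer::exportScaleKernel("scale.fatbin");
  if (!k) {
    std::printf("load failed\n");
    producer::shutdown();
    return 1;
  }
  float* d = nullptr;
  cudaError_t err = cudaMalloc(&d, n * sizeof(float));
  if (err == cudaSuccess) err = cudaMemset(d, 0, n * sizeof(float));
  if (err == cudaSuccess) err = consumer::launchScale(k, d, 2.0f, n, nullptr);
  if (err == cudaSuccess) err = cudaDeviceSynchronize();
  std::printf("shared launch: %s\n", cudaGetErrorString(err));
  cudaFree(d);
  producer::shutdown();
  return err == cudaSuccess ? 0 : 1;
}
```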
   
8. Get started with CUDA runtime dynamic loading
   - When an application needs only the CUDA runtime API, the new runtime APIs offer a simpler way than the driver API to load device code and execute it on the GPU.
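
A minimal end-to-end sketch of that flow (load, look up, launch, unload), assuming a toolkit whose runtime API provides the `cudaLibrary*` functions and assuming that `cudaLaunchKernel` accepts a `cudaKernel_t` cast to `const void*`. The device-code file `scale.fatbin`, its `extern "C"` kernel `scale`, and the helper `scaleOnesDynamic` are hypothetical.

```cuda
// Build (assumption): nvcc -o get_started get_started.cu
// Device code (assumption), compiled separately with
//   nvcc -fatbin -o scale.fatbin scale_kernel.cu
//   extern "C" __global__ void scale(float* x, float a, int n) {
//     int i = blockIdx.x * blockDim.x + threadIdx.x;
//     if (i < n) x[i] *= a;
//   }
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                      \
  do {                                                                        \
    cudaError_t e_ = (call);                                                  \
    if (e_ != cudaSuccess) {                                                  \
      std::fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(e_)); \
      std::exit(EXIT_FAILURE);                                                \
    }                                                                         \
  } while (0)

// Hypothetical helper: scales n ones by a using a dynamically loaded kernel.
bool scaleOnesDynamic(const char* path, int n, float a) {
  // 1. Load device code from a file; no context handling is needed.
  cudaLibrary_t lib;
  CUDA_CHECK(cudaLibraryLoadFromFile(&lib, path, nullptr, nullptr, 0,
                                     nullptr, nullptr, 0));
  // 2. Look up the kernel by its unmangled name.
  cudaKernel_t kernel;
  CUDA_CHECK(cudaLibraryGetKernel(&kernel, lib, "scale"));

  // 3. Prepare input data on the device.
  std::vector<float> h(n, 1.0f);
  const size_t bytes = n * sizeof(float);
  float* d = nullptr;
  CUDA_CHECK(cudaMalloc(&d, bytes));
  CUDA_CHECK(cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice));

  // 4. Launch through the handle; arguments are passed by address.
  void* args[] = {&d, &a, &n};
  CUDA_CHECK(cudaLaunchKernel((const void*)kernel, dim3((n + 255) / 256), dim3(256),
                              args, 0, nullptr));
  CUDA_CHECK(cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost));

  // 5. Clean up: free memory, then unload the library.
  CUDA_CHECK(cudaFree(d));
  CUDA_CHECK(cudaLibraryUnload(lib));

  for (float v : h)
    if (v != a) return false;
  return true;
}

int main() {
  std::printf("%s\n", scaleOnesDynamic("scale.fatbin", 1 << 20, 2.0f) ? "ok" : "mismatch");
  return 0;
}
```

Compared with the driver-API route, nothing here retains, binds, or releases a context; the library handle is the only loading state the application manages.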