Version: Nightly

tma

NVIDIA Tensor Memory Accelerator (TMA) module.

Provides types and functions for working with NVIDIA's Tensor Memory Accelerator, which enables efficient asynchronous data movement between global and shared memory on Hopper-architecture GPUs and newer.

TMA performs hardware-accelerated multi-dimensional memory copies, with features such as swizzling to avoid shared-memory bank conflicts, L2 cache promotion hints, and support for various data types and memory layouts.
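
To make the swizzling feature concrete, here is a minimal Python model of why an XOR swizzle reduces bank conflicts. It is a simplified sketch, not this module's API and not the exact hardware layout. It assumes 32 shared-memory banks of 4 bytes each, a tile whose rows are 128 bytes (32 `float32` values), and a swizzle that XORs each 16-byte chunk index with `row % 8`. The names `linear_word`, `swizzled_word`, and `max_bank_conflict` are hypothetical helpers introduced for illustration.

```python
from collections import Counter

NUM_BANKS = 32       # assumption: 32 banks, each 4 bytes (one word) wide
WORDS_PER_ROW = 32   # assumption: one tile row = 32 float32 values = 128 bytes
WORDS_PER_CHUNK = 4  # a 16-byte chunk holds 4 words


def linear_word(row, col):
    """Word address of element (row, col) in a plain row-major tile."""
    return row * WORDS_PER_ROW + col


def swizzled_word(row, col):
    """Word address after XOR-ing the 16-byte chunk index with row % 8."""
    chunk, offset = divmod(col, WORDS_PER_CHUNK)
    chunk ^= row % 8  # permutes the 8 chunks within the row
    return row * WORDS_PER_ROW + chunk * WORDS_PER_CHUNK + offset


def max_bank_conflict(address_fn, col, rows=32):
    """Worst-case number of threads hitting one bank when thread r
    reads element (r, col), i.e. a column-wise access."""
    banks = Counter(address_fn(r, col) % NUM_BANKS for r in range(rows))
    return max(banks.values())


if __name__ == "__main__":
    print(max_bank_conflict(linear_word, col=0))    # 32
    print(max_bank_conflict(swizzled_word, col=0))  # 4
```

In this model a column-wise read of the unswizzled tile sends all 32 threads to the same bank, while the swizzled layout spreads them over 8 banks, leaving a 4-way conflict. The swizzle only permutes chunks within a row, so every element still has a unique address.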

Structs

Functions