Version: 1.0.0b2

For the complete Mojo documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /docs/manual/basics.md).

compute

GPU compute operations package - MMA and tensor core operations.

This package provides GPU tensor core and matrix multiplication operations:

mma: Unified warp matrix-multiply-accumulate (WMMA) operations
mma_util: Utility functions for loading/storing MMA operands
mma_operand_descriptor: Operand descriptor types for MMA
tensor_ops: Tensor core-based reductions and operations
arch/: Architecture-specific MMA implementations (internal)
- mma_nvidia: NVIDIA tensor cores (SM70-SM90)
- mma_nvidia_sm100: NVIDIA Blackwell (SM100)
- mma_amd: AMD Matrix Cores (CDNA2/3/4)
- mma_amd_rdna: AMD WMMA (RDNA3/4)
- tcgen05: 5th generation tensor core operations (Blackwell)

Usage

Import compute operations directly:

from std.gpu.compute import mma

# Usage: var result = mma.mma(a, b, c)

Architecture-specific implementations in arch/ are internal and should not be imported directly by user code.
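
For orientation, the operation behind the mma call shown above is a matrix multiply-accumulate, written here in general mathematical form. The tile dimensions M, N, and K are generic symbols assumed for illustration; this page does not state which shapes, data types, or per-thread operand layouts each architecture supports.

```latex
D = A B + C,
\qquad
D_{ij} = C_{ij} + \sum_{k=1}^{K} A_{ik} B_{kj},
\qquad
A \in \mathbb{R}^{M \times K},\;
B \in \mathbb{R}^{K \times N},\;
C, D \in \mathbb{R}^{M \times N}
```

On tensor core hardware this product is typically computed cooperatively by the threads of a warp, each holding a fragment of the operands, which is why the arch/ implementations differ per vendor and generation.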

Packages

arch: Architecture-specific MMA implementations.

Modules

mma: This module includes utilities for working with warp matrix-multiply-accumulate (WMMA) instructions.
mma_operand_descriptor: Implements traits for abstracting MMA operand descriptors in GPU matrix operations.
mma_util: Matrix multiply accumulate (MMA) utilities for GPU tensor cores.
tensor_ops: This module provides tensor core operations and utilities for GPU computation.
