mma_amd
AMD CDNA Matrix Core implementation of matrix multiply-accumulate (MMA) operations.
This module provides MMA implementations for AMD CDNA2, CDNA3, and CDNA4 data center GPUs using MFMA (Matrix Fused Multiply-Add) instructions.
Reference: https://gpuopen.com/learn/amd-lab-notes/amd-lab-notes-matrix-cores-readme/
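
As a rough illustration of what a matrix multiply-accumulate computes, the sketch below evaluates D = A x B + C on small tiles in plain Python. It is a host-side reference for the arithmetic only, not this module's API: the name `mma_reference` is hypothetical, the tile sizes are arbitrary, and the real MFMA instructions operate on fixed tile shapes with operands distributed across the lanes of a wavefront.

```python
def mma_reference(a, b, c):
    """Return D = A x B + C for row-major nested lists.

    Hypothetical helper for illustration only; it models the arithmetic
    of a multiply-accumulate, not the hardware register layout.
    """
    m, k = len(a), len(a[0])
    n = len(b[0])
    # A is M x K, B is K x N, and the accumulator C is M x N.
    if len(b) != k:
        raise ValueError("inner dimensions of A and B must match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("C must have shape M x N")

    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
d = mma_reference(a, b, c)
print(d)  # [[20, 23], [44, 51]]
```

A reference like this can serve as a correctness oracle when checking the output of a GPU kernel, provided the comparison tolerates the rounding differences of the floating-point types involved.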