mma_apple
Apple Silicon MMA implementation for matrix multiply-accumulate operations.
This module provides two simdgroup_matrix MMA shapes:
- 16x16x16 (_mma_apple): Apple M5 only (Metal 4.0 / AIR 2.8.0), float and integer-widening.
- 8x8 (_mma_apple_8x8): all Apple GPU generations (M1-M5), float-only.
Supported operations:
- Float multiply-accumulate (16x16): {F16, BF16, F32, E4M3, E5M2} inputs, F32 accumulator
- Integer widening multiply-accumulate (16x16): {I8, U8} inputs, I32/U32 accumulator
- Float multiply-accumulate (8x8): {F16, BF16, F32} inputs, F32 accumulator
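To make the "widening" in the integer mode concrete, here is a minimal reference model in plain Python of the multiply-accumulate D = A x B + C on a 16x16 tile. It is only a sketch of the arithmetic, not this module's API: the names `mma_reference` and `wrap_int8` are hypothetical, and the wrap-around behaviour of the narrow accumulator is an assumption made for the comparison.

```python
def mma_reference(a, b, c):
    """Return a @ b + c for square tiles given as lists of rows (hypothetical helper)."""
    n = len(a)
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def wrap_int8(x):
    """Wrap x into the signed 8-bit range, modelling a hypothetical non-widening I8 accumulator."""
    return (x + 128) % 256 - 128


N = 16
# Worst-case I8 inputs: every element is the largest signed 8-bit value.
a = [[127] * N for _ in range(N)]
b = [[127] * N for _ in range(N)]
c = [[0] * N for _ in range(N)]

d = mma_reference(a, b, c)

# Each output element sums 16 products of 127 * 127.
widened = d[0][0]            # 258064, well inside the I32 range
narrow = wrap_int8(widened)  # 16, the value is lost if the accumulator stays 8-bit

print(widened, narrow)
```

A single 16-term dot product of I8 values already exceeds the 8-bit range by several orders of magnitude, which is why the integer mode accumulates into I32/U32 rather than into the input type.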
Functions
- apple_mma_load: Loads a 16x16 matrix fragment for the current simdgroup thread.
- apple_mma_load_8x8: Loads an 8x8 matrix fragment for the current simdgroup thread.
- apple_mma_store: Stores a 16x16 matrix fragment from the current simdgroup thread.
- apple_mma_store_8x8: Stores an 8x8 matrix fragment from the current simdgroup thread.