cp_async_bulk_tensor_reduce_global_shared_cta
cp_async_bulk_tensor_reduce_global_shared_cta[src_type: AnyType, rank: Int, /, *, reduction_kind: ReduceOp, eviction_policy: CacheEviction = CacheEviction.EVICT_NORMAL](src_mem: UnsafePointer[src_type, address_space=AddressSpace.SHARED], tma_descriptor: UnsafePointer[NoneType], coords: IndexList[rank])
Initiates an asynchronous reduction from CTA shared memory into global memory using NVIDIA's Tensor Memory Accelerator (TMA).
This function performs an in-place reduction operation, combining data from shared memory with data in global memory using the specified reduction operation. The operation is performed asynchronously and uses TMA's tile mode for efficient memory access.
Notes:
- This operation is asynchronous; use appropriate memory barriers to ensure completion.
- Only supports rank-1 and rank-2 tensors.
- Requires an NVIDIA GPU with TMA support (Hopper architecture or newer).
- The source memory must be properly aligned for TMA operations.
- The TMA descriptor must be properly initialized before use.
- The reduction operation is performed atomically to ensure correctness.
Parameters:
- src_type (AnyType): The data type of the source tensor elements.
- rank (Int): The dimensionality of the tensor (must be 1 or 2).
- reduction_kind (ReduceOp): The type of reduction operation to perform. Supported operations are: "add", "min", "max", "inc", "dec", "and", "or", "xor".
- eviction_policy (CacheEviction): Optional cache eviction policy that controls how the data is handled in the cache hierarchy. Defaults to EVICT_NORMAL.
Args:
- src_mem (UnsafePointer[src_type, address_space=AddressSpace.SHARED]): Pointer to the source data in shared memory that will be reduced with the global memory data. Must be properly aligned according to TMA requirements.
- tma_descriptor (UnsafePointer[NoneType]): Pointer to the TMA descriptor containing metadata about tensor layout and memory access patterns.
- coords (IndexList[rank]): Coordinates specifying which tile of the tensor to operate on. For rank-1 tensors, this is a single coordinate. For rank-2 tensors, this contains both row and column coordinates.
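Example:
The following is a minimal sketch of a call site, not a definitive implementation. It shows how the parameters and arguments documented above fit together for a rank-2 "add" reduction. Several names are assumptions that do not come from this page and should be checked against your Mojo version: the import paths, the `ReduceOp.ADD` spelling of the "add" operation, and the hypothetical helper `accumulate_tile`. The sketch also assumes the caller has already staged the tile in shared memory and created a valid TMA descriptor on the host.

```mojo
# Assumed import paths; verify them against your Mojo version.
from gpu.memory import (
    AddressSpace,
    ReduceOp,
    cp_async_bulk_tensor_reduce_global_shared_cta,
)
from memory import UnsafePointer
from utils.index import IndexList


# Hypothetical device-side helper (not part of the library).
fn accumulate_tile(
    tile: UnsafePointer[Float32, address_space = AddressSpace.SHARED],
    tma_descriptor: UnsafePointer[NoneType],
    coords: IndexList[2],
):
    # `tile` must already hold the data to be reduced and must meet the
    # TMA alignment requirements; `tma_descriptor` must be initialized.
    # `src_type` and `rank` are positional parameters, while
    # `reduction_kind` is keyword-only. `ReduceOp.ADD` is an assumed
    # spelling of the "add" operation.
    cp_async_bulk_tensor_reduce_global_shared_cta[
        Float32, 2, reduction_kind = ReduceOp.ADD
    ](tile, tma_descriptor, coords)
    # The call only initiates the reduction. The caller still has to
    # synchronize before relying on the result in global memory.
```

The helper passes `coords` through unchanged, so the ordering of the two coordinates is whatever the TMA descriptor expects. Leaving `eviction_policy` unspecified selects the default, `CacheEviction.EVICT_NORMAL`.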