cp_async_bulk_tensor_shared_cluster_global_elect
cp_async_bulk_tensor_shared_cluster_global_elect[dst_type: AnyType, mbr_type: AnyType, rank: Int, /, *, cta_group: Int = 1, eviction_policy: CacheEviction = CacheEviction.EVICT_NORMAL](dst_mem: UnsafePointer[dst_type, address_space=AddressSpace.SHARED], tma_descriptor: UnsafePointer[NoneType], mem_bar: UnsafePointer[mbr_type, address_space=AddressSpace.SHARED], coords: IndexList[rank], elect: Int32)
Elect-predicated variant of cp_async_bulk_tensor_shared_cluster_global.
Behaves exactly like `cp_async_bulk_tensor_shared_cluster_global`, except
that the TMA instruction is guarded by a PTX predicate derived from
`elect`: the instruction is issued only when `elect != 0`. All lanes
follow the same PTX control flow, so the call site needs no Mojo-level
branch, that is, no warp-divergent `if elect != 0:` wrapper.
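
The predication described above can be pictured with a small conceptual model. This is a sketch in plain Python, not Mojo or PTX, and it does not call any real GPU API: `WARP_SIZE`, `predicated_issue`, and the choice of lane 0 as the elected lane are all hypothetical names and values introduced for illustration. It assumes a 32-lane warp in which every lane reaches the same guarded instruction and only lanes with a non-zero `elect` value have it take effect.

```python
# Conceptual model only: plain Python, not Mojo or PTX.
# All names here are hypothetical and exist only for this illustration.
WARP_SIZE = 32


def predicated_issue(elect_per_lane):
    """Return the lanes whose guarded TMA copy would take effect.

    Every lane evaluates the same guarded instruction. The guard
    (elect != 0) decides whether that lane's instruction is issued.
    """
    issued = []
    for lane, elect in enumerate(elect_per_lane):
        if elect != 0:  # models the PTX predicate, not a source-level branch
            issued.append(lane)
    return issued


# One elected lane; lane 0 is an arbitrary choice for the illustration.
elect_values = [1 if lane == 0 else 0 for lane in range(WARP_SIZE)]
issuing_lanes = predicated_issue(elect_values)
print(issuing_lanes)  # prints [0]
```

In this model, passing all zeros yields no issuing lane, which corresponds to every lane skipping the TMA.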
Parameters:
- dst_type (`AnyType`): The data type of the destination memory.
- mbr_type (`AnyType`): The data type of the memory barrier.
- rank (`Int`): The dimensionality of the tensor (1, 2, 3, 4, or 5).
- cta_group (`Int`): The CTA group to use for the copy operation. Must be 1 or 2.
- eviction_policy (`CacheEviction`): Optional cache eviction policy. Defaults to `EVICT_NORMAL`.
Args:
- dst_mem (`UnsafePointer[dst_type, address_space=AddressSpace.SHARED]`): Pointer to the destination in shared memory.
- tma_descriptor (`UnsafePointer[NoneType]`): Pointer to the TMA descriptor.
- mem_bar (`UnsafePointer[mbr_type, address_space=AddressSpace.SHARED]`): Pointer to the cluster-shared memory barrier.
- coords (`IndexList[rank]`): Coordinates specifying which tile to copy.
- elect (`Int32`): `0` on non-elected lanes (skip the TMA), non-zero on the single elected lane (issue the TMA). Typically the `Int32` returned by `elect()` from `nn.attention.gpu.nvidia.sm100.attention_utils`.