syncwarp
syncwarp(mask: Int = -1)
Synchronizes threads within a warp using a barrier.
This function creates a synchronization point where threads in a warp must wait until all threads specified by the mask reach this point. On NVIDIA GPUs, it uses warp-level synchronization primitives. On AMD GPUs, this acts as a wave execution barrier. On Apple GPUs, this acts as a SIMDGROUP execution barrier. Lane masks are not supported, so the mask argument is ignored and all active lanes must reach this point.
Note:
- On NVIDIA GPUs, this maps to the nvvm.bar.warp.sync intrinsic.
- On AMD GPUs, this maps to the llvm.amdgcn.wave.barrier intrinsic.
- On Apple GPUs, this provides execution synchronization only, via a SIMDGROUP barrier with mem_none (no memory fence). Use barrier() for threadgroup memory ordering.
- Threads not participating in the sync must still execute the instruction.
Args:
- mask (Int): An integer bitmask specifying which lanes (threads) in the warp should be synchronized. Each bit corresponds to a lane, with bit i controlling lane i. A value of 1 means the lane participates in the sync, 0 means it does not. Default value of -1 (all bits set) synchronizes all lanes.
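As a minimal sketch of how the mask bits map to lanes, the snippet below builds a mask covering lanes 0 through 15 and passes it to syncwarp. The import path gpu.sync and the helper names first_lanes_mask and sync_lower_half are assumptions made for illustration; they are not part of the documented API. On targets that ignore the mask, the call behaves as if the default of -1 had been passed.

```mojo
from gpu.sync import syncwarp  # assumed import path; adjust to your Mojo version


fn first_lanes_mask(count: Int) -> Int:
    # Hypothetical helper: set bits 0..count-1 so that lanes 0 through
    # count-1 participate. Intended for small counts such as 8 or 16;
    # for a full warp, rely on the default mask of -1 instead.
    return (1 << count) - 1


fn sync_lower_half():
    # Hypothetical device-side helper, meant to be called from a GPU kernel.
    # Every lane in the warp reaches this call, including lanes whose mask
    # bits are clear, so it is not placed inside a lane-dependent branch.
    var mask = first_lanes_mask(16)  # bits 0..15 set, i.e. 0xFFFF
    syncwarp(mask)
```

Calling syncwarp() with no argument uses the default mask and synchronizes all lanes, which is the portable choice when the code must also run on targets without lane-mask support.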