Version: 1.0.0b1

syncwarp

syncwarp(mask: Int = -1)

Synchronizes threads within a warp using a barrier.

This function creates a synchronization point where threads in a warp must wait until all threads specified by the mask reach this point. On NVIDIA GPUs, it uses warp-level synchronization primitives. On AMD GPUs, this acts as a wave execution barrier. On Apple GPUs, this acts as a SIMDGROUP execution barrier. Lane masks are not supported, so the mask argument is ignored and all active lanes must reach this point.

Note:

  • On NVIDIA GPUs, this maps to the nvvm.bar.warp.sync intrinsic.
  • On AMD GPUs, this maps to the llvm.amdgcn.wave.barrier intrinsic.
  • On Apple GPUs, this synchronizes execution only: it uses a SIMDGROUP barrier with mem_none, so no memory fence is issued. Use barrier() when threadgroup memory ordering is required.
  • Threads not participating in the sync must still execute the instruction.

Args:

  • mask (Int): An integer bitmask specifying which lanes (threads) in the warp should be synchronized. Each bit corresponds to a lane, with bit i controlling lane i. A bit value of 1 means the lane participates in the sync; 0 means it does not. The default value of -1 (all bits set) synchronizes all lanes.
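
The bit layout of mask can be illustrated with a little integer arithmetic. The sketch below is plain Python, not Mojo GPU code, and it only shows how a mask value is built and read. The helper names lane_mask and participates are hypothetical, and the 32-lane warp width is an assumption that matches NVIDIA hardware; other targets may use a different width.

```python
# Illustrative only: models the bit layout of the `mask` argument.
# `lane_mask` and `participates` are hypothetical helpers, not part of any library.

WARP_SIZE = 32  # assumption: 32 lanes per warp (NVIDIA); other targets may differ


def lane_mask(lanes, warp_size=WARP_SIZE):
    """Return an integer with bit i set for every lane index i in `lanes`."""
    mask = 0
    for lane in lanes:
        if not 0 <= lane < warp_size:
            raise ValueError(f"lane {lane} is outside 0..{warp_size - 1}")
        mask |= 1 << lane  # bit i controls lane i
    return mask


def participates(mask, lane):
    """Return True if bit `lane` of `mask` is 1."""
    return (mask >> lane) & 1 == 1


# Lanes 0 and 5 only: bits 0 and 5 are set.
partial = lane_mask([0, 5])

# The default of -1 has every bit set, so every lane participates.
all_lanes = -1
```

With these definitions, partial equals 0b100001, participates(partial, 1) is False, and participates(all_lanes, lane) is True for every lane index. On targets where the mask argument is ignored, the value passed has no effect.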