Version: Nightly


match_all

def match_all[dtype: DType, //, mask_type: DType = DType.uint32 if (_resolve_warp_size() <= Int(32)) else DType.uint64](value: Scalar[dtype]) -> Scalar[mask_type]

Returns the warp's active-lane mask if all active lanes hold the same value, else 0.

When every active lane holds the same bits as the calling lane, returns the mask of those lanes, so a non-zero result serves as the same "all agree" predicate that NVIDIA's match.all.sync exposes. Otherwise returns 0. The comparison is bitwise, so 0.0 and -0.0 are treated as different values, while NaNs with identical bit patterns compare equal. This is the counterpart of match_any: instead of grouping lanes by key, it reports whether the whole warp agrees on one key.
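To make these semantics concrete, here is a hypothetical host-side reference model written in Python so it runs without a GPU. It is not the Mojo API: match_all_model and bits are illustrative names. The model treats a warp as a list of lane values plus an active-lane bit mask and compares raw bit patterns. Python floats are modeled as 64-bit IEEE-754 values, which is an assumption made for the sketch.

```python
import struct


def bits(x):
    """Raw bit pattern of a lane value: ints as-is, floats via their float64 encoding."""
    if isinstance(x, float):
        return struct.unpack("<Q", struct.pack("<d", x))[0]
    return x


def match_all_model(lane_values, active_mask):
    """Hypothetical model of match_all for one warp.

    lane_values[i] is lane i's value; bit i of active_mask marks lane i as active.
    Returns active_mask if every active lane holds a bit-identical value, else 0.
    """
    active = [i for i in range(len(lane_values)) if (active_mask >> i) & 1]
    if not active:
        # Cannot happen on a real GPU (the calling lane is always active).
        return 0
    reference = bits(lane_values[active[0]])
    for lane in active:
        if bits(lane_values[lane]) != reference:
            return 0  # At least one active lane disagrees.
    return active_mask  # All active lanes agree: report which lanes they are.
```

Because the model compares bits rather than using ==, a warp holding a mix of 0.0 and -0.0 yields 0 even though 0.0 == -0.0 in Python. Lanes outside the active mask are ignored, which matches the "active lanes" wording above.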

All WARP_SIZE lanes must reach the call converged, that is, not split across divergent branches.

Example:

from std.gpu.primitives.warp import match_all

# Call from GPU kernel code. `agreed` is the active-lane mask (non-zero)
# iff every lane passed the same `key`, and 0 otherwise. Every lane uses
# 42 here, so all lanes agree.
var key = Int32(42)
var agreed = match_all(key)

Constraints:

Only NVIDIA, AMD, and Apple Silicon GPUs are supported. dtype must be a 32- or 64-bit type and mask_type must be DType.uint32 or DType.uint64 (NVIDIA returns a 32-bit mask, so mask_type must be DType.uint32 there).

Parameters:

dtype (DType): The element type of value (inferred from the argument).
mask_type (DType): The lane-mask return type, DType.uint32 or DType.uint64 (defaults to the type matching WARP_SIZE).
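The default for mask_type follows the conditional expression in the signature: a 32-bit mask when WARP_SIZE is at most 32, otherwise a 64-bit mask. The Python sketch below uses hypothetical helpers, default_mask_type and full_mask, which are not part of the Mojo library. It mirrors that choice and computes the widest value each mask must hold: one bit set per lane, which is the result when every lane is active and agrees.

```python
def default_mask_type(warp_size):
    # Mirrors the signature's default: uint32 if WARP_SIZE <= 32, else uint64.
    return "uint32" if warp_size <= 32 else "uint64"


def full_mask(warp_size):
    # One bit per lane: the match_all result when all lanes are active and agree.
    return (1 << warp_size) - 1
```

For example, a 64-lane warp needs all 64 bits of full_mask(64), which is why the default widens to uint64 once WARP_SIZE exceeds 32.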

Args:

value (Scalar[dtype]): The calling lane's value to compare against the rest of the warp.

Returns:

Scalar[mask_type]: A mask_type lane mask of the active lanes when they all hold a bit-equal value, otherwise 0.