Version: Nightly

match_any

def match_any[dtype: DType, //, mask_type: DType = DType.uint32 if (_resolve_warp_size() <= Int(32)) else DType.uint64](value: Scalar[dtype]) -> Scalar[mask_type]

Computes, for each lane, the mask of warp lanes whose value has the same bit pattern as that lane's value.

Returns to each lane a mask in which bit l is set for every active lane l whose value has the same bit pattern as the calling lane's. The comparison is bitwise (matching NVIDIA's match.any.sync), so 0.0 and -0.0 do not match, while two NaNs with identical bits do. A warp can use this to coalesce lanes that share a key: for example, a histogram or scatter leader handles its whole group in one non-atomic update instead of issuing one atomic per lane.
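The bitwise comparison rule can be illustrated with a small host-side model. This is a Python sketch of the semantics described above, not the Mojo implementation: `float32_bits` and `match_any_model` are hypothetical helper names, and the four-lane "warp" is a toy assumption chosen to keep the masks readable.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def match_any_model(lane_bits: list[int]) -> list[int]:
    """For each lane, build a mask with bit l set when lane l holds the same bits."""
    masks = []
    for mine in lane_bits:
        mask = 0
        for lane, other in enumerate(lane_bits):
            if other == mine:  # compare raw bit patterns, not numeric values
                mask |= 1 << lane
        masks.append(mask)
    return masks


# Toy warp: 0.0, -0.0, and two NaNs written directly as identical bit patterns.
lanes = [float32_bits(0.0), float32_bits(-0.0), 0x7FC00000, 0x7FC00000]
masks = match_any_model(lanes)
```

In this model, lanes 0 and 1 each end up alone in their group even though 0.0 == -0.0 numerically, while lanes 2 and 3 share a group because their NaN bit patterns are identical.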

All WARP_SIZE lanes must reach the call converged.

Example:

from std.gpu.primitives.warp import match_any

# If lanes 0, 3, 7 hold the same value, each of them gets a mask with
# bits 0, 3, and 7 set; the remaining lanes get their own groups.
var my_key = Int32(42)
var group = match_any(my_key)
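To make the grouping in the comment above concrete, here is a host-side Python sketch of how such masks could drive a coalesced histogram update. It models the idea only and is not Mojo or GPU code: `match_any_model` and `warp_histogram` are hypothetical names, the eight-lane warp is a toy assumption, and electing the lowest set lane as the group leader is one common convention rather than something this page specifies.

```python
from collections import Counter


def match_any_model(keys: list[int]) -> list[int]:
    """For each lane, return a mask with bit l set when lane l holds the same key."""
    return [
        sum(1 << lane for lane, other in enumerate(keys) if other == mine)
        for mine in keys
    ]


def warp_histogram(keys: list[int]) -> tuple[Counter, int]:
    """Let each group's leader add the whole group's count in a single update."""
    hist = Counter()
    updates = 0
    for lane, mask in enumerate(match_any_model(keys)):
        leader = (mask & -mask).bit_length() - 1  # index of the lowest set bit
        if lane == leader:
            hist[keys[lane]] += bin(mask).count("1")  # group size via popcount
            updates += 1
    return hist, updates


# Lanes 0, 3, and 7 hold the same key, as in the comment above.
keys = [42, 7, 7, 42, 9, 7, 5, 42]
group_of_lane0 = match_any_model(keys)[0]
hist, updates = warp_histogram(keys)
```

With these keys, the eight lanes collapse into four groups, so the model performs four updates instead of eight.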

Constraints:

Only NVIDIA, AMD, and Apple Silicon GPUs are supported. dtype must be a 32- or 64-bit type and mask_type must be DType.uint32 or DType.uint64 (NVIDIA returns a 32-bit mask, so mask_type must be DType.uint32 there).

Parameters:

dtype (DType): The element type of value (inferred from the argument).
mask_type (DType): The lane-mask return type, DType.uint32 or DType.uint64 (defaults to DType.uint32 when the warp size is at most 32, and to DType.uint64 otherwise).
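The default for mask_type follows the conditional in the signature. The Python sketch below mirrors that rule and checks that a mask with every lane set fits in the chosen width. `default_mask_bits` is a hypothetical helper, and the warp sizes 32 and 64 are illustrative inputs rather than values taken from this page.

```python
def default_mask_bits(warp_size: int) -> int:
    """Mirror the signature's default: 32-bit mask up to 32 lanes, else 64-bit."""
    return 32 if warp_size <= 32 else 64


def full_mask(warp_size: int) -> int:
    """Mask with one bit set per lane, as returned when every lane matches."""
    return (1 << warp_size) - 1


# Illustrative warp sizes; the real value comes from the target GPU.
widths = {size: default_mask_bits(size) for size in (32, 64)}
fits = all(full_mask(size) < (1 << bits) for size, bits in widths.items())
```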

Args:

value (Scalar[dtype]): The calling lane's value to match against the rest of the warp.

Returns:

Scalar[mask_type]: A lane mask of type mask_type with bit l set for each active lane l whose value is bit-equal to the calling lane's.