match_all
def match_all[dtype: DType, //, mask_type: DType = DType.uint32 if (_resolve_warp_size() <= Int(32)) else DType.uint64](value: Scalar[dtype]) -> Scalar[mask_type]
Returns the warp's active-lane mask if all lanes share value, else 0.
When every active lane holds the same bits as the calling lane, returns the
mask of those lanes (so a non-zero result is the "all agree" predicate that
NVIDIA's match.all.sync also exposes); otherwise returns 0. The comparison
is on the bits, so 0.0 and -0.0 are treated as different. This is the
dual of match_any: it reports warp-wide agreement on a key.
All WARP_SIZE lanes must reach the call converged.
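The agreement rule described above can be modelled on the host. What follows is a minimal Python sketch, not the Mojo implementation: `bits_of` and `match_all_model` are hypothetical helper names, the warp is represented as a plain list of per-lane bit patterns, and the active-lane mask is passed in explicitly as an assumption about what the hardware would report.

```python
import struct


def bits_of(value: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of `value` as an integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def match_all_model(lane_bits, active_mask):
    """Host-side model of the rule (hypothetical helper, not a Mojo API).

    lane_bits[i] is the raw bit pattern held by lane i, and bit i of
    active_mask is set when lane i is active. Returns active_mask when
    every active lane holds identical bits, otherwise 0.
    """
    active = [b for i, b in enumerate(lane_bits) if (active_mask >> i) & 1]
    if all(b == active[0] for b in active):
        return active_mask
    return 0


WARP_SIZE = 32  # assumption: a 32-lane warp, so the mask fits in 32 bits
FULL_MASK = (1 << WARP_SIZE) - 1

# Every lane holds the same bits: the result is the active-lane mask.
same = [bits_of(1.5)] * WARP_SIZE
agreed = match_all_model(same, FULL_MASK)

# One lane holds -0.0 while the rest hold 0.0: the bits differ, so the result is 0.
mixed = [bits_of(0.0)] * WARP_SIZE
mixed[7] = bits_of(-0.0)
disagreed = match_all_model(mixed, FULL_MASK)
```

The model compares integers rather than floats on purpose: comparing `0.0 == -0.0` as floats would report agreement, whereas the documented comparison is on the bits.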
Example:
from std.gpu.primitives.warp import match_all
# `agreed` is non-zero (the active-lane mask) iff every lane passed the
# same `key`.
var agreed = match_all(key)
Constraints:
Only NVIDIA, AMD, and Apple Silicon GPUs are supported. dtype must be
a 32- or 64-bit type and mask_type must be DType.uint32 or
DType.uint64 (NVIDIA returns a 32-bit mask, so mask_type must be
DType.uint32 there).
Parameters:
- dtype (DType): The element type of value (inferred from the argument).
- mask_type (DType): The lane-mask return type, DType.uint32 or DType.uint64 (defaults to the type matching WARP_SIZE).
Args:
- value (Scalar[dtype]): The calling lane's value to compare against the rest of the warp.
Returns:
Scalar[mask_type]: A mask_type lane mask of the active lanes when they all hold a
bit-equal value, otherwise 0.
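Because the return value is a lane mask rather than a boolean, it can be useful to see how such a mask decodes. The sketch below is a host-side Python illustration only; `lanes_in_mask` is a hypothetical helper, and it assumes the usual convention that bit i of the mask corresponds to lane i.

```python
def lanes_in_mask(mask: int) -> list:
    """Return the lane indices whose bits are set in `mask` (bit i = lane i)."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# A zero mask means the lanes did not agree, so no lanes are listed.
no_lanes = lanes_in_mask(0)

# A full 32-bit mask lists every lane of a 32-lane warp.
all_lanes = lanes_in_mask(0xFFFFFFFF)

# The number of agreeing lanes is the population count of the mask.
lane_count = bin(0xFFFFFFFF).count("1")
```

In practice a caller often only needs the non-zero test, as in the example above; decoding the mask matters when the set of participating lanes is itself of interest.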