match_any
def match_any[dtype: DType, //, mask_type: DType = DType.uint32 if (_resolve_warp_size() <= Int(32)) else DType.uint64](value: Scalar[dtype]) -> Scalar[mask_type]
Computes, for each lane, a mask of the warp lanes whose value has the same bit pattern.
Returns a lane mask in which bit l is set for every active lane l
whose value has the same bit pattern as the calling lane's. The comparison
is bitwise (matching NVIDIA's match.any.sync), so 0.0 and -0.0 do
not match, while two NaNs with equal bits do. A warp can use this to
coalesce lanes that share a key: in a histogram or scatter, one leader lane
per group handles the whole group in a single non-atomic update instead of
issuing one atomic per lane.
All WARP_SIZE lanes must reach the call converged.
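The bitwise matching rule described above can be illustrated with a small host-side model. This is a sketch in plain Python, not the Mojo or GPU implementation: the helper names `bits_of` and `match_any_model` are hypothetical, the warp is shortened to 8 lanes for readability, and every lane is assumed to be active.

```python
import struct

def bits_of(x: float) -> int:
    """Raw IEEE-754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def match_any_model(lane_bits):
    """Hypothetical host-side model of the matching rule.

    lane_bits[i] is lane i's value as raw bits. Returns one mask per lane,
    with bit l set when lane l holds exactly the same bits as lane i.
    All lanes are assumed active.
    """
    masks = []
    for mine in lane_bits:
        mask = 0
        for l, other in enumerate(lane_bits):
            if other == mine:
                mask |= 1 << l
        masks.append(mask)
    return masks

NAN_A = 0x7FC00000  # a quiet NaN bit pattern
NAN_B = 0x7FC00001  # also a NaN, but with a different payload

# An 8-lane "warp" (real warps are wider; shortened for illustration).
lane_bits = [
    bits_of(1.5), bits_of(0.0), bits_of(-0.0), bits_of(1.5),
    NAN_A, NAN_A, NAN_B, bits_of(1.5),
]
masks = match_any_model(lane_bits)

print(format(masks[0], "08b"))  # 10001001: lanes 0, 3, 7 share 1.5
print(format(masks[1], "08b"))  # 00000010: 0.0 matches only itself
print(format(masks[2], "08b"))  # 00000100: -0.0 is a different bit pattern
print(format(masks[4], "08b"))  # 00110000: bit-equal NaNs match
print(format(masks[6], "08b"))  # 01000000: NaN with other bits does not
```

Comparing raw bits rather than floating-point values is what makes 0.0 and -0.0 land in separate groups while identical NaN encodings share one.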
Example:
from std.gpu.primitives.warp import match_any
# If lanes 0, 3, 7 hold the same value, each of them gets a mask with
# bits 0, 3, and 7 set; the remaining lanes get their own groups.
var group = match_any(my_key)
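One way to consume such a group mask is to elect a single leader per group and let it apply one update on behalf of all members. The sketch below models that on the host in plain Python; it is not Mojo or GPU code. Choosing the lowest set bit as the leader is an assumption (a common convention, not something this page specifies), and `match_any_model` and `warp_histogram` are hypothetical names.

```python
def match_any_model(keys):
    """Hypothetical host-side model: one mask per lane, bit l set when
    keys[l] equals that lane's key. All lanes are assumed active."""
    return [
        sum(1 << l for l, other in enumerate(keys) if other == key)
        for key in keys
    ]

def warp_histogram(keys, num_bins):
    """Accumulate one 'warp' of keys with a single update per key group."""
    hist = [0] * num_bins
    updates = 0
    masks = match_any_model(keys)
    for lane, (key, mask) in enumerate(zip(keys, masks)):
        # Assumed convention: the lowest-numbered lane in the group leads.
        leader = (mask & -mask).bit_length() - 1
        if lane == leader:
            # The population count of the mask is the group size.
            hist[key] += bin(mask).count("1")
            updates += 1
    return hist, updates

# An 8-lane "warp" (shortened for illustration) with three distinct keys.
keys = [2, 0, 2, 2, 1, 0, 2, 1]
hist, updates = warp_histogram(keys, num_bins=3)
print(hist)     # [2, 2, 4]
print(updates)  # 3 updates instead of 8
```

The number of updates equals the number of distinct keys in the warp, so the saving grows with how often lanes share a key.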
Constraints:
Only NVIDIA, AMD, and Apple Silicon GPUs are supported. dtype must be
a 32- or 64-bit type and mask_type must be DType.uint32 or
DType.uint64 (NVIDIA returns a 32-bit mask, so mask_type must be
DType.uint32 there).
Parameters:
- dtype (DType): The element type of value (inferred from the argument).
- mask_type (DType): The lane-mask return type, DType.uint32 or DType.uint64 (defaults to the type matching WARP_SIZE).
Args:
- value (Scalar[dtype]): The calling lane's value to match against the rest of the warp.
Returns:
Scalar[mask_type]: A mask_type lane mask with bit l set for each active lane l holding
a bit-equal value.