reduce
reduce[val_type: DType, simd_width: Int, //, shuffle: def[dtype: DType, simd_width: Int](val: SIMD[dtype, simd_width], offset: UInt32) -> SIMD[dtype, simd_width], func: def[dtype: DType, width: Int](SIMD[dtype, width], SIMD[dtype, width]) capturing -> SIMD[dtype, width]](val: SIMD[val_type, simd_width]) -> SIMD[val_type, simd_width]
Performs a generic warp-wide reduction operation using shuffle operations.
This is a convenience wrapper around lane_group_reduce that operates on the entire warp. It allows customizing both the shuffle operation and reduction function.
Example:
from std.gpu.primitives.warp import reduce, shuffle_down
# Compute warp-wide sum using shuffle down
@parameter
def add[dtype: DType, width: Int](x: SIMD[dtype, width], y: SIMD[dtype, width]) capturing -> SIMD[dtype, width]:
    return x + y
val = SIMD[DType.float32, 4](2.0, 4.0, 6.0, 8.0)
result = reduce[shuffle_down, add](val)
Parameters:
- val_type (DType): The data type of the SIMD elements (e.g. float32, int32).
- simd_width (Int): The number of elements in the SIMD vector.
- shuffle (def[dtype: DType, simd_width: Int](val: SIMD[dtype, simd_width], offset: UInt32) -> SIMD[dtype, simd_width]): A function that performs the warp shuffle operation. Takes a SIMD value and an offset and returns the shuffled result.
- func (def[dtype: DType, width: Int](SIMD[dtype, width], SIMD[dtype, width]) capturing -> SIMD[dtype, width]): A binary function that combines two SIMD values during reduction. This defines the reduction operation (e.g. add, max, min).
Args:
- val (SIMD): The SIMD value to reduce. Each lane contributes its value.
Returns:
SIMD: A SIMD value containing the reduction result broadcast to all lanes in the warp.
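To make the roles of the shuffle and func parameters concrete, here is a minimal host-side sketch in Python of a shuffle-based warp reduction. It is a conceptual model, not the Mojo API and not the actual lane_group_reduce implementation: the names warp_reduce, shuffle_xor, and WARP_SIZE are hypothetical, the warp is assumed to have 32 lanes, each lane holds a single scalar (a SIMD width of 1), and a butterfly (XOR) shuffle is assumed so that every lane ends with the full result.

```python
WARP_SIZE = 32  # assumed warp width for this sketch; real hardware may differ


def shuffle_xor(lanes, offset):
    """Model a butterfly shuffle: lane i reads the value held by lane i ^ offset."""
    return [lanes[i ^ offset] for i in range(len(lanes))]


def warp_reduce(lanes, shuffle, func):
    """Combine every lane with its shuffled partner, halving the offset each step."""
    offset = len(lanes) // 2
    while offset > 0:
        partner = shuffle(lanes, offset)
        lanes = [func(a, b) for a, b in zip(lanes, partner)]
        offset //= 2
    return lanes


# Each lane contributes its own index as a float.
lanes = [float(i) for i in range(WARP_SIZE)]
result = warp_reduce(lanes, shuffle_xor, lambda a, b: a + b)
```

In this model the reduction takes log2(WARP_SIZE) steps, and swapping the lambda for another binary function such as max changes the reduction without touching the shuffle logic, which mirrors how func and shuffle are passed independently to reduce.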