min
min[dtype: DType, width: Int, //, *, block_size: Int, broadcast: Bool = True](val: SIMD[dtype, width]) -> SIMD[dtype, width]
Computes the minimum value across all threads in a block.
Performs a parallel reduction using warp-level operations and shared memory to find the global minimum across all threads in the block.
Parameters:
- dtype (DType): The data type of the SIMD elements.
- width (Int): The number of elements in each SIMD vector.
- block_size (Int): The total number of threads in the block.
- broadcast (Bool): If True, the final minimum is broadcast to all threads in the block. If False, only the first thread will have the complete minimum.
Args:
- val (SIMD): The SIMD value to reduce. Each thread contributes its value to find the minimum.
Returns:
SIMD: If broadcast is True, each thread in the block will receive the minimum
value across the entire block. Otherwise, only the first thread will
have the complete result.
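The two-stage reduction and the effect of broadcast can be sketched with a small CPU model in Python. This is an illustration of the semantics described above, not the library's implementation: the function name block_min_model, the warp size of 32, the use of one scalar per thread, and the use of None for threads whose result is not guaranteed are all assumptions made for this example.

```python
WARP_SIZE = 32  # assumption: typical warp width; real hardware may differ


def block_min_model(values, broadcast=True, warp_size=WARP_SIZE):
    """Model a block-wide min; values[i] is the scalar held by thread i.

    Returns a list with one entry per thread. None marks a thread whose
    result is not guaranteed to be the complete minimum.
    """
    if not values:
        raise ValueError("a block needs at least one thread")

    # Stage 1: every warp reduces the values of its own threads.
    warp_minima = [
        min(values[start:start + warp_size])
        for start in range(0, len(values), warp_size)
    ]

    # Stage 2: the per-warp results (staged through shared memory on a GPU)
    # are reduced once more to obtain the block-wide minimum.
    block_minimum = min(warp_minima)

    if broadcast:
        # Every thread receives the block-wide minimum.
        return [block_minimum] * len(values)

    # Only the first thread is guaranteed to hold the complete result.
    return [block_minimum] + [None] * (len(values) - 1)
```

With four threads and a model warp size of 2, block_min_model([5, 3, 9, 1], warp_size=2) gives every thread the value 1, while passing broadcast=False leaves only the first entry populated.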
min[dtype: DType, width: Int, //, *, block_dim_x: Int, block_dim_y: Int, block_dim_z: Int = 1, broadcast: Bool = True](val: SIMD[dtype, width]) -> SIMD[dtype, width]
Computes the minimum value across all threads in a multi-dimensional block.
Performs a parallel reduction using warp-level operations and shared memory
to find the global minimum across all threads in the block. Thread IDs are
linearized in row-major order: x + y * dim_x + z * dim_x * dim_y.
Parameters:
- dtype (DType): The data type of the SIMD elements.
- width (Int): The number of elements in each SIMD vector.
- block_dim_x (Int): The number of threads along the X dimension.
- block_dim_y (Int): The number of threads along the Y dimension.
- block_dim_z (Int): The number of threads along the Z dimension (default: 1).
- broadcast (Bool): If True, the final minimum is broadcast to all threads in the block. If False, only the first thread will have the complete minimum.
Args:
- val (SIMD): The SIMD value to reduce. Each thread contributes its value to find the minimum.
Returns:
SIMD: If broadcast is True, each thread in the block will receive the minimum
value across the entire block. Otherwise, only the first thread will
have the complete result.
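The linearization formula x + y * dim_x + z * dim_x * dim_y can be checked with a short Python sketch. The helper names linear_thread_id and block_thread_count are hypothetical and exist only for this illustration; they are not part of the library.

```python
def linear_thread_id(x, y, z, dim_x, dim_y):
    """Map a 3D thread index to its linear ID, with x varying fastest."""
    return x + y * dim_x + z * dim_x * dim_y


def block_thread_count(dim_x, dim_y, dim_z=1):
    """Total number of threads in a block with the given dimensions."""
    return dim_x * dim_y * dim_z


# Example: an 8 x 4 x 2 block. Thread (x=3, y=2, z=1) maps to
# 3 + 2 * 8 + 1 * 8 * 4 = 51.
example_id = linear_thread_id(3, 2, 1, dim_x=8, dim_y=4)
```

Enumerating every (x, y, z) in the block yields each ID from 0 to block_thread_count(...) - 1 exactly once, so the multi-dimensional overload reduces over the same set of threads as a one-dimensional block of that total size would.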