shuffle_xor
shuffle_xor[dtype: DType, simd_width: Int, //](val: SIMD[dtype, simd_width], offset: UInt32) -> SIMD[dtype, simd_width]
Exchanges values between threads in a warp using a butterfly pattern.
Each thread swaps its value with the thread whose lane ID equals its own lane ID XORed with the given offset. Repeating the exchange with different offsets produces the butterfly communication pattern used in parallel reductions and scans.
Parameters:
- dtype (DType): The data type of the SIMD elements (e.g. float32, int32).
- simd_width (Int): The number of elements in each SIMD vector.
Args:
- val (SIMD): The SIMD value to be exchanged with another thread.
- offset (UInt32): The lane offset to XOR with the current thread's lane ID to determine the exchange partner. Common values are powers of 2 for butterfly patterns.
Returns:
SIMD: The SIMD value from the thread at lane (current_lane XOR offset).
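To make the partner rule concrete, here is a plain Python model of the lanes in a warp. It is not Mojo GPU code: the helper names `simulate_shuffle_xor` and `butterfly_sum` are hypothetical, and the warp size of 32 is an assumption (some GPUs use 64). It sketches how exchanging with offsets 16, 8, 4, 2, 1 and adding at each step leaves every lane holding the sum of all lanes.

```python
WARP_SIZE = 32  # assumed warp width for this sketch; some GPUs use 64


def simulate_shuffle_xor(values, offset):
    """Return, for each lane, the value held by lane (lane XOR offset)."""
    return [values[lane ^ offset] for lane in range(len(values))]


def butterfly_sum(values):
    """All-reduce a power-of-two number of lanes with XOR exchanges."""
    acc = list(values)
    offset = len(acc) // 2
    while offset > 0:
        # Every lane reads its partner's running total and adds it to its own.
        partner = simulate_shuffle_xor(acc, offset)
        acc = [own + other for own, other in zip(acc, partner)]
        offset //= 2
    return acc


lanes = list(range(WARP_SIZE))  # lane i starts with the value i
result = butterfly_sum(lanes)   # every lane ends with sum(range(32))
```

With power-of-two offsets, each step pairs lanes that differ in exactly one bit of the lane ID, so five steps cover all 32 lanes under the assumed warp size.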
shuffle_xor[dtype: DType, simd_width: Int, //](mask: UInt, val: SIMD[dtype, simd_width], offset: UInt32) -> SIMD[dtype, simd_width]
Exchanges values between threads in a warp using a butterfly pattern with masking.
Each thread swaps its value with the thread whose lane ID equals its own lane ID XORed with the given offset. The mask argument controls which threads participate in the exchange.
Example:
from std.gpu.primitives.warp import shuffle_xor
# Exchange values between even-numbered threads 4 lanes apart
mask = 0x55555555 # Bits 0, 2, 4, ... set: even-numbered threads only
var val = SIMD[DType.float32, 16](42.0) # Example value
result = shuffle_xor(mask, val, 4) # offset is a UInt32, not a float
Parameters:
- dtype (DType): The data type of the SIMD elements (e.g. float32, int32).
- simd_width (Int): The number of elements in each SIMD vector.
Args:
- mask (UInt): A bit mask specifying which threads participate in the exchange. Only threads with their corresponding bit set in the mask will exchange values.
- val (SIMD): The SIMD value to be exchanged with another thread.
- offset (UInt32): The lane offset to XOR with the current thread's lane ID to determine the exchange partner. Common values are powers of 2 for butterfly patterns.
Returns:
SIMD: The SIMD value from the thread at lane (current_lane XOR offset) if both threads are enabled by the mask; otherwise the original value is preserved.
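The masked behavior described in the Returns text can be modeled in plain Python as follows. This is a sketch of the documented semantics only, not Mojo GPU code and not a statement about what hardware does for lanes outside the mask. The helper name `simulate_masked_shuffle_xor` and the 32-lane warp are assumptions for illustration. Bit i of the mask corresponds to lane i, so `0x55555555` selects lanes 0, 2, 4, ... and `0xAAAAAAAA` selects lanes 1, 3, 5, ...

```python
def simulate_masked_shuffle_xor(mask, values, offset):
    """Model the masked exchange: swap only if both lanes are in the mask."""
    out = []
    for lane, own in enumerate(values):
        partner = lane ^ offset
        lane_enabled = (mask >> lane) & 1 == 1
        partner_enabled = (mask >> partner) & 1 == 1
        if lane_enabled and partner_enabled:
            out.append(values[partner])  # take the partner's value
        else:
            out.append(own)              # keep the original value
    return out


EVEN_LANES = 0x55555555  # bits 0, 2, 4, ... set
ODD_LANES = 0xAAAAAAAA   # bits 1, 3, 5, ... set

values = list(range(32))  # lane i starts with the value i
even_result = simulate_masked_shuffle_xor(EVEN_LANES, values, 4)
odd_result = simulate_masked_shuffle_xor(ODD_LANES, values, 4)
```

Because XOR with 4 does not change the lowest bit of the lane ID, an even lane always pairs with an even lane, so the even-lane mask leaves every odd lane's value untouched.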