Version: Nightly


dual_elementwise

def dual_elementwise[func_0: def[width: Int, alignment: Int = Int(1)](Coord[_]) capturing thin -> None, func_1: def[width: Int, alignment: Int = Int(1)](Coord[_]) capturing thin -> None, simd_width: Int, *, target: StringSlice[StaticConstantOrigin] = StringSlice("gpu"), _trace_description: StringSlice[StaticConstantOrigin] = StringSlice("dual_elementwise")](shape_0: Coord, shape_1: Coord, context: DeviceContext)

Executes two elementwise functions over their respective shapes in a single GPU kernel launch. Each thread processes elements from both shapes, fusing what would otherwise be two separate elementwise kernel launches into one.

Parameters:

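func_0 (def[width: Int, alignment: Int = Int(1)](Coord[_]) capturing thin -> None): The first body function, applied over shape_0. It is called with a SIMD width parameter and the Coord index of the elements to process.
func_1 (def[width: Int, alignment: Int = Int(1)](Coord[_]) capturing thin -> None): The second body function, applied over shape_1. It is called with a SIMD width parameter and the Coord index of the elements to process.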
simd_width (Int): The SIMD vector width to use.
target (StringSlice[StaticConstantOrigin]): The target to run on. Must be a GPU target; defaults to "gpu".
_trace_description (StringSlice[StaticConstantOrigin]): The description attached to the trace for this launch; defaults to "dual_elementwise".

Args:

shape_0 (Coord): The shape for the first function.
shape_1 (Coord): The shape for the second function.
context (DeviceContext): The device context to use.

Raises:

If the operation fails.