dual_elementwise
def dual_elementwise[rank: Int, //, func_0: def[width: Int, rank: Int, alignment: Int = 1](IndexList[rank]) capturing -> None, func_1: def[width: Int, rank: Int, alignment: Int = 1](IndexList[rank]) capturing -> None, simd_width: Int, *, target: StringSlice[StaticConstantOrigin] = StringSlice("gpu"), _trace_description: StringSlice[StaticConstantOrigin] = StringSlice("dual_elementwise")](shape_0: IndexList[rank], shape_1: IndexList[rank], context: DeviceContext)
Executes two elementwise functions over their respective shapes in a single GPU kernel launch. Each thread processes elements from both shapes, fusing two independent elementwise passes into one.
Parameters:
- rank (Int): The rank of the buffers.
- func_0 (def[width: Int, rank: Int, alignment: Int = 1](IndexList[rank]) capturing -> None): The first body function.
- func_1 (def[width: Int, rank: Int, alignment: Int = 1](IndexList[rank]) capturing -> None): The second body function.
- simd_width (Int): The SIMD vector width to use.
- target (StringSlice[StaticConstantOrigin]): The target to run on (must be GPU).
- _trace_description (StringSlice[StaticConstantOrigin]): Description of the trace.
Args:
- shape_0 (IndexList[rank]): The shape for the first function.
- shape_1 (IndexList[rank]): The shape for the second function.
- context (DeviceContext): The device context to use.
Raises:
If the operation fails.
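The observable effect of the fused launch can be illustrated with a sequential reference model. The sketch below is written in Python rather than Mojo, and it is not the library's implementation. It assumes only what the description above states: func_0 is applied to every index of shape_0, func_1 to every index of shape_1, and the two passes are independent. It ignores SIMD width, thread assignment, and the device context. The names `dual_elementwise_model`, `double_a`, and `negate_b`, and the sample data, are hypothetical.

```python
from itertools import product


def dual_elementwise_model(func_0, func_1, shape_0, shape_1):
    """Sequential model: visit every index of each shape exactly once.

    Assumption: the real kernel produces the same results as two
    independent elementwise passes; ordering and vectorization are
    not modeled here.
    """
    if len(shape_0) != len(shape_1):
        # Both shapes share one `rank` parameter in the documented signature.
        raise ValueError("shape_0 and shape_1 must have the same rank")
    for idx in product(*(range(n) for n in shape_0)):
        func_0(idx)
    for idx in product(*(range(n) for n in shape_1)):
        func_1(idx)


# Hypothetical inputs and outputs captured by the two body functions.
a = [[1, 2, 3], [4, 5, 6]]
b = [[10, 20], [30, 40], [50, 60], [70, 80]]
doubled = [[0] * 3 for _ in range(2)]
negated = [[0] * 2 for _ in range(4)]


def double_a(idx):
    # First body function: writes 2 * a[idx] into `doubled`.
    i, j = idx
    doubled[i][j] = 2 * a[i][j]


def negate_b(idx):
    # Second body function: writes -b[idx] into `negated`.
    i, j = idx
    negated[i][j] = -b[i][j]


dual_elementwise_model(double_a, negate_b, (2, 3), (4, 2))
```

In this model the two shapes differ in extent but share the same rank, mirroring the single rank parameter in the signature. Because the passes are independent, neither body function should rely on results written by the other.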