Version: 1.0.0b1

For the complete Mojo documentation index, see llms.txt. Markdown versions of all pages are available by appending .md to any URL (e.g. /docs/manual/basics.md).

functional

Implements higher-order functions.

You can import these APIs from the algorithm package. For example:

from std.algorithm import map

`comptime` values

`stencil`

comptime stencil[shape_element_type: DType, input_shape_element_type: DType, //, rank: Int, stencil_rank: Int, stencil_axis: IndexList[stencil_rank, element_type=element_type], simd_width: Int, dtype: DType, map_fn: def[.element_type`1: DType](IndexList[stencil_rank, element_type=element_type]) -> Tuple[IndexList[stencil_rank], IndexList[stencil_rank]], map_strides: def(dim: Int) -> Int, load_fn: def[simd_width: Int, dtype: DType, .element_type`4: DType](IndexList[rank, element_type=element_type]) -> SIMD[dtype, simd_width], compute_init_fn: def[simd_width: Int]() -> SIMD[dtype, simd_width], compute_fn: def[simd_width: Int, .element_type`7: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width], SIMD[dtype, simd_width]) -> SIMD[dtype, simd_width], compute_finalize_fn: def[simd_width: Int, .element_type`9: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width]) -> None] = fn_literal

Computes a stencil operation in parallel.

Computes the output by processing input stencils. Each stencil is a contiguous region of the input, determined for each output point by map_fn: map_fn(y) -> lower_bound, upper_bound. Boundary conditions for regions that fall outside the input domain are handled by load_fn.

Args:
shape: The shape of the output buffer.
input_shape: The shape of the input buffer.
map_fn_closure: Closure mapping output points to input co-domain bounds.
map_strides_closure: Closure returning the stride for a given dimension.
load_fn_closure: Closure loading a SIMD vector from input.
compute_init_fn_closure: Closure initializing the stencil accumulator.
compute_fn_closure: Closure processing each stencil point.
compute_finalize_fn_closure: Closure finalizing the output value.

Parameters

shape_element_type (DType): The element dtype of the shape.
input_shape_element_type (DType): The element dtype of the input shape.
rank (Int): Input and output domain rank.
stencil_rank (Int): Rank of stencil subdomain slice.
stencil_axis (IndexList[stencil_rank, element_type=element_type]): Stencil subdomain axes.
simd_width (Int): The SIMD vector width to use.
dtype (DType): The input and output data dtype.
map_fn (def[.element_type`1: DType](IndexList[stencil_rank, element_type=element_type]) -> Tuple[IndexList[stencil_rank], IndexList[stencil_rank]]): A function that maps a point in the output domain to the input co-domain.
map_strides (def(dim: Int) -> Int): A function that returns the stride for the dim.
load_fn (def[simd_width: Int, dtype: DType, .element_type`4: DType](IndexList[rank, element_type=element_type]) -> SIMD[dtype, simd_width]): A function that loads a vector of simd_width from input.
compute_init_fn (def[simd_width: Int]() -> SIMD[dtype, simd_width]): A function that initializes vector compute over the stencil.
compute_fn (def[simd_width: Int, .element_type`7: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width], SIMD[dtype, simd_width]) -> SIMD[dtype, simd_width]): A function that processes the value computed for each point in the stencil.
compute_finalize_fn (def[simd_width: Int, .element_type`9: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width]) -> None): A function that finalizes the computation of a point in the output domain given a stencil.
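
The way the callbacks above fit together can be sketched with a small sequential model. This is plain Python, not Mojo, and it is not the library's implementation: it assumes scalar values (a SIMD width of 1), a stride of 1, a stencil that spans every axis, and half-open bounds [lower_bound, upper_bound). The name stencil_reference and the sample data are hypothetical.

```python
from itertools import product


def stencil_reference(shape, map_fn, load_fn, compute_init_fn,
                      compute_fn, compute_finalize_fn):
    """Sequential, scalar model of the stencil callback flow (illustrative only)."""
    for out_point in product(*(range(n) for n in shape)):
        # map_fn gives the input region that feeds this output point.
        lower, upper = map_fn(out_point)
        acc = compute_init_fn()
        # Assumption: the region is half-open, [lower, upper), on every axis.
        for in_point in product(*(range(lo, hi) for lo, hi in zip(lower, upper))):
            acc = compute_fn(in_point, acc, load_fn(in_point))
        # The finalizer is responsible for storing the result.
        compute_finalize_fn(out_point, acc)


# Example: a 3-point moving sum over a 1-D input, zero-padded at the edges.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
out = [0.0] * len(data)


def map_fn(point):
    return (point[0] - 1,), (point[0] + 2,)


def load_fn(point):
    # Out-of-range reads return 0.0: the boundary condition lives in load_fn.
    i = point[0]
    return data[i] if 0 <= i < len(data) else 0.0


def compute_init_fn():
    return 0.0


def compute_fn(point, acc, value):
    return acc + value


def compute_finalize_fn(point, acc):
    out[point[0]] = acc


stencil_reference((len(data),), map_fn, load_fn, compute_init_fn,
                  compute_fn, compute_finalize_fn)
print(out)  # [3.0, 6.0, 9.0, 12.0, 9.0]
```

In this model the edge outputs are smaller because load_fn supplies zeros for out-of-range points. The real function additionally vectorizes by simd_width and runs in parallel.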

`stencil_gpu`

comptime stencil_gpu[shape_element_type: DType, input_shape_element_type: DType, //, rank: Int, stencil_rank: Int, stencil_axis: IndexList[stencil_rank, element_type=element_type], simd_width: Int, dtype: DType, MapFnType: ImplicitlyCopyable & def[.element_type`1: DType](IndexList[stencil_rank, element_type=element_type]) register_passable -> Tuple[IndexList[stencil_rank], IndexList[stencil_rank]], MapStridesType: ImplicitlyCopyable & def(dim: Int) register_passable -> Int, LoadFnType: ImplicitlyCopyable & def[simd_width: Int, dtype: DType, .element_type`4: DType](IndexList[rank, element_type=element_type]) register_passable -> SIMD[dtype, simd_width], ComputeInitFnType: ImplicitlyCopyable & def[simd_width: Int]() register_passable -> SIMD[dtype, simd_width], ComputeFnType: ImplicitlyCopyable & def[simd_width: Int, .element_type`7: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width], SIMD[dtype, simd_width]) register_passable -> SIMD[dtype, simd_width], ComputeFinalizeFnType: ImplicitlyCopyable & def[simd_width: Int, .element_type`9: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width]) register_passable -> None] = fn_literal

(Naive implementation) Computes a stencil operation in parallel on the GPU.

Args:
ctx: The DeviceContext to use for GPU execution.
shape: The shape of the output buffer.
input_shape: The shape of the input buffer.
map_func: Closure mapping output points to input co-domain bounds.
map_strides_func: Closure returning the stride for a given dimension.
load_func: Closure loading a SIMD vector from input.
compute_init_func: Closure initializing the stencil accumulator.
compute_func: Closure processing each stencil point.
compute_finalize_func: Closure finalizing the output value.

Raises: If the GPU kernel launch fails.

Parameters

shape_element_type (DType): The element dtype of the shape.
input_shape_element_type (DType): The element dtype of the input shape.
rank (Int): Input and output domain rank.
stencil_rank (Int): Rank of stencil subdomain slice.
stencil_axis (IndexList[stencil_rank, element_type=element_type]): Stencil subdomain axes.
simd_width (Int): The SIMD vector width to use.
dtype (DType): The input and output data dtype.
MapFnType (ImplicitlyCopyable & def[.element_type`1: DType](IndexList[stencil_rank, element_type=element_type]) register_passable -> Tuple[IndexList[stencil_rank], IndexList[stencil_rank]]): A closure that maps a point in the output domain to input co-domain bounds.
MapStridesType (ImplicitlyCopyable & def(dim: Int) register_passable -> Int): A closure that returns the stride for each dimension.
LoadFnType (ImplicitlyCopyable & def[simd_width: Int, dtype: DType, .element_type`4: DType](IndexList[rank, element_type=element_type]) register_passable -> SIMD[dtype, simd_width]): A closure that loads a SIMD vector from input.
ComputeInitFnType (ImplicitlyCopyable & def[simd_width: Int]() register_passable -> SIMD[dtype, simd_width]): A closure that initializes the stencil accumulator.
ComputeFnType (ImplicitlyCopyable & def[simd_width: Int, .element_type`7: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width], SIMD[dtype, simd_width]) register_passable -> SIMD[dtype, simd_width]): A closure that processes the value computed for each stencil point.
ComputeFinalizeFnType (ImplicitlyCopyable & def[simd_width: Int, .element_type`9: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width]) register_passable -> None): A closure that finalizes the output value from the stencil result.

Functions

cpu_func_unified:
elementwise: Executes func[width, rank](indices), possibly as sub-tasks, for a suitable combination of width and indices so as to cover shape. Returns when all sub-tasks have completed.
func_unified:
gpu_func_unified:
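
The coverage behavior described for elementwise can be pictured with a small sequential model. This is plain Python, not Mojo, and only one possible schedule: it assumes vector-width chunks along the innermost dimension with a scalar (width 1) tail for the remainder. The names elementwise_reference and record are hypothetical.

```python
from itertools import product


def elementwise_reference(func, simd_width, shape):
    """Sequential model: call func(width, indices) so the calls cover shape."""
    *outer, inner = shape
    for prefix in product(*(range(n) for n in outer)):
        i = 0
        # Full-width chunks along the innermost dimension.
        while i + simd_width <= inner:
            func(simd_width, prefix + (i,))
            i += simd_width
        # Assumption: the leftover elements are handled one at a time.
        while i < inner:
            func(1, prefix + (i,))
            i += 1


calls = []
covered = set()


def record(width, indices):
    # Each call handles `width` consecutive elements starting at `indices`.
    calls.append((width, indices))
    for k in range(width):
        covered.add(indices[:-1] + (indices[-1] + k,))


elementwise_reference(record, 4, (2, 6))
print(len(calls))  # 6: one width-4 call and two width-1 calls per row
```

Every index in the (2, 6) shape is visited exactly once in this model. The real function may split the work into parallel sub-tasks and returns only after all of them have completed.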
