functional
Implements higher-order functions.
You can import these APIs from the algorithm package. For example:
from std.algorithm import map
comptime values
stencil
comptime stencil[shape_element_type: DType, input_shape_element_type: DType, //, rank: Int, stencil_rank: Int, stencil_axis: IndexList[stencil_rank, element_type=element_type], simd_width: Int, dtype: DType, map_fn: def[.element_type`1: DType](IndexList[stencil_rank, element_type=element_type]) -> Tuple[IndexList[stencil_rank], IndexList[stencil_rank]], map_strides: def(dim: Int) -> Int, load_fn: def[simd_width: Int, dtype: DType, .element_type`4: DType](IndexList[rank, element_type=element_type]) -> SIMD[dtype, simd_width], compute_init_fn: def[simd_width: Int]() -> SIMD[dtype, simd_width], compute_fn: def[simd_width: Int, .element_type`7: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width], SIMD[dtype, simd_width]) -> SIMD[dtype, simd_width], compute_finalize_fn: def[simd_width: Int, .element_type`9: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width]) -> None] = fn_literal
Computes a stencil operation in parallel.
Computes the output by processing input stencils. Each stencil is a contiguous region of the input associated with one output point, and its bounds are given by map_fn: map_fn(y) -> lower_bound, upper_bound. Boundary conditions for regions that fall outside the input domain are handled by load_fn.
Args:
- shape: The shape of the output buffer.
- input_shape: The shape of the input buffer.
- map_fn_closure: Closure mapping output points to input co-domain bounds.
- map_strides_closure: Closure returning the stride for a given dimension.
- load_fn_closure: Closure loading a SIMD vector from input.
- compute_init_fn_closure: Closure initializing the stencil accumulator.
- compute_fn_closure: Closure processing each stencil point.
- compute_finalize_fn_closure: Closure finalizing the output value.
Parameters
- shape_element_type (DType): The element dtype of the shape.
- input_shape_element_type (DType): The element dtype of the input shape.
- rank (Int): Input and output domain rank.
- stencil_rank (Int): Rank of the stencil subdomain slice.
- stencil_axis (IndexList): Stencil subdomain axes.
- simd_width (Int): The SIMD vector width to use.
- dtype (DType): The input and output data dtype.
- map_fn (def[.element_type`1: DType](IndexList[stencil_rank, element_type=element_type]) -> Tuple[IndexList[stencil_rank], IndexList[stencil_rank]]): A function that maps a point in the output domain to the input co-domain.
- map_strides (def(dim: Int) -> Int): A function that returns the stride for the given dim.
- load_fn (def[simd_width: Int, dtype: DType, .element_type`4: DType](IndexList[rank, element_type=element_type]) -> SIMD[dtype, simd_width]): A function that loads a vector of simd_width elements from the input.
- compute_init_fn (def[simd_width: Int]() -> SIMD[dtype, simd_width]): A function that initializes the vector accumulator for the stencil computation.
- compute_fn (def[simd_width: Int, .element_type`7: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width], SIMD[dtype, simd_width]) -> SIMD[dtype, simd_width]): A function that processes the value computed for each point in the stencil.
- compute_finalize_fn (def[simd_width: Int, .element_type`9: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width]) -> None): A function that finalizes the computation of a point in the output domain given a stencil.
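The way the stencil callbacks fit together can be illustrated with a small reference model. The sketch below is plain Python rather than Mojo, and it is not the library's implementation: it is sequential and scalar (no SIMD, no parallelism), it assumes the stencil spans every axis, and it assumes that map_strides(dim) is the step between successive stencil points along dim. The name stencil_reference and the 3-wide sum example are hypothetical.

```python
from itertools import product


def stencil_reference(shape, map_fn, map_strides, load_fn,
                      compute_init_fn, compute_fn, compute_finalize_fn):
    """Sequential, scalar model of the callback order described for stencil."""
    rank = len(shape)
    for out_idx in product(*(range(n) for n in shape)):
        # map_fn gives the input region (lower inclusive, upper exclusive here).
        lower, upper = map_fn(out_idx)
        acc = compute_init_fn()
        # Assumption: map_strides(d) is the step between stencil points on axis d.
        axes = [range(lower[d], upper[d], map_strides(d)) for d in range(rank)]
        for in_idx in product(*axes):
            # load_fn owns the boundary handling for out-of-domain indices.
            acc = compute_fn(in_idx, acc, load_fn(in_idx))
        compute_finalize_fn(out_idx, acc)


# Hypothetical example: 1-D sum over a 3-wide window with zero padding.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
out = [0.0] * len(data)


def map_fn(idx):
    return (idx[0] - 1,), (idx[0] + 2,)


def map_strides(dim):
    return 1


def load_fn(idx):
    i = idx[0]
    return data[i] if 0 <= i < len(data) else 0.0  # zero outside the input


def compute_init_fn():
    return 0.0


def compute_fn(idx, acc, val):
    return acc + val


def compute_finalize_fn(idx, acc):
    out[idx[0]] = acc


stencil_reference((len(data),), map_fn, map_strides, load_fn,
                  compute_init_fn, compute_fn, compute_finalize_fn)
print(out)  # [3.0, 6.0, 9.0, 12.0, 9.0]
```

Swapping compute_init_fn and compute_fn for a negative-infinity start and max would turn the same skeleton into a max-pooling model, which is the kind of reuse the callback split is meant to allow.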
stencil_gpu
comptime stencil_gpu[shape_element_type: DType, input_shape_element_type: DType, //, rank: Int, stencil_rank: Int, stencil_axis: IndexList[stencil_rank, element_type=element_type], simd_width: Int, dtype: DType, MapFnType: ImplicitlyCopyable & def[.element_type`1: DType](IndexList[stencil_rank, element_type=element_type]) register_passable -> Tuple[IndexList[stencil_rank], IndexList[stencil_rank]], MapStridesType: ImplicitlyCopyable & def(dim: Int) register_passable -> Int, LoadFnType: ImplicitlyCopyable & def[simd_width: Int, dtype: DType, .element_type`4: DType](IndexList[rank, element_type=element_type]) register_passable -> SIMD[dtype, simd_width], ComputeInitFnType: ImplicitlyCopyable & def[simd_width: Int]() register_passable -> SIMD[dtype, simd_width], ComputeFnType: ImplicitlyCopyable & def[simd_width: Int, .element_type`7: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width], SIMD[dtype, simd_width]) register_passable -> SIMD[dtype, simd_width], ComputeFinalizeFnType: ImplicitlyCopyable & def[simd_width: Int, .element_type`9: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width]) register_passable -> None] = fn_literal
(Naive implementation) Computes a stencil operation in parallel on the GPU.
Args:
- ctx: The DeviceContext to use for GPU execution.
- shape: The shape of the output buffer.
- input_shape: The shape of the input buffer.
- map_func: Closure mapping output points to input co-domain bounds.
- map_strides_func: Closure returning the stride for a given dimension.
- load_func: Closure loading a SIMD vector from input.
- compute_init_func: Closure initializing the stencil accumulator.
- compute_func: Closure processing each stencil point.
- compute_finalize_func: Closure finalizing the output value.
Raises: If the GPU kernel launch fails.
Parameters
- shape_element_type (DType): The element dtype of the shape.
- input_shape_element_type (DType): The element dtype of the input shape.
- rank (Int): Input and output domain rank.
- stencil_rank (Int): Rank of the stencil subdomain slice.
- stencil_axis (IndexList): Stencil subdomain axes.
- simd_width (Int): The SIMD vector width to use.
- dtype (DType): The input and output data dtype.
- MapFnType (ImplicitlyCopyable & def[.element_type`1: DType](IndexList[stencil_rank, element_type=element_type]) register_passable -> Tuple[IndexList[stencil_rank], IndexList[stencil_rank]]): A closure that maps a point in the output domain to input co-domain bounds.
- MapStridesType (ImplicitlyCopyable & def(dim: Int) register_passable -> Int): A closure that returns the stride for each dimension.
- LoadFnType (ImplicitlyCopyable & def[simd_width: Int, dtype: DType, .element_type`4: DType](IndexList[rank, element_type=element_type]) register_passable -> SIMD[dtype, simd_width]): A closure that loads a SIMD vector from the input.
- ComputeInitFnType (ImplicitlyCopyable & def[simd_width: Int]() register_passable -> SIMD[dtype, simd_width]): A closure that initializes the stencil accumulator.
- ComputeFnType (ImplicitlyCopyable & def[simd_width: Int, .element_type`7: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width], SIMD[dtype, simd_width]) register_passable -> SIMD[dtype, simd_width]): A closure that processes the value computed for each stencil point.
- ComputeFinalizeFnType (ImplicitlyCopyable & def[simd_width: Int, .element_type`9: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width]) register_passable -> None): A closure that finalizes the output value from the stencil result.
Functions
- cpu_func_unified
- elementwise: Executes func[width, rank](indices), possibly as sub-tasks, for a suitable combination of width and indices so as to cover shape. Returns when all sub-tasks have completed.
- func_unified
- gpu_func_unified
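The coverage guarantee stated for elementwise can be pictured with a small reference model. The sketch below is plain Python, not the Mojo API, and it runs sequentially. It assumes one particular covering strategy: vector-width calls along the innermost dimension, followed by width-1 calls for the remainder. The actual choice of widths and sub-task split is up to the library. The names elementwise_reference and record are hypothetical, and the rank parameter of func is implied here by len(indices).

```python
from itertools import product


def elementwise_reference(func, shape, simd_width):
    """Call func(width, indices) so that every point of shape is covered once.

    Assumes a non-empty shape and vectorization along the last dimension.
    """
    *outer, inner = shape
    vector_end = inner - inner % simd_width  # last index reachable at full width
    for prefix in product(*(range(n) for n in outer)):
        for start in range(0, vector_end, simd_width):
            func(simd_width, prefix + (start,))  # full-width chunk
        for start in range(vector_end, inner):
            func(1, prefix + (start,))  # scalar remainder


calls = []
covered = set()


def record(width, indices):
    # Track each call and the points it is responsible for.
    calls.append((width, indices))
    for k in range(width):
        covered.add(indices[:-1] + (indices[-1] + k,))


elementwise_reference(record, (2, 6), 4)
print(calls[:3])  # [(4, (0, 0)), (1, (0, 4)), (1, (0, 5))]
```

In this model a call with width w at indices is responsible for w consecutive elements along the last axis, which is why six calls are enough to cover the twelve points of a (2, 6) shape.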