Version: 1.0

functional

Implements higher-order functions.

You can import these APIs from the algorithm package. For example:

from std.algorithm import map

comptime values

stencil

comptime stencil[shape_element_type: DType, input_shape_element_type: DType, //, rank: Int, stencil_rank: Int, stencil_axis: IndexList[stencil_rank, element_type=element_type], simd_width: Int, dtype: DType, map_fn: def[.element_type`1: DType](IndexList[stencil_rank, element_type=element_type]) -> Tuple[IndexList[stencil_rank], IndexList[stencil_rank]], map_strides: def(dim: Int) -> Int, load_fn: def[simd_width: Int, dtype: DType, .element_type`4: DType](IndexList[rank, element_type=element_type]) -> SIMD[dtype, simd_width], compute_init_fn: def[simd_width: Int]() -> SIMD[dtype, simd_width], compute_fn: def[simd_width: Int, .element_type`7: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width], SIMD[dtype, simd_width]) -> SIMD[dtype, simd_width], compute_finalize_fn: def[simd_width: Int, .element_type`9: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width]) -> None] = fn_literal

Computes a stencil operation in parallel.

Computes the output by processing input stencils. For each output point y, the stencil is a contiguous region of the input determined by map_fn: map_fn(y) -> (lower_bound, upper_bound). Boundary conditions for regions that fall outside the input domain are handled by load_fn.

Args:

  • shape: The shape of the output buffer.
  • input_shape: The shape of the input buffer.
  • map_fn_closure: Closure mapping output points to input co-domain bounds.
  • map_strides_closure: Closure returning the stride for a given dimension.
  • load_fn_closure: Closure loading a SIMD vector from input.
  • compute_init_fn_closure: Closure initializing the stencil accumulator.
  • compute_fn_closure: Closure processing each stencil point.
  • compute_finalize_fn_closure: Closure finalizing the output value.

Parameters

  • shape_element_type (DType): The element dtype of the shape.
  • input_shape_element_type (DType): The element dtype of the input shape.
  • rank (Int): Input and output domain rank.
  • stencil_rank (Int): Rank of stencil subdomain slice.
  • stencil_axis (IndexList): Stencil subdomain axes.
  • simd_width (Int): The SIMD vector width to use.
  • dtype (DType): The input and output data dtype.
  • map_fn (def[.element_type`1: DType](IndexList[stencil_rank, element_type=element_type]) -> Tuple[IndexList[stencil_rank], IndexList[stencil_rank]]): A function that maps a point in the output domain to lower and upper bounds in the input co-domain.
  • map_strides (def(dim: Int) -> Int): A function that returns the stride for the dim.
  • load_fn (def[simd_width: Int, dtype: DType, .element_type`4: DType](IndexList[rank, element_type=element_type]) -> SIMD[dtype, simd_width]): A function that loads a vector of simd_width from input.
  • compute_init_fn (def[simd_width: Int]() -> SIMD[dtype, simd_width]): A function that initializes the vector accumulator for the stencil computation.
  • compute_fn (def[simd_width: Int, .element_type`7: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width], SIMD[dtype, simd_width]) -> SIMD[dtype, simd_width]): A function that processes the value computed for each point in the stencil.
  • compute_finalize_fn (def[simd_width: Int, .element_type`9: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width]) -> None): A function that finalizes the computation of a point in the output domain given a stencil.
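The way these callbacks fit together can be sketched in a scalar, sequential, one-dimensional model. The Python below is not the Mojo API and makes several assumptions that the text above does not confirm: the upper bound returned by map_fn is treated as exclusive, the stride is a step through the input window, and compute_fn receives the input index. The names stencil_1d and sum_pool_3 are hypothetical and used only for illustration.

```python
from typing import Callable, List, Tuple


def stencil_1d(
    output_size: int,
    map_fn: Callable[[int], Tuple[int, int]],
    map_stride: int,
    load_fn: Callable[[int], float],
    compute_init_fn: Callable[[], float],
    compute_fn: Callable[[int, float, float], float],
    compute_finalize_fn: Callable[[int, float], None],
) -> None:
    """Scalar 1-D model of the callback protocol (hypothetical helper)."""
    for y in range(output_size):
        # Assumption: the window is [lower, upper), i.e. upper is exclusive.
        lower, upper = map_fn(y)
        acc = compute_init_fn()  # fresh accumulator for each output point
        for x in range(lower, upper, map_stride):
            # load_fn is responsible for indices outside the input domain.
            acc = compute_fn(x, acc, load_fn(x))
        compute_finalize_fn(y, acc)  # e.g. store the result at y


def sum_pool_3(data: List[float]) -> List[float]:
    """3-wide sum pooling with zero padding, expressed through the callbacks."""
    out = [0.0] * len(data)

    def map_fn(y: int) -> Tuple[int, int]:
        return (y - 1, y + 2)  # window centred on y

    def load_fn(x: int) -> float:
        return data[x] if 0 <= x < len(data) else 0.0  # zero padding

    def compute_init_fn() -> float:
        return 0.0

    def compute_fn(x: int, acc: float, val: float) -> float:
        return acc + val

    def compute_finalize_fn(y: int, acc: float) -> None:
        out[y] = acc

    stencil_1d(len(data), map_fn, 1, load_fn,
               compute_init_fn, compute_fn, compute_finalize_fn)
    return out
```

In this model, sum_pool_3([1.0, 2.0, 3.0, 4.0]) yields [3.0, 6.0, 9.0, 7.0]: the first and last windows reach one element past the input, and load_fn supplies zero there. Changing only compute_init_fn and compute_fn (for example, starting from negative infinity and taking the maximum) would turn the same loop into max pooling, which is the reason for splitting the computation into separate callbacks.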

stencil_gpu

comptime stencil_gpu[shape_element_type: DType, input_shape_element_type: DType, //, rank: Int, stencil_rank: Int, stencil_axis: IndexList[stencil_rank, element_type=element_type], simd_width: Int, dtype: DType, MapFnType: ImplicitlyCopyable & def[.element_type`1: DType](IndexList[stencil_rank, element_type=element_type]) register_passable -> Tuple[IndexList[stencil_rank], IndexList[stencil_rank]], MapStridesType: ImplicitlyCopyable & def(dim: Int) register_passable -> Int, LoadFnType: ImplicitlyCopyable & def[simd_width: Int, dtype: DType, .element_type`4: DType](IndexList[rank, element_type=element_type]) register_passable -> SIMD[dtype, simd_width], ComputeInitFnType: ImplicitlyCopyable & def[simd_width: Int]() register_passable -> SIMD[dtype, simd_width], ComputeFnType: ImplicitlyCopyable & def[simd_width: Int, .element_type`7: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width], SIMD[dtype, simd_width]) register_passable -> SIMD[dtype, simd_width], ComputeFinalizeFnType: ImplicitlyCopyable & def[simd_width: Int, .element_type`9: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width]) register_passable -> None] = fn_literal

Computes a stencil operation in parallel on the GPU (naive implementation).

Args:

  • ctx: The DeviceContext to use for GPU execution.
  • shape: The shape of the output buffer.
  • input_shape: The shape of the input buffer.
  • map_func: Closure mapping output points to input co-domain bounds.
  • map_strides_func: Closure returning the stride for a given dimension.
  • load_func: Closure loading a SIMD vector from input.
  • compute_init_func: Closure initializing the stencil accumulator.
  • compute_func: Closure processing each stencil point.
  • compute_finalize_func: Closure finalizing the output value.

Raises: If the GPU kernel launch fails.

Parameters

  • shape_element_type (DType): The element dtype of the shape.
  • input_shape_element_type (DType): The element dtype of the input shape.
  • rank (Int): Input and output domain rank.
  • stencil_rank (Int): Rank of stencil subdomain slice.
  • stencil_axis (IndexList): Stencil subdomain axes.
  • simd_width (Int): The SIMD vector width to use.
  • dtype (DType): The input and output data dtype.
  • MapFnType (ImplicitlyCopyable & def[.element_type`1: DType](IndexList[stencil_rank, element_type=element_type]) register_passable -> Tuple[IndexList[stencil_rank], IndexList[stencil_rank]]): A closure that maps a point in the output domain to input co-domain bounds.
  • MapStridesType (ImplicitlyCopyable & def(dim: Int) register_passable -> Int): A closure that returns the stride for each dimension.
  • LoadFnType (ImplicitlyCopyable & def[simd_width: Int, dtype: DType, .element_type`4: DType](IndexList[rank, element_type=element_type]) register_passable -> SIMD[dtype, simd_width]): A closure that loads a SIMD vector from input.
  • ComputeInitFnType (ImplicitlyCopyable & def[simd_width: Int]() register_passable -> SIMD[dtype, simd_width]): A closure that initializes the stencil accumulator.
  • ComputeFnType (ImplicitlyCopyable & def[simd_width: Int, .element_type`7: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width], SIMD[dtype, simd_width]) register_passable -> SIMD[dtype, simd_width]): A closure that processes the value computed for each stencil point.
  • ComputeFinalizeFnType (ImplicitlyCopyable & def[simd_width: Int, .element_type`9: DType](IndexList[rank, element_type=element_type], SIMD[dtype, simd_width]) register_passable -> None): A closure that finalizes the output value from the stencil result.

Functions