IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /docs/manual/basics.md). For the complete Mojo documentation index, see llms.txt.
Version: Nightly

SIMD

struct SIMD[dtype: DType, size: Int]

Represents a vector type that leverages hardware acceleration to process multiple data elements with a single operation.

SIMD (Single Instruction, Multiple Data) is a fundamental parallel computing paradigm where a single CPU instruction operates on multiple data elements at once. Modern CPUs can perform 4, 8, 16, or even 32 operations in parallel using SIMD, delivering substantial performance improvements over scalar operations. Instead of processing one value at a time, SIMD processes entire vectors of values with each instruction.

For example, when adding two vectors of four values, a scalar operation adds each value in the vector one by one, while a SIMD operation adds all four values at once using vector registers:

Scalar operation:                SIMD operation:
┌─────────────────────────┐      ┌───────────────────────────┐
│ 4 instructions          │      │ 1 instruction             │
│ 4 clock cycles          │      │ 1 clock cycle             │
│                         │      │                           │
│ ADD a[0], b[0] → c[0]   │      │ Vector register A         │
│ ADD a[1], b[1] → c[1]   │      │ ┌─────┬─────┬─────┬─────┐ │
│ ADD a[2], b[2] → c[2]   │      │ │a[0] │a[1] │a[2] │a[3] │ │
│ ADD a[3], b[3] → c[3]   │      │ └─────┴─────┴─────┴─────┘ │
└─────────────────────────┘      │             +             │
                                 │ Vector register B         │
                                 │ ┌─────┬─────┬─────┬─────┐ │
                                 │ │b[0] │b[1] │b[2] │b[3] │ │
                                 │ └─────┴─────┴─────┴─────┘ │
                                 │             ↓             │
                                 │         SIMD_ADD          │
                                 │             ↓             │
                                 │ Vector register C         │
                                 │ ┌─────┬─────┬─────┬─────┐ │
                                 │ │c[0] │c[1] │c[2] │c[3] │ │
                                 │ └─────┴─────┴─────┴─────┘ │
                                 └───────────────────────────┘

The SIMD type maps directly to hardware vector registers and instructions. Mojo automatically generates optimal SIMD code that leverages CPU-specific instruction sets (such as AVX and NEON) without requiring manual intrinsics or assembly programming.

This type is the foundation of high-performance CPU computing in Mojo, enabling you to write code that automatically leverages modern CPU vector capabilities while maintaining code clarity and portability.

Caution: If you declare a SIMD vector size larger than the vector registers of the target hardware, the compiler will break up the SIMD into multiple vector registers for compatibility. However, you should avoid using a vector that's more than 2x the hardware's vector register size because the resulting code will perform poorly.

Key properties:

  • Hardware-mapped: Directly maps to CPU vector registers
  • Type-safe: Data types and vector sizes are checked at compile time
  • Zero-cost: No runtime overhead compared to hand-optimized intrinsics
  • Portable: Same code works across different CPU architectures (x86, ARM, etc.)
  • Composable: Seamlessly integrates with Mojo's parallelization features

Key APIs:

  • Construction:

    • Broadcast single value to all elements: SIMD[dtype, size](value)
    • Initialize with specific values: SIMD[dtype, size](v1, v2, ...)
    • Zero-initialized vector: SIMD[dtype, size]()
  • Element operations:

    • Arithmetic: +, -, *, /, %, //
    • Comparison: ==, !=, <, <=, >, >=
    • Math functions: sqrt(), sin(), cos(), fma(), etc.
    • Bit operations: &, |, ^, ~, <<, >>
  • Vector operations:

    • Horizontal reductions: reduce_add(), reduce_mul(), reduce_min(), reduce_max()
    • Element-wise conditional selection: select(condition, true_case, false_case)
    • Vector manipulation: shuffle(), slice(), join(), split()
    • Type conversion: cast[target_dtype]()

Examples:

Vectorized math operations:

# Process 8 floating-point numbers simultaneously
var a = SIMD[DType.float32, 8](1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)
var b = SIMD[DType.float32, 8](2.0) # Broadcast 2.0 to all elements
var result = a * b + 1.0
print(result) # => [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0]

Conditional operations with masking:

# Double the positive values and negate the negative values
var values = SIMD[DType.int32, 4](1, -2, 3, -4)
var is_positive = values.gt(0) # greater-than: gets SIMD of booleans
var result = is_positive.select(values * 2, values * -1)
print(result) # => [2, 2, 6, 4]

Horizontal reductions:

# Sum all elements in a vector
var data = SIMD[DType.float64, 4](10.5, 20.3, 30.1, 40.7)
var total = data.reduce_add()
var maximum = data.reduce_max()
print(total, maximum) # => 101.6 40.7

Constraints:

The size of the SIMD vector must be positive and a power of 2.

Parameters

  • dtype (DType): The data type of SIMD vector elements.
  • size (Int): The size of the SIMD vector (number of elements).

Implemented traits

Absable, AnyType, Boolable, CeilDivable, Ceilable, Comparable, CoordLike, Copyable, Defaultable, DevicePassable, DivModable, Equatable, Floorable, Hashable, ImplicitlyCopyable, ImplicitlyDestructible, Indexer, Intable, Movable, Powable, RegisterPassable, Roundable, Sized, TrivialRegisterPassable, Truncable, Writable, _FromInt

comptime members

device_type

comptime device_type = SIMD[dtype, size]

SIMD types are remapped to the same type when passed to accelerator devices.

DTYPE

comptime DTYPE = dtype

The data type for the runtime integer value.

is_static_value

comptime is_static_value = False

True if the value is known at compile time.

is_tuple

comptime is_tuple = False

True if this is a tuple type (Coord), False for scalar values.

is_value

comptime is_value = True

True if this is a scalar value, False for tuple types.

MAX

comptime MAX = SIMD(max_or_inf[dtype]())

Gets the maximum value for the SIMD value, potentially +inf.

MAX_FINITE

comptime MAX_FINITE = SIMD(max_finite[dtype]())

Returns the maximum finite value of SIMD value.

MIN

comptime MIN = SIMD(min_or_neg_inf[dtype]())

Gets the minimum value for the SIMD value, potentially -inf.

MIN_FINITE

comptime MIN_FINITE = SIMD(min_finite[dtype]())

Returns the minimum (lowest) finite value of SIMD value.
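As a rough illustration of the limit members above, the sketch below reads them through the `Int8` and `Float32` scalar aliases. It assumes a `def main()` entry point and that `MAX` is `+inf` for floating-point types, as the "potentially +inf" wording suggests.

```mojo
def main():
    # Integer types have no infinity, so MAX and MIN are the finite bounds.
    print(Int8.MAX, Int8.MIN)  # => 127 -128

    # Assumption: for floating-point types MAX is +inf, so it compares
    # greater than the largest finite value.
    print(Float32.MAX > Float32.MAX_FINITE)  # => True
```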

ParamListType

comptime ParamListType = *?

The element types (Self for scalar types).

static_value

comptime static_value = -1

Always -1 for runtime values (not statically known).

Methods

__init__

def __init__() -> Self

Default initializer of the SIMD vector.

By default the SIMD vectors are initialized to all zeros.

def __init__[other_dtype: DType, //](value: SIMD[other_dtype, size], /) -> Self

Initialize from another SIMD of the same size. If the value passed is a scalar, you can initialize a SIMD vector with more elements.

Example:

print(UInt64(UInt8(42))) # 42
print(SIMD[DType.uint64, 4](UInt8(42))) # [42, 42, 42, 42]

Casting behavior:

# Basic casting preserves value within range
Int8(UInt8(127)) == Int8(127)

# Numbers above signed max wrap to negative using two's complement
Int8(UInt8(128)) == Int8(-128)
Int8(UInt8(129)) == Int8(-127)
Int8(UInt8(256)) == Int8(0)

# Negative signed cast to unsigned using two's complement
UInt8(Int8(-128)) == UInt8(128)
UInt8(Int8(-127)) == UInt8(129)
UInt8(Int8(-1)) == UInt8(255)

# Truncate precision after downcast and upcast
Float64(Float32(Float64(123456789.123456789))) == Float64(123456792.0)

# Rightmost bits of significand become 0's on upcast
Float64(Float32(0.3)) == Float64(0.30000001192092896)

# Numbers equal after truncation of float literal and cast truncation
Float32(Float64(123456789.123456789)) == Float32(123456789.123456789)

# Float to int/uint floors
Int64(Float64(42.2)) == Int64(42)

Parameters:

  • other_dtype (DType): The type of the value that is being cast from.

Args:

  • value (SIMD[other_dtype, size]): The value to cast from.

@implicit def __init__(value: Int, /) -> Self

Initializes the SIMD vector with a signed integer.

The signed integer value is splatted across all the elements of the SIMD vector.

Args:

  • value (Int): The input value.

def __init__[T: Floatable, //](value: T, /) -> Float64

Initialize a Float64 from a type conforming to Floatable.

Parameters:

  • T (Floatable): The Floatable type.

Args:

  • value (T): The object to get the floating-point representation of.

Returns:

Float64

def __init__[T: FloatableRaising, //](out self: Float64, value: T, /)

Initialize a Float64 from a type conforming to FloatableRaising.

Parameters:

  • T (FloatableRaising): The FloatableRaising type.

Args:

  • value (T): The object to get the floating-point representation of.

Returns:

Float64

Raises:

If the type does not have a floating-point representation.

@implicit def __init__(value: IntLiteral, /) -> Self

Initializes the SIMD vector with an integer.

The integer value is splatted across all the elements of the SIMD vector.

Args:

  • value (IntLiteral): The input value.

@implicit def __init__(value: Bool, /) -> SIMD[DType.bool, size]

Initializes a Scalar with a bool value.

Since this constructor does not splat, it can be implicit.

Args:

  • value (Bool): The bool value to initialize the Scalar with.

Returns:

SIMD[DType.bool, size]

def __init__(*, fill: Bool) -> SIMD[DType.bool, size]

Initializes the SIMD vector with a bool value.

The bool value is splatted across all elements of the SIMD vector.

Args:

  • fill (Bool): The bool value to fill each element of the SIMD vector with.

Returns:

SIMD[DType.bool, size]

@implicit def __init__(value: Scalar[dtype], /) -> Self

Constructs a SIMD vector by splatting a scalar value.

The input value is splatted across all elements of the SIMD vector.

Args:

  • value (Scalar[dtype]): The value to splat to the elements of the vector.

def __init__(*elems: Scalar[dtype], *, __list_literal__: NoneType = None) -> Self

Constructs a SIMD vector via a variadic list of elements.

The input values are assigned to the corresponding elements of the SIMD vector.

Constraints:

The number of input values is equal to size of the SIMD vector.

Args:

  • *elems (Scalar[dtype]): The variadic list of elements from which the SIMD vector is constructed.
  • __list_literal__ (NoneType): Tell Mojo to use this method for list literals.
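The constructor overloads described so far can be compared side by side. This is a minimal sketch that assumes a `def main()` entry point; the element values are arbitrary.

```mojo
def main():
    # Splat: one scalar is copied into every element.
    print(SIMD[DType.int32, 4](7))           # => [7, 7, 7, 7]

    # Variadic: exactly `size` values, one per element.
    print(SIMD[DType.int32, 4](1, 2, 3, 4))  # => [1, 2, 3, 4]

    # Default: every element is zero.
    print(SIMD[DType.int32, 4]())            # => [0, 0, 0, 0]
```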

@implicit def __init__(value: FloatLiteral, /) -> Self

Initializes the SIMD vector with a float.

The value is splatted across all the elements of the SIMD vector.

Args:

  • value (FloatLiteral): The input value.

def __init__[int_dtype: DType, //](*, from_bits: SIMD[int_dtype, size]) -> Self

Initializes the SIMD vector from the bits of an integral SIMD vector.

Parameters:

  • int_dtype (DType): The integral type of the input SIMD vector.

Args:

  • from_bits (SIMD[int_dtype, size]): The integral SIMD vector whose bits are used.

def __init__(out self: Scalar[dtype], *, py: PythonObject)

Initialize a SIMD value from a PythonObject.

Args:

  • py (PythonObject): The Python object to convert.

Returns:

Scalar[dtype]

Raises:

If the conversion to double fails.

__bool__

def __bool__(self) -> Bool

Converts the SIMD scalar into a boolean value.

Returns:

Bool: True if the SIMD scalar is non-zero and False otherwise.

__getitem__

def __getitem__(self, idx: Int) -> Scalar[dtype]

Gets an element from the vector.

Args:

  • idx (Int): The element index.

Returns:

Scalar[dtype]: The value at position idx.
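Element access uses ordinary subscript syntax. The sketch below assumes a `def main()` entry point and shows a read through `__getitem__` followed by a write through `__setitem__`.

```mojo
def main():
    var v = SIMD[DType.int32, 4](10, 20, 30, 40)
    print(v[2])  # => 30
    v[2] = 99    # subscript assignment calls __setitem__
    print(v)     # => [10, 20, 99, 40]
```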

__setitem__

def __setitem__(mut self, idx: Int, val: Scalar[dtype])

Sets an element in the vector.

Args:

  • idx (Int): The index to set.
  • val (Scalar[dtype]): The value to set.

__neg__

def __neg__(self) -> Self

Defines the unary - operation.

Returns:

Self: The negation of this SIMD vector.

__pos__

def __pos__(self) -> Self

Defines the unary + operation.

Returns:

Self: This SIMD vector.

__invert__

def __invert__(self) -> Self

Returns ~self.

Constraints:

The element type of the SIMD vector must be boolean or integral.

Returns:

Self: The ~self value.

__lt__

def __lt__(self, rhs: Self) -> Bool

Compares two Scalars using less-than comparison.

Args:

  • rhs (Self): The Scalar to compare with.

Returns:

Bool: True if self is less than rhs, False otherwise.

__le__

def __le__(self, rhs: Self) -> Bool

Compares two Scalars using less-than-or-equal comparison.

Args:

  • rhs (Self): The Scalar to compare with.

Returns:

Bool: True if self is less than or equal to rhs, False otherwise.

__eq__

def __eq__(self, rhs: Self) -> Bool

Compares two SIMD vectors for equality.

Args:

  • rhs (Self): The SIMD vector to compare with.

Returns:

Bool: True if all elements of the SIMD vectors are equal, False otherwise.

__ne__

def __ne__(self, rhs: Self) -> Bool

Compares two SIMD vectors for inequality.

Args:

  • rhs (Self): The SIMD vector to compare with.

Returns:

Bool: True if any elements of the SIMD vectors are not equal, False otherwise.

__gt__

def __gt__(self, rhs: Self) -> Bool

Compares two Scalars using greater-than comparison.

Args:

  • rhs (Self): The Scalar to compare with.

Returns:

Bool: True if self is greater than rhs, False otherwise.

__ge__

def __ge__(self, rhs: Self) -> Bool

Compares two Scalars using greater-than-or-equal comparison.

Args:

  • rhs (Self): The Scalar to compare with.

Returns:

Bool: True if self is greater than or equal to rhs, False otherwise.

__contains__

def __contains__(self, value: Scalar[dtype]) -> Bool

Whether the vector contains the value.

Args:

  • value (Scalar[dtype]): The value to look for.

Returns:

Bool: Whether the vector contains the value.

__add__

def __add__(self, rhs: Self) -> Self

Computes self + rhs.

Args:

  • rhs (Self): The rhs value.

Returns:

Self: A new vector whose element at position i is computed as self[i] + rhs[i].

__sub__

def __sub__(self, rhs: Self) -> Self

Computes self - rhs.

Args:

  • rhs (Self): The rhs value.

Returns:

Self: A new vector whose element at position i is computed as self[i] - rhs[i].

__mul__

def __mul__(self, rhs: Self) -> Self

Computes self * rhs.

Args:

  • rhs (Self): The rhs value.

Returns:

Self: A new vector whose element at position i is computed as self[i] * rhs[i].

__truediv__

def __truediv__(self, rhs: Self) -> Self

Computes self / rhs.

Args:

  • rhs (Self): The rhs value.

Returns:

Self: A new vector whose element at position i is computed as self[i] / rhs[i].

__floordiv__

def __floordiv__(self, rhs: Self) -> Self

Returns the division of self and rhs rounded down to the nearest integer.

Constraints:

The element type of the SIMD vector must be numeric.

Args:

  • rhs (Self): The value to divide with.

Returns:

Self: floor(self / rhs) value.
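"Rounded down" matters most for negative operands: the quotient goes toward negative infinity rather than toward zero. A small sketch, assuming a `def main()` entry point:

```mojo
def main():
    var a = SIMD[DType.int32, 4](7, -7, 9, -9)
    var b = SIMD[DType.int32, 4](2, 2, 4, 4)
    # floor(-3.5) is -4 and floor(-2.25) is -3.
    print(a // b)  # => [3, -4, 2, -3]
```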

__mod__

def __mod__(self, rhs: Self) -> Self

Returns the remainder of self divided by rhs.

Args:

  • rhs (Self): The value to divide with.

Returns:

Self: The remainder of dividing self by rhs.

__pow__

def __pow__(self, exp: Int) -> Self

Computes the vector raised to the power of the input integer value.

Args:

  • exp (Int): The exponent value.

Returns:

Self: A SIMD vector where each element is raised to the power of the specified exponent value.

def __pow__(self, exp: Self) -> Self

Computes the vector raised elementwise to the right hand side power.

Args:

  • exp (Self): The exponent value.

Returns:

Self: A SIMD vector where each element is raised to the power of the specified exponent value.

__lshift__

def __lshift__(self, rhs: Self) -> Self

Returns self << rhs.

Constraints:

The element type of the SIMD vector must be integral.

Args:

  • rhs (Self): The RHS value.

Returns:

Self: self << rhs.

__rshift__

def __rshift__(self, rhs: Self) -> Self

Returns self >> rhs.

Constraints:

The element type of the SIMD vector must be integral.

Args:

  • rhs (Self): The RHS value.

Returns:

Self: self >> rhs.

__and__

def __and__(self, rhs: Self) -> Self

Returns self & rhs.

Constraints:

The element type of the SIMD vector must be bool or integral.

Args:

  • rhs (Self): The RHS value.

Returns:

Self: self & rhs.

__or__

def __or__(self, rhs: Self) -> Self

Returns self | rhs.

Constraints:

The element type of the SIMD vector must be bool or integral.

Args:

  • rhs (Self): The RHS value.

Returns:

Self: self | rhs.

__xor__

def __xor__(self, rhs: Self) -> Self

Returns self ^ rhs.

Constraints:

The element type of the SIMD vector must be bool or integral.

Args:

  • rhs (Self): The RHS value.

Returns:

Self: self ^ rhs.
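The three bitwise operators apply lane by lane. The sketch below uses arbitrary `uint8` values and assumes a `def main()` entry point.

```mojo
def main():
    var a = SIMD[DType.uint8, 4](12, 10, 255, 1)
    var b = SIMD[DType.uint8, 4](10, 6, 15, 3)
    print(a & b)  # => [8, 2, 15, 1]
    print(a | b)  # => [14, 14, 255, 3]
    print(a ^ b)  # => [6, 12, 240, 2]
```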

__radd__

def __radd__(self, value: Self) -> Self

Returns value + self.

Args:

  • value (Self): The other value.

Returns:

Self: value + self.

__rsub__

def __rsub__(self, value: Self) -> Self

Returns value - self.

Args:

  • value (Self): The other value.

Returns:

Self: value - self.

__rmul__

def __rmul__(self, value: Self) -> Self

Returns value * self.

Args:

  • value (Self): The other value.

Returns:

Self: value * self.

__rtruediv__

def __rtruediv__(self, value: Self) -> Self

Returns value / self.

Args:

  • value (Self): The other value.

Returns:

Self: value / self.

__rfloordiv__

def __rfloordiv__(self, rhs: Self) -> Self

Returns the division of rhs and self rounded down to the nearest integer.

Constraints:

The element type of the SIMD vector must be numeric.

Args:

  • rhs (Self): The value to divide by self.

Returns:

Self: floor(rhs / self) value.

__rmod__

def __rmod__(self, value: Self) -> Self

Returns value mod self.

Args:

  • value (Self): The other value.

Returns:

Self: value mod self.

__rpow__

def __rpow__(self, base: Self) -> Self

Returns base ** self.

Args:

  • base (Self): The base value.

Returns:

Self: base ** self.

__rlshift__

def __rlshift__(self, value: Self) -> Self

Returns value << self.

Constraints:

The element type of the SIMD vector must be integral.

Args:

  • value (Self): The other value.

Returns:

Self: value << self.

__rrshift__

def __rrshift__(self, value: Self) -> Self

Returns value >> self.

Constraints:

The element type of the SIMD vector must be integral.

Args:

  • value (Self): The other value.

Returns:

Self: value >> self.

__rand__

def __rand__(self, value: Self) -> Self

Returns value & self.

Constraints:

The element type of the SIMD vector must be bool or integral.

Args:

  • value (Self): The other value.

Returns:

Self: value & self.

__ror__

def __ror__(self, value: Self) -> Self

Returns value | self.

Constraints:

The element type of the SIMD vector must be bool or integral.

Args:

  • value (Self): The other value.

Returns:

Self: value | self.

__rxor__

def __rxor__(self, value: Self) -> Self

Returns value ^ self.

Constraints:

The element type of the SIMD vector must be bool or integral.

Args:

  • value (Self): The other value.

Returns:

Self: value ^ self.

__iadd__

def __iadd__(mut self, rhs: Self)

Performs in-place addition.

The vector is mutated where each element at position i is computed as self[i] + rhs[i].

Args:

  • rhs (Self): The rhs of the addition operation.

__isub__

def __isub__(mut self, rhs: Self)

Performs in-place subtraction.

The vector is mutated where each element at position i is computed as self[i] - rhs[i].

Args:

  • rhs (Self): The rhs of the operation.

__imul__

def __imul__(mut self, rhs: Self)

Performs in-place multiplication.

The vector is mutated where each element at position i is computed as self[i] * rhs[i].

Args:

  • rhs (Self): The rhs of the operation.

__itruediv__

def __itruediv__(mut self, rhs: Self)

In-place true divide operator.

The vector is mutated where each element at position i is computed as self[i] / rhs[i].

Args:

  • rhs (Self): The rhs of the operation.

__ifloordiv__

def __ifloordiv__(mut self, rhs: Self)

In-place floor div operator.

The vector is mutated where each element at position i is computed as self[i] // rhs[i].

Args:

  • rhs (Self): The rhs of the operation.

__imod__

def __imod__(mut self, rhs: Self)

In-place mod operator.

The vector is mutated where each element at position i is computed as self[i] % rhs[i].

Args:

  • rhs (Self): The rhs of the operation.

__ipow__

def __ipow__(mut self, rhs: Int)

In-place pow operator.

The vector is mutated where each element at position i is computed as pow(self[i], rhs).

Args:

  • rhs (Int): The rhs of the operation.

__ilshift__

def __ilshift__(mut self, rhs: Self)

Computes self << rhs and saves the result in self.

Constraints:

The element type of the SIMD vector must be integral.

Args:

  • rhs (Self): The RHS value.

__irshift__

def __irshift__(mut self, rhs: Self)

Computes self >> rhs and saves the result in self.

Constraints:

The element type of the SIMD vector must be integral.

Args:

  • rhs (Self): The RHS value.

__iand__

def __iand__(mut self, rhs: Self)

Computes self & rhs and saves the result in self.

Constraints:

The element type of the SIMD vector must be bool or integral.

Args:

  • rhs (Self): The RHS value.

__ixor__

def __ixor__(mut self, rhs: Self)

Computes self ^ rhs and saves the result in self.

Constraints:

The element type of the SIMD vector must be bool or integral.

Args:

  • rhs (Self): The RHS value.

__ior__

def __ior__(mut self, rhs: Self)

Computes self | rhs and saves the result in self.

Constraints:

The element type of the SIMD vector must be bool or integral.

Args:

  • rhs (Self): The RHS value.

get_type_name

static def get_type_name() -> String

Gets this type's name, for use in error messages when handing arguments to kernels. TODO: This will go away soon, when we get better error messages for kernel calls.

Returns:

String: This type's name.

__divmod__

def __divmod__(self, denominator: Self) -> Tuple[Self, Self]

Computes both the quotient and remainder using floor division.

Args:

  • denominator (Self): The value to divide by.

Returns:

Tuple[Self, Self]: The quotient and remainder as a Tuple(self // denominator, self % denominator).

eq

def eq(self, rhs: Self) -> SIMD[DType.bool, size]

Compares two SIMD vectors using elementwise equality.

Args:

  • rhs (Self): The SIMD vector to compare with.

Returns:

SIMD[DType.bool, size]: A new bool SIMD vector of the same size whose element at position i is the value of self[i] == rhs[i].
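On this page the `==` and `!=` operators return a single `Bool`, while `eq()` and its siblings return one result per lane. The sketch below contrasts the two, assuming a `def main()` entry point.

```mojo
def main():
    var a = SIMD[DType.int32, 4](1, 2, 3, 4)
    var b = SIMD[DType.int32, 4](1, 0, 3, 0)

    # Whole-vector comparison: True only if every lane matches.
    print(a == b)   # => False

    # Lane-wise comparison: one boolean per element.
    print(a.eq(b))  # => [True, False, True, False]

    # Whole-vector inequality: True if any lane differs.
    print(a != b)   # => True
```

The lane-wise form is the one to feed into `select()`, since it keeps a separate decision for each element.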

ne

def ne(self, rhs: Self) -> SIMD[DType.bool, size]

Compares two SIMD vectors using elementwise inequality.

Args:

  • rhs (Self): The SIMD vector to compare with.

Returns:

SIMD[DType.bool, size]: A new bool SIMD vector of the same size whose element at position i is the value of self[i] != rhs[i].

gt

def gt(self, rhs: Self) -> SIMD[DType.bool, size]

Compares two SIMD vectors using elementwise greater-than comparison.

Args:

  • rhs (Self): The SIMD vector to compare with.

Returns:

SIMD[DType.bool, size]: A new bool SIMD vector of the same size whose element at position i is the value of self[i] > rhs[i].

ge

def ge(self, rhs: Self) -> SIMD[DType.bool, size]

Compares two SIMD vectors using elementwise greater-than-or-equal comparison.

Args:

  • rhs (Self): The SIMD vector to compare with.

Returns:

SIMD[DType.bool, size]: A new bool SIMD vector of the same size whose element at position i is the value of self[i] >= rhs[i].

lt

def lt(self, rhs: Self) -> SIMD[DType.bool, size]

Compares two SIMD vectors using elementwise less-than comparison.

Args:

  • rhs (Self): The SIMD vector to compare with.

Returns:

SIMD[DType.bool, size]: A new bool SIMD vector of the same size whose element at position i is the value of self[i] < rhs[i].

le

def le(self, rhs: Self) -> SIMD[DType.bool, size]

Compares two SIMD vectors using elementwise less-than-or-equal comparison.

Args:

  • rhs (Self): The SIMD vector to compare with.

Returns:

SIMD[DType.bool, size]: A new bool SIMD vector of the same size whose element at position i is the value of self[i] <= rhs[i].

__len__

def __len__(self) -> Int

Gets the length of the SIMD vector.

Returns:

Int: The length of the SIMD vector.

static def __len__() -> Int

Get the length (always 1 for scalar types).

Returns:

Int: Always returns 1.

__int__

def __int__(self) -> Int

Casts the value to an Int. If there is a fractional component, then the fractional part is truncated.

Constraints:

The size of the SIMD vector must be 1.

Returns:

Int: The value as an integer.

__float__

def __float__(self) -> Float64

Casts the value to a float.

Constraints:

The size of the SIMD vector must be 1.

Returns:

Float64: The value as a float.

__floor__

def __floor__(self) -> Self

Performs elementwise floor on the elements of a SIMD vector.

Returns:

Self: The elementwise floor of this SIMD vector.

__ceil__

def __ceil__(self) -> Self

Performs elementwise ceiling on the elements of a SIMD vector.

Returns:

Self: The elementwise ceiling of this SIMD vector.

__trunc__

def __trunc__(self) -> Self

Performs elementwise truncation on the elements of a SIMD vector.

Returns:

Self: The elementwise truncated values of this SIMD vector.

__abs__

def __abs__(self) -> Self

Defines the absolute value operation.

For signed integral element types, the absolute value of the minimum representable value is the minimum value itself.

Returns:

Self: The absolute value of this SIMD vector.
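The note about the minimum representable value is easiest to see with `Int8`, where +128 does not fit. A minimal sketch, assuming a `def main()` entry point and the `abs` builtin:

```mojo
def main():
    print(abs(Int8(-5)))    # => 5
    # +128 is not representable in Int8, so the minimum maps to itself.
    print(abs(Int8(-128)))  # => -128
```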

__round__

def __round__(self) -> Self

Performs elementwise rounding on the elements of a SIMD vector.

This rounding goes to the nearest integer with ties towards the nearest even value ("banker's rounding"). This is the default rounding mode for binary floating point in the IEEE 754 Standard for Floating Point Arithmetic.

Returns:

Self: The elementwise rounded value of this SIMD vector.
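Ties going to the nearest even value means that 0.5 and 2.5 round down while 1.5 and 3.5 round up. The sketch below assumes a `def main()` entry point and the `round` builtin.

```mojo
def main():
    var v = SIMD[DType.float64, 4](0.5, 1.5, 2.5, 3.5)
    # Each tie goes to the even neighbor: 0, 2, 2, 4.
    print(round(v))  # => [0.0, 2.0, 2.0, 4.0]
```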

def __round__(self, ndigits: Int) -> Self

Performs elementwise rounding on the elements of a SIMD vector.

This rounding goes to the nearest integer with ties towards the nearest even value ("banker's rounding"). This is the default rounding mode for binary floating point in the IEEE 754 Standard for Floating Point Arithmetic.

Args:

  • ndigits (Int): The number of digits to round to.

Returns:

Self: The elementwise rounded value of this SIMD vector.

__hash__

def __hash__[H: Hasher](self, mut hasher: H)

Updates hasher with this SIMD value.

Parameters:

  • H (Hasher): The hasher type.

Args:

  • hasher (H): The hasher instance.

__ceildiv__

def __ceildiv__(self, denominator: Self) -> Self

Return the rounded-up result of dividing self by denominator.

Args:

  • denominator (Self): The denominator.

Returns:

Self: The ceiling of dividing self by denominator.

cast

def cast[target: DType](self) -> SIMD[target, size]

Casts the elements of the SIMD vector to the target element type.

Casting behavior:

# Basic casting preserves value within range
Int8(UInt8(127)) == Int8(127)

# Numbers above signed max wrap to negative using two's complement
Int8(UInt8(128)) == Int8(-128)
Int8(UInt8(129)) == Int8(-127)
Int8(UInt8(256)) == Int8(0)

# Negative signed cast to unsigned using two's complement
UInt8(Int8(-128)) == UInt8(128)
UInt8(Int8(-127)) == UInt8(129)
UInt8(Int8(-1)) == UInt8(255)

# Truncate precision after downcast and upcast
Float64(Float32(Float64(123456789.123456789))) == Float64(123456792.0)

# Rightmost bits of significand become 0's on upcast
Float64(Float32(0.3)) == Float64(0.30000001192092896)

# Numbers equal after truncation of float literal and cast truncation
Float32(Float64(123456789.123456789)) == Float32(123456789.123456789)

# Float to int/uint floors
Int64(Float64(42.2)) == Int64(42)

Parameters:

  • target (DType): The target DType.

Returns:

SIMD[target, size]: A new SIMD vector whose elements have been cast to the target element type.
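The same casting rules apply per lane when `cast` is called on a wider vector. A small sketch with arbitrary values, assuming a `def main()` entry point:

```mojo
def main():
    var u = SIMD[DType.uint8, 4](127, 128, 129, 255)
    # Values above the signed maximum wrap using two's complement.
    print(u.cast[DType.int8]())     # => [127, -128, -127, -1]

    var i = SIMD[DType.int32, 2](1, 2)
    # Small integers are exactly representable as floats.
    print(i.cast[DType.float64]())  # => [1.0, 2.0]
```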

is_power_of_two

def is_power_of_two(self) -> SIMD[DType.bool, size]

Checks if the input value is a power of 2 for each element of a SIMD vector.

Constraints:

The element type of the input vector must be integral.

Returns:

SIMD[DType.bool, size]: A SIMD value where the element at position i is True if the integer at position i of the input value is a power of 2, False otherwise.
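A quick sketch of the per-lane result, using arbitrary positive integers and assuming a `def main()` entry point:

```mojo
def main():
    var v = SIMD[DType.int32, 4](1, 6, 8, 12)
    # 1 = 2^0 and 8 = 2^3 qualify; 6 and 12 do not.
    print(v.is_power_of_two())  # => [True, False, True, False]
```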

write_to

def write_to(self, mut writer: T)

Formats this SIMD value to the provided Writer.

Args:

  • writer (T): The object to write to.

write_repr_to

def write_repr_to(self, mut writer: T)

Write the string representation of the SIMD value.

Args:

  • writer (T): The value to write to.

write_padded

def write_padded[W: Writer](self, mut writer: W, width: Int) where dtype.is_integral()

Write the integral SIMD with each element right-aligned to a set padding. No additional space between elements is inserted.

Parameters:

  • W (Writer): A type conforming to the Writer trait.

Args:

  • writer (W): The object to write to.
  • width (Int): The amount to pad to the left.

to_bits

def to_bits[_dtype: DType = _uint_type_of_width[bit_width_of[dtype]()]()](self) -> SIMD[_dtype, size]

Bitcasts the SIMD vector to an integer SIMD vector.

Parameters:

  • _dtype (DType): The integer type to cast to.

Returns:

SIMD[_dtype, size]: An integer representation of the floating-point value.
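`to_bits` pairs naturally with the `from_bits` constructor listed earlier. The sketch below round-trips `Float32(1.0)`, whose IEEE 754 bit pattern is 0x3F800000, and assumes a `def main()` entry point.

```mojo
def main():
    # The default target type is the unsigned integer of the same bit width.
    var bits = Float32(1.0).to_bits()
    print(bits)                     # => 1065353216 (0x3F800000)

    # Reinterpret the same bits as a float again.
    print(Float32(from_bits=bits))  # => 1.0
```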

from_bytes

static def from_bytes[*, big_endian: Bool = is_big_endian()](bytes: InlineArray[UInt8, size_of[Self]()]) -> Self

Converts a byte array to a vector.

Parameters:

  • big_endian (Bool): Whether the byte array is big-endian.

Args:

  • bytes (InlineArray[UInt8, size_of[Self]()]): The byte array to convert.

Returns:

Self: The integer value.

as_bytes

def as_bytes[*, big_endian: Bool = is_big_endian()](self) -> InlineArray[UInt8, size_of[Self]()]

Convert the vector to a byte array.

Parameters:

  • big_endian (Bool): Whether the byte array should be big-endian.

Returns:

InlineArray[UInt8, size_of[Self]()]: The byte array.

clamp

def clamp(self, lower_bound: Self, upper_bound: Self) -> Self

Clamps the values in a SIMD vector to be in a certain range.

Clamp cuts values in the input SIMD vector off at the upper bound and lower bound values. For example, SIMD vector [0, 1, 2, 3] clamped to a lower bound of 1 and an upper bound of 2 would return [1, 1, 2, 2].

Args:

  • lower_bound (Self): Minimum of the range to clamp to.
  • upper_bound (Self): Maximum of the range to clamp to.

Returns:

Self: A new SIMD vector containing the elements of self clamped to be within lower_bound and upper_bound.
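The example from the description can be written out directly. This sketch assumes a `def main()` entry point and relies on the integer bounds being splatted across all lanes.

```mojo
def main():
    var v = SIMD[DType.int32, 4](0, 1, 2, 3)
    # The bounds 1 and 2 are broadcast to every lane before clamping.
    print(v.clamp(1, 2))  # => [1, 1, 2, 2]
```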

fma

def fma[flag: FastMathFlag = FastMathFlag.CONTRACT](self, multiplier: Self, accumulator: Self) -> Self

Performs a fused multiply-add operation, i.e. self*multiplier + accumulator.

Parameters:

  • flag (FastMathFlag): Fast-math optimization flags to apply (default: CONTRACT).

Args:

  • multiplier (Self): The value to multiply.
  • accumulator (Self): The value to accumulate.

Returns:

Self: A new vector whose element at position i is computed as self[i]*multiplier[i] + accumulator[i].
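A minimal sketch of `fma` with the default flag, using values that are exactly representable so the result does not depend on rounding. It assumes a `def main()` entry point.

```mojo
def main():
    var x = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
    var m = SIMD[DType.float32, 4](2.0)  # multiplier, broadcast
    var c = SIMD[DType.float32, 4](1.0)  # accumulator, broadcast
    # Each lane computes x[i] * m[i] + c[i].
    print(x.fma(m, c))  # => [3.0, 5.0, 7.0, 9.0]
```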

shuffle

def shuffle[*mask: Int](self) -> Self

Shuffles (also called blend) the values of the current vector using the specified mask (permutation). The mask values must be within 2 * len(self).

Parameters:

  • *mask (Int): The permutation to use in the shuffle.

Returns:

Self: A new vector with the same length as the mask where the value at position i is (self)[permutation[i]].

def shuffle[*mask: Int](self, other: Self) -> Self

Shuffles (also called blend) the values of the current vector with the other value using the specified mask (permutation). The mask values must be within 2 * len(self).

Parameters:

  • *mask (Int): The permutation to use in the shuffle.

Args:

  • other (Self): The other vector to shuffle with.

Returns:

Self: A new vector with the same length as the mask where the value at position i is (self + other)[permutation[i]]. Here self + other denotes the concatenation of the two vectors, and permutation is the mask.
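The sketch below shows both the one-vector and two-vector forms. It assumes a `def main()` entry point and reads the mask as indexing into the concatenation of the two vectors, so indices 0 to 3 select from the first vector and 4 to 7 from the second.

```mojo
def main():
    var a = SIMD[DType.int32, 4](10, 11, 12, 13)
    var b = SIMD[DType.int32, 4](20, 21, 22, 23)

    # One vector: the mask reverses the lanes.
    print(a.shuffle[3, 2, 1, 0]())   # => [13, 12, 11, 10]

    # Two vectors: indices 0..3 pick from a, indices 4..7 pick from b.
    print(a.shuffle[0, 4, 1, 5](b))  # => [10, 20, 11, 21]
```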

def shuffle[mask: IndexList[size, element_type=mask.element_type]](self) -> Self

Shuffles (also called blend) the values of the current vector using the specified mask (permutation). The mask values must be within 2 * len(self).

Parameters:

  • mask (IndexList[size, element_type=mask.element_type]): The permutation to use in the shuffle.

Returns:

Self: A new vector with the same length as the mask where the value at position i is (self)[permutation[i]].

def shuffle[mask: IndexList[size, element_type=mask.element_type]](self, other: Self) -> Self

Shuffles (also called blend) the values of the current vector with the other value using the specified mask (permutation). The mask values must be within 2 * len(self).

Parameters:

  • mask (IndexList[size, element_type=mask.element_type]): The permutation to use in the shuffle.

Args:

  • other (Self): The other vector to shuffle with.

Returns:

Self: A new vector with the same length as the mask where the value at position i is (self + other)[permutation[i]]. Here self + other denotes the concatenation of the two vectors, and permutation is the mask.

slice

def slice[output_width: Int, /, *, offset: Int = 0](self) -> SIMD[dtype, output_width]

Returns a slice of the vector of the specified width with the given offset.

Constraints:

output_width + offset must not exceed the size of this SIMD vector.

Parameters:

  • output_width (Int): The output SIMD vector size.
  • offset (Int): The given offset for the slice.

Returns:

SIMD[dtype, output_width]: A new vector whose elements map to self[offset:offset+output_width].
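The sketch below illustrates slice under the signature shown above: output_width is passed positionally and offset by keyword. The vector v and its contents are made up for illustration.

```mojo
def main():
    var v = SIMD[DType.int32, 8](1, 2, 3, 4, 5, 6, 7, 8)

    # Upper half: four lanes starting at lane 4.
    print(v.slice[4, offset=4]())  # [5, 6, 7, 8]

    # offset defaults to 0, so this takes the first two lanes.
    print(v.slice[2]())  # [1, 2]
```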

insert

def insert[*, offset: Int = 0](self, value: SIMD[dtype]) -> Self

Returns a new vector where the elements between offset and offset + input_width have been replaced with the elements in value, where input_width is the size of value.

Parameters:

  • offset (Int): The offset to insert at. This must be a multiple of value's size.

Args:

  • value (SIMD[dtype]): The value to be inserted.

Returns:

Self: A new vector whose elements at self[offset:offset+input_width] contain the values of value.
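A short sketch of insert, assuming the signature above. The two-lane vector patch is a hypothetical name; its size (2) divides the chosen offset (2), as the offset constraint requires.

```mojo
def main():
    var v = SIMD[DType.int32, 4](1, 2, 3, 4)
    var patch = SIMD[DType.int32, 2](9, 8)

    # Overwrite lanes 2 and 3 with the two lanes of `patch`.
    print(v.insert[offset=2](patch))  # [1, 2, 9, 8]

    # insert returns a new vector; `v` itself is unchanged.
    print(v)  # [1, 2, 3, 4]
```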

join

def join(self, other: Self) -> SIMD[dtype, (2 * size)]

Concatenates the two vectors together.

Args:

  • other (Self): The other SIMD vector.

Returns:

SIMD[dtype, (2 * size)]: A new vector of the form self_0, self_1, ..., self_{n-1}, other_0, ..., other_{n-1}, where n is the size of each input.

interleave

def interleave(self, other: Self) -> SIMD[dtype, (2 * size)]

Constructs a vector by interleaving two input vectors.

Args:

  • other (Self): The other SIMD vector.

Returns:

SIMD[dtype, (2 * size)]: A new vector of the form self_0, other_0, ..., self_{n-1}, other_{n-1}, where n is the size of each input.
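To contrast join with interleave, here is a small sketch that applies both to the same pair of vectors. It assumes only the signatures above; the inputs are illustrative.

```mojo
def main():
    var a = SIMD[DType.int32, 4](1, 2, 3, 4)
    var b = SIMD[DType.int32, 4](5, 6, 7, 8)

    # join: all lanes of `a`, then all lanes of `b`.
    print(a.join(b))  # [1, 2, 3, 4, 5, 6, 7, 8]

    # interleave: lanes of `a` and `b` alternate.
    print(a.interleave(b))  # [1, 5, 2, 6, 3, 7, 4, 8]
```

Both results have twice the width of the inputs.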

split

def split(self) -> Tuple[SIMD[dtype, (size // 2)], SIMD[dtype, (size // 2)]]

Splits the SIMD vector into 2 subvectors.

Returns:

Tuple[SIMD[dtype, (size // 2)], SIMD[dtype, (size // 2)]]: The two halves of the vector, self[0:N/2] and self[N/2:N], where N is the size of the vector.

deinterleave

def deinterleave(self) -> Tuple[SIMD[dtype, (size / 2)], SIMD[dtype, (size / 2)]]

Constructs two vectors by deinterleaving the even and odd lanes of the vector.

Constraints:

The vector size must be greater than 1.

Returns:

Tuple[SIMD[dtype, (size / 2)], SIMD[dtype, (size / 2)]]: Two vectors: the first of the form self_0, self_2, ..., self_{n-2}, and the second of the form self_1, self_3, ..., self_{n-1}.
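split and deinterleave are the inverses of join and interleave respectively. The sketch below shows both on one eight-lane vector, assuming the tuple-returning signatures above and that tuple elements can be read with constant indices.

```mojo
def main():
    var v = SIMD[DType.int32, 8](1, 2, 3, 4, 5, 6, 7, 8)

    # split: lower half and upper half.
    var halves = v.split()
    print(halves[0])  # [1, 2, 3, 4]
    print(halves[1])  # [5, 6, 7, 8]

    # deinterleave: even-indexed lanes and odd-indexed lanes.
    var lanes = v.deinterleave()
    print(lanes[0])  # [1, 3, 5, 7]
    print(lanes[1])  # [2, 4, 6, 8]
```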

reduce

def reduce[func: def[width: Int](SIMD[dtype, width], SIMD[dtype, width]) -> SIMD[dtype, width], size_out: Int = 1](self) -> SIMD[dtype, size_out]

Reduces the vector using a provided reduce operator.

Constraints:

size_out must not exceed the width of the vector.

Parameters:

  • func (def[width: Int](SIMD[dtype, width], SIMD[dtype, width]) -> SIMD[dtype, width]): The reduce function to apply to elements in this SIMD.
  • size_out (Int): The width of the reduction.

Returns:

SIMD[dtype, size_out]: A new vector of width size_out holding the reduction of the vector's elements (a scalar when size_out is 1).
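A hedged sketch of a custom reduction follows. The helper add_lanes is a hypothetical name introduced here; it is written as a width-parametric def to match the func parameter type shown above, and it has not been checked against a specific nightly build.

```mojo
# Hypothetical helper: combines two equal-width vectors lane by lane.
def add_lanes[
    width: Int
](a: SIMD[DType.int32, width], b: SIMD[DType.int32, width]) -> SIMD[
    DType.int32, width
]:
    return a + b


def main():
    var v = SIMD[DType.int32, 4](1, 2, 3, 4)

    # With the default size_out = 1 the result is a single-lane value.
    print(v.reduce[add_lanes]())  # 10
```

The operator should be associative and commutative if the result is not meant to depend on the order in which lanes are combined.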

def reduce[func: def[width: Int](SIMD[dtype, width], SIMD[dtype, width]) capturing -> SIMD[dtype, width], size_out: Int = 1](self) -> SIMD[dtype, size_out]

Reduces the vector using a provided reduce operator.

Constraints:

size_out must not exceed the width of the vector.

Parameters:

  • func (def[width: Int](SIMD[dtype, width], SIMD[dtype, width]) capturing -> SIMD[dtype, width]): The reduce function to apply to elements in this SIMD.
  • size_out (Int): The width of the reduction.

Returns:

SIMD[dtype, size_out]: A new vector of width size_out holding the reduction of the vector's elements (a scalar when size_out is 1).

reduce_max

def reduce_max[size_out: Int = 1](self) -> SIMD[dtype, size_out]

Reduces the vector using the max operator.

Constraints:

size_out must not exceed the width of the vector. The element type of the vector must be integer or floating-point.

Parameters:

  • size_out (Int): The width of the reduction.

Returns:

SIMD[dtype, size_out]: The maximum element of the vector.

reduce_min

def reduce_min[size_out: Int = 1](self) -> SIMD[dtype, size_out]

Reduces the vector using the min operator.

Constraints:

size_out must not exceed the width of the vector. The element type of the vector must be integer or floating-point.

Parameters:

  • size_out (Int): The width of the reduction.

Returns:

SIMD[dtype, size_out]: The minimum element of the vector.

reduce_add

def reduce_add[size_out: Int = 1](self) -> SIMD[dtype, size_out]

Reduces the vector using the add operator.

Constraints:

size_out must not exceed the width of the vector.

Parameters:

  • size_out (Int): The width of the reduction.

Returns:

SIMD[dtype, size_out]: The sum of all vector elements.

reduce_mul

def reduce_mul[size_out: Int = 1](self) -> SIMD[dtype, size_out]

Reduces the vector using the mul operator.

Constraints:

size_out must not exceed the width of the vector. The element type of the vector must be integer or floating-point.

Parameters:

  • size_out (Int): The width of the reduction.

Returns:

SIMD[dtype, size_out]: The product of all vector elements.
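The four arithmetic reductions (reduce_max, reduce_min, reduce_add, reduce_mul) can be compared side by side. This is a minimal sketch using the default size_out of 1 and illustrative input values.

```mojo
def main():
    var v = SIMD[DType.int32, 4](3, 1, 4, 2)

    print(v.reduce_max())  # 4
    print(v.reduce_min())  # 1
    print(v.reduce_add())  # 10  (3 + 1 + 4 + 2)
    print(v.reduce_mul())  # 24  (3 * 1 * 4 * 2)
```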

reduce_and

def reduce_and[size_out: Int = 1](self) -> SIMD[dtype, size_out]

Reduces the vector using the bitwise & operator.

Constraints:

size_out must not exceed the width of the vector. The element type of the vector must be integer or boolean.

Parameters:

  • size_out (Int): The width of the reduction.

Returns:

SIMD[dtype, size_out]: The reduced vector.

reduce_or

def reduce_or[size_out: Int = 1](self) -> SIMD[dtype, size_out]

Reduces the vector using the bitwise | operator.

Constraints:

size_out must not exceed the width of the vector. The element type of the vector must be integer or boolean.

Parameters:

  • size_out (Int): The width of the reduction.

Returns:

SIMD[dtype, size_out]: The reduced vector.
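As an illustration of reduce_and and reduce_or, the sketch below applies them to a boolean vector (where they act like "all" and "any") and to an integer vector (where they combine bits). The variable names and values are assumptions for the example.

```mojo
def main():
    var flags = SIMD[DType.bool, 4](True, True, False, True)
    print(flags.reduce_and())  # False (not every lane is True)
    print(flags.reduce_or())   # True  (at least one lane is True)

    # 12 = 0b1100, 10 = 0b1010
    var bits = SIMD[DType.uint8, 2](12, 10)
    print(bits.reduce_and())  # 8   (0b1000)
    print(bits.reduce_or())   # 14  (0b1110)
```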

reduce_bit_count

def reduce_bit_count(self) -> Int

Returns the total number of bits set in the SIMD vector.

Constraints:

Must be either an integral or a boolean type.

Returns:

Int: Count of set bits across all elements of the vector.
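A brief sketch of reduce_bit_count on an integer vector and on a boolean vector, where it amounts to counting the True lanes. The inputs are illustrative.

```mojo
def main():
    # Set bits per lane: 1 (0b1), 2 (0b11), 3 (0b111), 4 (0b1111).
    var v = SIMD[DType.uint8, 4](1, 3, 7, 15)
    print(v.reduce_bit_count())  # 10

    var mask = SIMD[DType.bool, 4](True, False, True, True)
    print(mask.reduce_bit_count())  # 3
```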

select

def select[_dtype: DType](self, true_case: SIMD[_dtype, size], false_case: SIMD[_dtype, size]) -> SIMD[_dtype, size]

Selects the values of the true_case or the false_case based on the current boolean values of the SIMD vector.

Constraints:

The element type of the vector must be boolean.

Parameters:

  • _dtype (DType): The element type of the input and output SIMD vectors.

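Args:

  • true_case (SIMD[_dtype, size]): The values selected where the corresponding element of this vector is True.
  • false_case (SIMD[_dtype, size]): The values selected where the corresponding element of this vector is False.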

Returns:

SIMD[_dtype, size]: A new vector of the form [true_case[i] if elem else false_case[i] for i, elem in enumerate(self)].
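The following sketch shows select as a lane-wise conditional. The boolean mask is built explicitly here to keep the example independent of how comparison operators are spelled in a given release; all names and values are illustrative.

```mojo
def main():
    var mask = SIMD[DType.bool, 4](True, False, True, False)
    var a = SIMD[DType.int32, 4](1, 2, 3, 4)
    var b = SIMD[DType.int32, 4](10, 20, 30, 40)

    # Lane i comes from `a` where mask[i] is True, otherwise from `b`.
    print(mask.select(a, b))  # [1, 20, 3, 40]
```

Because every lane is computed without branching, select is a common way to express per-element conditionals in vectorized code.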

rotate_left

def rotate_left[shift: Int](self) -> Self

Shifts the elements of a SIMD vector to the left by shift elements (with wrap-around).

Constraints:

-size <= shift < size

Parameters:

  • shift (Int): The number of positions by which to rotate the elements of SIMD vector to the left (with wrap-around).

Returns:

Self: The SIMD vector rotated to the left by shift elements (with wrap-around).

rotate_right

def rotate_right[shift: Int](self) -> Self

Shifts the elements of a SIMD vector to the right by shift elements (with wrap-around).

Constraints:

-size < shift <= size

Parameters:

  • shift (Int): The number of positions by which to rotate the elements of SIMD vector to the right (with wrap-around).

Returns:

Self: The SIMD vector rotated to the right by shift elements (with wrap-around).
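A minimal sketch of the two rotations on a four-lane vector, assuming the signatures above. Lanes that leave one end re-enter at the other.

```mojo
def main():
    var v = SIMD[DType.int32, 4](1, 2, 3, 4)

    # Lane 0 wraps around to the last position.
    print(v.rotate_left[1]())  # [2, 3, 4, 1]

    # The last lane wraps around to position 0.
    print(v.rotate_right[1]())  # [4, 1, 2, 3]
```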

shift_left

def shift_left[shift: Int](self) -> Self

Shifts the elements of a SIMD vector to the left by shift elements (no wrap-around, fill with zero).

Constraints:

0 <= shift <= size

Parameters:

  • shift (Int): The number of positions by which to shift the elements of the SIMD vector to the left (no wrap-around, fill with zero).

Returns:

Self: The SIMD vector shifted to the left by shift elements (no wrap-around, fill with zero).

shift_right

def shift_right[shift: Int](self) -> Self

Shifts the elements of a SIMD vector to the right by shift elements (no wrap-around, fill with zero).

Constraints:

0 <= shift <= size

Parameters:

  • shift (Int): The number of positions by which to shift the elements of the SIMD vector to the right (no wrap-around, fill with zero).

Returns:

Self: The SIMD vector shifted to the right by shift elements (no wrap-around, fill with zero).
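For comparison with the rotations, this sketch shows shift_left and shift_right on the same kind of input: lanes pushed past the end are dropped and the vacated lanes are zero-filled. The values are illustrative.

```mojo
def main():
    var v = SIMD[DType.int32, 4](1, 2, 3, 4)

    # Lane 0 is dropped; a zero enters at the last position.
    print(v.shift_left[1]())  # [2, 3, 4, 0]

    # The last lane is dropped; a zero enters at position 0.
    print(v.shift_right[1]())  # [0, 1, 2, 3]
```

These shift whole elements between lanes; they are not bitwise shifts within each element.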

reversed

def reversed(self) -> Self

Reverses the order of the elements of the SIMD vector.

Examples:

print(SIMD[DType.uint8, 4](1, 2, 3, 4).reversed()) # [4, 3, 2, 1]

Returns:

Self: The vector with its elements in reverse index order.

product

def product(self) -> Scalar[dtype]

Calculate the product (returns the value for scalar types).

Returns:

Scalar[dtype]: The integer value.

sum

def sum(self) -> Scalar[dtype]

Calculate the sum (returns the value for scalar types).

Returns:

Scalar[dtype]: The integer value.

value

def value(self) -> Scalar[dtype]

Get the scalar value.

Returns:

Scalar[dtype]: The runtime integer value.

tuple

def tuple(var self) -> Coord[Self]

Get as a tuple (not valid for Scalar CoordLike).

Returns:

Coord[Self]: Never returns; aborts at compile time.