SIMD
struct SIMD[dtype: DType, size: Int]
Represents a vector type that leverages hardware acceleration to process multiple data elements with a single operation.
SIMD (Single Instruction, Multiple Data) is a fundamental parallel computing paradigm where a single CPU instruction operates on multiple data elements at once. Modern CPUs can perform 4, 8, 16, or even 32 operations in parallel using SIMD, delivering substantial performance improvements over scalar operations. Instead of processing one value at a time, SIMD processes entire vectors of values with each instruction.
For example, when adding two vectors of four values, a scalar operation adds each value in the vector one by one, while a SIMD operation adds all four values at once using vector registers:
Scalar operation:            SIMD operation:
┌─────────────────────────┐  ┌───────────────────────────┐
│ 4 instructions          │  │ 1 instruction             │
│ 4 clock cycles          │  │ 1 clock cycle             │
│                         │  │                           │
│ ADD a[0], b[0] → c[0]   │  │ Vector register A         │
│ ADD a[1], b[1] → c[1]   │  │ ┌─────┬─────┬─────┬─────┐ │
│ ADD a[2], b[2] → c[2]   │  │ │a[0] │a[1] │a[2] │a[3] │ │
│ ADD a[3], b[3] → c[3]   │  │ └─────┴─────┴─────┴─────┘ │
└─────────────────────────┘  │             +             │
                             │ Vector register B         │
                             │ ┌─────┬─────┬─────┬─────┐ │
                             │ │b[0] │b[1] │b[2] │b[3] │ │
                             │ └─────┴─────┴─────┴─────┘ │
                             │             ↓             │
                             │         SIMD_ADD          │
                             │             ↓             │
                             │ Vector register C         │
                             │ ┌─────┬─────┬─────┬─────┐ │
                             │ │c[0] │c[1] │c[2] │c[3] │ │
                             │ └─────┴─────┴─────┴─────┘ │
                             └───────────────────────────┘
The SIMD type maps directly to hardware vector registers and
instructions. Mojo automatically generates optimal SIMD code that leverages
CPU-specific instruction sets (such as AVX and NEON) without requiring
manual intrinsics or assembly programming.
This type is the foundation of high-performance CPU computing in Mojo, enabling you to write code that automatically leverages modern CPU vector capabilities while maintaining code clarity and portability.
Caution: If you declare a SIMD vector size larger than the vector registers of the target hardware, the compiler will break up the SIMD into multiple vector registers for compatibility. However, you should avoid using a vector that's more than 2x the hardware's vector register size because the resulting code will perform poorly.
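One way to follow this guideline is to derive the vector size from the target's native width instead of hard-coding it. The sketch below assumes that `simd_width_of` is importable from the `sys` module in your Mojo release (older releases may spell it `simdwidthof`); the dtype and the 2x multiplier are illustrative choices, not recommendations from this page.

```mojo
from sys import simd_width_of

def main():
    # Number of float32 lanes in one hardware vector register on this target
    # (assumes `simd_width_of` exists under this name in your release).
    comptime native_width = simd_width_of[DType.float32]()
    # Stay at or below 2x the native width, per the caution above.
    comptime width = 2 * native_width

    var v = SIMD[DType.float32, width](1.5)  # Broadcast 1.5 to every lane
    print(native_width, width)
    print(v.reduce_add())
```

The printed values depend on the machine the code is compiled for, so no expected output is shown.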
Key properties:
- Hardware-mapped: Directly maps to CPU vector registers
- Type-safe: Data types and vector sizes are checked at compile time
- Zero-cost: No runtime overhead compared to hand-optimized intrinsics
- Portable: Same code works across different CPU architectures (x86, ARM, etc.)
- Composable: Seamlessly integrates with Mojo's parallelization features
Key APIs:
- Construction:
  - Broadcast single value to all elements: SIMD[dtype, size](value)
  - Initialize with specific values: SIMD[dtype, size](v1, v2, ...)
  - Zero-initialized vector: SIMD[dtype, size]()
- Element operations:
  - Arithmetic: +, -, *, /, %, //
  - Comparison: ==, !=, <, <=, >, >=
  - Math functions: sqrt(), sin(), cos(), fma(), etc.
  - Bit operations: &, |, ^, ~, <<, >>
- Vector operations:
  - Horizontal reductions: reduce_add(), reduce_mul(), reduce_min(), reduce_max()
  - Element-wise conditional selection: select(condition, true_case, false_case)
  - Vector manipulation: shuffle(), slice(), join(), split()
  - Type conversion: cast[target_dtype]()
Examples:
Vectorized math operations:
# Process 8 floating-point numbers simultaneously
var a = SIMD[DType.float32, 8](1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)
var b = SIMD[DType.float32, 8](2.0) # Broadcast 2.0 to all elements
var result = a * b + 1.0
print(result) # => [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0]
Conditional operations with masking:
# Double the positive values and negate the negative values
var values = SIMD[DType.int32, 4](1, -2, 3, -4)
var is_positive = values.gt(0) # greater-than: gets SIMD of booleans
var result = is_positive.select(values * 2, values * -1)
print(result) # => [2, 2, 6, 4]
Horizontal reductions:
# Sum all elements in a vector
var data = SIMD[DType.float64, 4](10.5, 20.3, 30.1, 40.7)
var total = data.reduce_add()
var maximum = data.reduce_max()
print(total, maximum) # => 101.6 40.7
Constraints:
The size of the SIMD vector must be positive and a power of 2.
Parameters
- dtype (DType): The data type of SIMD vector elements.
- size (Int): The size of the SIMD vector (number of elements).
Implemented traits
Absable,
AnyType,
Boolable,
CeilDivable,
Ceilable,
Comparable,
ConvertibleToPython,
Copyable,
Defaultable,
DevicePassable,
DivModable,
Equatable,
Floorable,
Hashable,
ImplicitlyCopyable,
ImplicitlyDestructible,
Indexer,
Intable,
Movable,
Powable,
RegisterPassable,
Roundable,
Sized,
TrivialRegisterPassable,
Truncable,
Writable,
_FromInt
comptime members
device_type
comptime device_type = SIMD[dtype, size]
SIMD types are remapped to the same type when passed to accelerator devices.
MAX
comptime MAX = SIMD(max_or_inf[dtype]())
Gets the maximum value for the SIMD value, potentially +inf.
MAX_FINITE
comptime MAX_FINITE = SIMD(max_finite[dtype]())
Gets the maximum finite value for the SIMD value.
MIN
comptime MIN = SIMD(min_or_neg_inf[dtype]())
Gets the minimum value for the SIMD value, potentially -inf.
MIN_FINITE
comptime MIN_FINITE = SIMD(min_finite[dtype]())
Gets the minimum (lowest) finite value for the SIMD value.
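As a brief illustration of these members, the sketch below reads them through the scalar aliases `Int8`, `UInt8`, and `Float32`. It assumes those aliases expose the members in the same way as any other `SIMD` instantiation. Outputs are shown only for the integer cases, where the values follow directly from the type's range.

```mojo
def main():
    # Integer dtypes: MAX and MIN are the limits of the representable range.
    print(Int8.MAX, Int8.MIN)    # => 127 -128
    print(UInt8.MAX, UInt8.MIN)  # => 255 0

    # Floating-point dtypes: MAX and MIN may be infinities, while the
    # *_FINITE variants are the largest and lowest finite values.
    print(Float32.MAX, Float32.MAX_FINITE)
    print(Float32.MIN, Float32.MIN_FINITE)
```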
Methods
__init__
__init__() -> Self
Default initializer of the SIMD vector.
By default the SIMD vectors are initialized to all zeros.
__init__[other_dtype: DType, //](value: SIMD[other_dtype, size], /) -> Self
Initialize from another SIMD of the same size, casting each element to this vector's dtype. If the value passed is a scalar, it is broadcast to every element, so you can initialize a SIMD vector with more elements.
Example:
print(UInt64(UInt8(42))) # 42
print(SIMD[DType.uint64, 4](UInt8(42))) # [42, 42, 42, 42]
Casting behavior:
# Basic casting preserves value within range
Int8(UInt8(127)) == Int8(127)
# Numbers above signed max wrap to negative using two's complement
Int8(UInt8(128)) == Int8(-128)
Int8(UInt8(129)) == Int8(-127)
Int8(UInt8(256)) == Int8(0)
# Negative signed cast to unsigned using two's complement
UInt8(Int8(-128)) == UInt8(128)
UInt8(Int8(-127)) == UInt8(129)
UInt8(Int8(-1)) == UInt8(255)
# Truncate precision after downcast and upcast
Float64(Float32(Float64(123456789.123456789))) == Float64(123456792.0)
# Rightmost bits of significand become 0's on upcast
Float64(Float32(0.3)) == Float64(0.30000001192092896)
# Numbers equal after truncation of float literal and cast truncation
Float32(Float64(123456789.123456789)) == Float32(123456789.123456789)
# Float to int/uint floors
Int64(Float64(42.2)) == Int64(42)
Parameters:
- other_dtype (DType): The type of the value that is being cast from.
Args:
- value (SIMD): The value to cast from.
@implicit
__init__(value: Int, /) -> Self
Initializes the SIMD vector with a signed integer.
The signed integer value is splatted across all the elements of the SIMD vector.
Args:
- value (Int): The input value.
__init__[T: Floatable, //](value: T, /) -> Float64
Initialize a Float64 from a type conforming to Floatable.
Parameters:
- T (Floatable): The Floatable type.
Args:
- value (T): The object to get the floating-point representation of.
Returns:
Float64
__init__[T: FloatableRaising, //](out self: Float64, value: T, /)
Initialize a Float64 from a type conforming to FloatableRaising.
Parameters:
- T (FloatableRaising): The FloatableRaising type.
Args:
- value (T): The object to get the floating-point representation of.
Returns:
Float64
Raises:
If the type does not have a floating-point representation.
@implicit
__init__(value: IntLiteral[value.value], /) -> Self
Initializes the SIMD vector with an integer.
The integer value is splatted across all the elements of the SIMD vector.
Args:
- value (IntLiteral): The input value.
@implicit
__init__(value: Bool, /) -> SIMD[DType.bool, size]
Initializes a Scalar with a bool value.
Since this constructor does not splat, it can be implicit.
Args:
- value (Bool): The bool value to initialize the Scalar with.
Returns:
SIMD
__init__(*, fill: Bool) -> SIMD[DType.bool, size]
Initializes the SIMD vector with a bool value.
The bool value is splatted across all elements of the SIMD vector.
Args:
- fill (Bool): The bool value to fill each element of the SIMD vector with.
Returns:
SIMD
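A minimal sketch of the `fill` constructor, using only the `select()` method shown in the examples near the top of this page. The vector contents are illustrative.

```mojo
def main():
    # Splat a single Bool across all four lanes.
    var mask = SIMD[DType.bool, 4](fill=True)
    print(mask)  # => [True, True, True, True]

    # A mask can drive element-wise selection between two vectors:
    # lanes where the mask is True take the first argument.
    var a = SIMD[DType.int32, 4](1, 2, 3, 4)
    var b = SIMD[DType.int32, 4](10, 20, 30, 40)
    print(mask.select(a, b))  # => [1, 2, 3, 4]
```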
@implicit
__init__(value: Scalar[dtype], /) -> Self
Constructs a SIMD vector by splatting a scalar value.
The input value is splatted across all elements of the SIMD vector.
Args:
- value (Scalar): The value to splat to the elements of the vector.
__init__(*elems: Scalar[dtype], *, __list_literal__: Tuple = Tuple()) -> Self
Constructs a SIMD vector via a variadic list of elements.
The input values are assigned to the corresponding elements of the SIMD vector.
Constraints:
The number of input values must equal the size of the SIMD vector.
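Args:
- *elems (Scalar): The values to assign to the elements of the SIMD vector.
- __list_literal__ (Tuple): Lets this constructor be used for list literals.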
@implicit
__init__(value: FloatLiteral[value.value], /) -> Self
Initializes the SIMD vector with a float.
The value is splatted across all the elements of the SIMD vector.
Args:
- value (FloatLiteral): The input value.
__init__[int_dtype: DType, //](*, from_bits: SIMD[int_dtype, size]) -> Self
Initializes the SIMD vector from the bits of an integral SIMD vector.
Parameters:
- int_dtype (DType): The integral type of the input SIMD vector.
Args:
- from_bits (SIMD): The SIMD vector to copy the bits from.
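The following sketch reinterprets IEEE 754 single-precision bit patterns as `float32` values. The hex constants are the standard encodings of 1.0 and 2.0. It assumes the integral dtype must have the same bit width as the target dtype, which this page does not state explicitly.

```mojo
def main():
    # 0x3F800000 is the IEEE 754 single-precision bit pattern for 1.0.
    var bits = UInt32(0x3F800000)
    var f = Float32(from_bits=bits)
    print(f)  # => 1.0

    # The same constructor works lane by lane on wider vectors.
    # 0x40000000 is the bit pattern for 2.0.
    var raw = SIMD[DType.uint32, 2](0x3F800000, 0x40000000)
    var v = SIMD[DType.float32, 2](from_bits=raw)
    print(v)  # => [1.0, 2.0]
```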
__init__(out self: Scalar[dtype], *, py: PythonObject)
Initialize a SIMD value from a PythonObject.
Args:
- py (PythonObject): The PythonObject to convert.
Returns:
Scalar
Raises:
If the conversion to double fails.
__bool__
__bool__(self) -> Bool
Converts the SIMD scalar into a boolean value.
Returns:
Bool: True if the SIMD scalar is non-zero and False otherwise.
__getitem__
__getitem__(self, idx: Int) -> Scalar[dtype]
Gets an element from the vector.
Args:
- idx (Int): The element index.
Returns:
Scalar: The value at position idx.
__setitem__
__setitem__(mut self, idx: Int, val: Scalar[dtype])
Sets an element in the vector.
Args:
- idx (Int): The index of the element to set.
- val (Scalar): The value to store at position idx.
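Element access through `__getitem__` and `__setitem__` uses ordinary subscript syntax. A minimal sketch with illustrative values:

```mojo
def main():
    var v = SIMD[DType.int32, 4](1, 2, 3, 4)

    # Read one lane (calls __getitem__).
    print(v[2])  # => 3

    # Overwrite one lane in place (calls __setitem__); v must be mutable.
    v[2] = 30
    print(v)  # => [1, 2, 30, 4]
```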
__neg__
__neg__(self) -> Self
Defines the unary - operation.
Returns:
Self: The negation of this SIMD vector.
__pos__
__pos__(self) -> Self
Defines the unary + operation.
Returns:
Self: This SIMD vector.
__invert__
__invert__(self) -> Self
Returns ~self.
Constraints:
The element type of the SIMD vector must be boolean or integral.
Returns:
Self: The ~self value.
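As a short illustration, the sketch below inverts an unsigned integer vector (bitwise complement of each lane) and a boolean vector (logical negation of each lane). The values are illustrative.

```mojo
def main():
    # For uint8, ~x flips all 8 bits of each lane, which equals 255 - x.
    var bits = SIMD[DType.uint8, 4](0, 1, 254, 255)
    print(~bits)  # => [255, 254, 1, 0]

    # For bool vectors, ~ negates each lane.
    var mask = SIMD[DType.bool, 4](True, False, True, False)
    print(~mask)  # => [False, True, False, True]
```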
__lt__
__lt__(self, rhs: Self) -> Bool
Compares two Scalars using less-than comparison.
Args:
- rhs (Self): The Scalar to compare with.
Returns:
Bool: True if self is less than rhs, False otherwise.
__le__
__le__(self, rhs: Self) -> Bool
Compares two Scalars using less-than-or-equal comparison.
Args:
- rhs (Self): The Scalar to compare with.
Returns:
Bool: True if self is less than or equal to rhs, False otherwise.
__eq__
__eq__(self, rhs: Self) -> Bool
Compares two SIMD vectors for equality.
Args:
- rhs (Self): The SIMD vector to compare with.
Returns:
Bool: True if all elements of the SIMD vectors are equal, False otherwise.
__ne__
__ne__(self, rhs: Self) -> Bool
Compares two SIMD vectors for inequality.
Args:
- rhs (Self): The SIMD vector to compare with.
Returns:
Bool: True if any elements of the SIMD vectors are not equal, False otherwise.
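Because `==` and `!=` return a single `Bool` for the whole vector, a per-lane result needs a separate method. The sketch below assumes an element-wise `eq()` method exists alongside the `gt()` method used in the masking example near the top of this page. Treat `eq()` as an assumption if your release differs.

```mojo
def main():
    var a = SIMD[DType.int32, 4](1, 2, 3, 4)
    var b = SIMD[DType.int32, 4](1, 2, 0, 4)

    # == is True only if every lane matches; != is True if any lane differs.
    print(a == a)  # => True
    print(a == b)  # => False
    print(a != b)  # => True

    # Assumed element-wise counterpart: one Bool per lane.
    print(a.eq(b))  # => [True, True, False, True]
```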
__gt__
__gt__(self, rhs: Self) -> Bool
Compares two Scalars using greater-than comparison.
Args:
- rhs (Self): The Scalar to compare with.
Returns:
Bool: True if self is greater than rhs, False otherwise.
__ge__
__ge__(self, rhs: Self) -> Bool
Compares two Scalars using greater-than-or-equal comparison.
Args:
- rhs (Self): The Scalar to compare with.
Returns:
Bool: True if self is greater than or equal to rhs, False otherwise.
__contains__
__contains__(self, value: Scalar[dtype]) -> Bool
Whether the vector contains the value.
Args:
- value (Scalar): The value.
Returns:
Bool: Whether the vector contains the value.
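`__contains__` backs the `in` operator. A minimal sketch with illustrative values, wrapping the probe in `Int32(...)` so that its type matches the vector's element type explicitly:

```mojo
def main():
    var v = SIMD[DType.int32, 4](1, 2, 3, 4)

    # True if any lane equals the probe value.
    print(Int32(3) in v)  # => True
    print(Int32(7) in v)  # => False
```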
__add__
__add__(self, rhs: Self) -> Self
Computes self + rhs.
Args:
- rhs (Self): The rhs value.
Returns:
Self: A new vector whose element at position i is computed as self[i] + rhs[i].
__sub__
__sub__(self, rhs: Self) -> Self
Computes self - rhs.
Args:
- rhs (Self): The rhs value.
Returns:
Self: A new vector whose element at position i is computed as self[i] - rhs[i].
__mul__
__mul__(self, rhs: Self) -> Self
Computes self * rhs.
Args:
- rhs (Self): The rhs value.
Returns:
Self: A new vector whose element at position i is computed as self[i] * rhs[i].
__truediv__
__truediv__(self, rhs: Self) -> Self
Computes self / rhs.
Args:
- rhs (Self): The rhs value.
Returns:
Self: A new vector whose element at position i is computed as self[i] / rhs[i].
__floordiv__
__floordiv__(self, rhs: Self) -> Self
Returns the division of self and rhs rounded down to the nearest integer.
Constraints:
The element type of the SIMD vector must be numeric.
Args:
- rhs (Self): The value to divide with.
Returns:
Self: floor(self / rhs) value.
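The rounding direction matters for negative operands: flooring rounds toward negative infinity, not toward zero. A minimal sketch with illustrative values:

```mojo
def main():
    var a = SIMD[DType.int32, 2](7, -7)
    var b = SIMD[DType.int32, 2](2, 2)

    # 7 / 2 = 3.5 floors to 3; -7 / 2 = -3.5 floors to -4 (not -3).
    print(a // b)  # => [3, -4]
```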
__mod__
__mod__(self, rhs: Self) -> Self
Returns the remainder of self divided by rhs.
Args:
- rhs (Self): The value to divide with.
Returns:
Self: The remainder of dividing self by rhs.