v25.2 (2025-03-25)
✨ Highlights
-
Check out the new "GPU basics" section of the Mojo Manual and the "Get started with GPU programming with Mojo and the MAX Driver" tutorial for a guide to getting started with GPU programming in Mojo!
-
Some APIs in the gpu package were enhanced to simplify working with GPUs:
-
If you're executing a GPU kernel only once, you can now skip compiling it first and pass it directly to DeviceContext.enqueue_function().
-
The three separate methods on DeviceContext for asynchronously copying buffers between host and GPU memory have been combined into a single overloaded enqueue_copy() method, and the three separate methods for synchronous copies have been combined into an overloaded copy_sync() method.
-
The gpu.shuffle module has been renamed to gpu.warp to better reflect its purpose.
-
The gpu package API documentation has been expanded, and API documentation for the layout package is underway, beginning with core types, functions, and traits.

See the Standard library changes section of the changelog for more information.
-
The legacy borrowed/inout keywords and -> T as foo syntax are no longer supported and now generate a compiler error. Please move to read/mut/out argument syntax instead. See Argument conventions in the Mojo Manual for more information.
-
The standard library has many changes related to strings. Notably, the Char type has been renamed to Codepoint, to better capture its intended purpose of storing a single Unicode codepoint. Related method and type names have been updated as well. See Standard library changes for more details.
-
Support has been added for 128- and 256-bit signed and unsigned integers. This includes the DType aliases DType.int128, DType.uint128, DType.int256, and DType.uint256, as well as SIMD support for 128- and 256-bit signed and unsigned element types. Note that this exposes capabilities (and limitations) of LLVM, which may not always provide high performance for these types and may have missing operations like divide, remainder, etc. See Standard library changes for more details.
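As a rough illustration of the 128-bit integer support described above, the following minimal sketch stores a value that does not fit in 64 bits and checks that no bits are lost. It assumes the UInt128 alias is available without an import and that shifts and comparisons are among the operations LLVM supports for these types; the function names are hypothetical and chosen only for this example.

```mojo
fn roundtrips_beyond_64_bits() -> Bool:
    # 2**100 overflows a 64-bit integer but fits in a 128-bit one.
    var big = UInt128(1) << 100
    # If no high bits were lost, shifting back recovers the original 1.
    return Bool((big >> 100) == 1)


fn signed_vector_add() -> Bool:
    # Signed 128-bit elements can also be used inside a wider SIMD vector.
    var v = SIMD[DType.int128, 2](-1, 2)
    var doubled = v + v
    return doubled[0] == -2 and doubled[1] == 4


def main():
    print(roundtrips_beyond_64_bits())
    print(signed_vector_add())
```

Division and remainder are deliberately avoided here, since the note above warns that such operations may be missing for these types.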
Language changes
-
References to aliases in struct types with unbound (or partially bound) parameter sets are now allowed as long as the referenced alias doesn't depend on any unbound parameters:

    struct StructWithParams[a: Int, b: Int]:
        alias a1 = 42
        alias a2 = a + 1

    fn test():
        _ = StructWithParams.a1     # ok
        _ = StructWithParams[1].a2  # ok
        _ = StructWithParams.a2     # error, 'a' is unbound.
-
The Mojo compiler now warns about @parameter for loops with a large unrolling factor (>1024 by default), which can lead to long compilation times and large generated code size. Set --loop-unrolling-warn-threshold to change the default threshold, or set it to 0 to disable the warning.
-
The Mojo compile-time interpreter can now handle many more LLVM intrinsics, including ones that return floating point values. This allows functions like round() to be constant folded when used in a compile-time context.
-
The Mojo compiler now has only one compile-time interpreter. It previously had two: one to handle a few cases that were important for dependent types in the parser (but which also had many limitations), and the primary one that ran at "instantiation" time, which is fully general. This was confusing and caused a wide range of bugs. We've now removed the special-case parse-time interpreter, replacing it with a more general solution for dependent types. This change should be invisible to most users, but should resolve a number of long-standing bugs and significantly simplify the compiler implementation, allowing us to move faster.
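To make the @parameter for unrolling warning above more concrete, here is a minimal sketch of a loop that is unrolled at compile time. The function name sum_of_squares is hypothetical. With a trip count of 4 the loop is far below the default threshold, so no warning would be expected; the warning is aimed at trip counts above 1024.

```mojo
fn sum_of_squares[n: Int]() -> Int:
    var total = 0

    # Unrolled at compile time: the body is instantiated once per value of i,
    # so a very large n means a very large amount of generated code.
    @parameter
    for i in range(n):
        total += i * i
    return total


def main():
    # 0 + 1 + 4 + 9 = 14
    print(sum_of_squares[4]())
```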
Standard library changes
-
Optional, Span, and InlineArray have been added to the prelude. You no longer need to explicitly import these types to use them in your program.
-
GPU programming changes:
-
You can now skip compiling a GPU kernel first before enqueueing it, and pass it directly to DeviceContext.enqueue_function():

    from gpu.host import DeviceContext

    fn func():
        print("Hello from GPU")

    with DeviceContext() as ctx:
        ctx.enqueue_function[func](grid_dim=1, block_dim=1)

However, if you're reusing the same function and parameters multiple times, this incurs some overhead of around 50-500 nanoseconds per enqueue. So you can still compile the function first with DeviceContext.compile_function() and pass it to DeviceContext.enqueue_function() like this:

    with DeviceContext() as ctx:
        var compiled_func = ctx.compile_function[func]()
        # Multiple kernel launches with the same function/parameters
        ctx.enqueue_function(compiled_func, grid_dim=1, block_dim=1)
        ctx.enqueue_function(compiled_func, grid_dim=1, block_dim=1)
-
The following methods on DeviceContext:

    enqueue_copy_to_device()
    enqueue_copy_from_device()
    enqueue_copy_device_to_device()

have been combined into a single overloaded enqueue_copy() method. Additionally, the methods:

    copy_to_device_sync()
    copy_from_device_sync()
    copy_device_to_device_sync()

have been combined into an overloaded copy_sync() method.
-
The gpu.shuffle module has been renamed to gpu.warp to better reflect its purpose. For example:

    import gpu.warp as warp

    var val0 = warp.shuffle_down(x, offset)
    var val1 = warp.broadcast(x)
-
-
Support has been added for 128- and 256-bit signed and unsigned integers.
-
The following aliases have been added to the DType struct: DType.int128, DType.uint128, DType.int256, and DType.uint256.
-
The SIMD type now supports 128- and 256-bit signed and unsigned element types. Note that this exposes capabilities (and limitations) of LLVM, which may not always provide high performance for these types and may have missing operations like divide, remainder, etc.
-
The following Scalar aliases for 1-element SIMD values have been added: Int128, UInt128, Int256, and UInt256.
-
-
String and friends:
-
The Char type has been renamed to Codepoint, to better capture its intended purpose of storing a single Unicode codepoint. Related method and type names have been updated as well, including:
-
StringSlice.chars() and String.chars() to StringSlice.codepoints() and String.codepoints(), respectively
-
StringSlice.char_slices() and String.char_slices() to StringSlice.codepoint_slices() and String.codepoint_slices(), respectively
-
CharsIter to CodepointsIter
-
Char.unsafe_decode_utf8_char() to Codepoint.unsafe_decode_utf8_codepoint()
-
Made the iterator type returned by the string codepoint_slices() methods public as CodepointSliceIter.
-
-
StringSlice now supports several additional methods moved from String. The existing String methods have been updated to instead call the corresponding new StringSlice methods.
-
Added a StringSlice.is_codepoint_boundary() method for querying if a given byte index is a boundary between encoded UTF-8 codepoints.
-
StringSlice.__getitem__(Slice) now raises an error if the provided slice start and end positions do not fall on a valid codepoint boundary. This prevents construction of malformed StringSlice values, which could lead to memory unsafety or undefined behavior. For example, given a string containing multi-byte encoded data, like:

    str_slice = "Hi👋!"

and whose in-memory and decoded data looks like:

    String                 Hi👋!
    Codepoint Characters   H    i     👋                 !
    Codepoints             72   105   128075            33
    Bytes                  72   105   240 159 145 139   33
    Index                  0    1     2   3   4   5     6

attempting to slice bytes [3-5) with str_slice[3:5] would previously erroneously produce a malformed StringSlice as output that did not correctly decode to anything:

    String                 invalid
    Codepoint Characters   invalid
    Codepoints             invalid
    Bytes                  159 145
    Index                  0   1

The same statement will now raise an error informing the user that their indices are invalid.
-
The StringLiteral.get[value]() method, which converts a compile-time value of a Stringable type to a StringLiteral, has been changed to a function named get_string_literal[value]().
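The renamed codepoint APIs and the new boundary check can be tried together on the "Hi👋!" string from the table above. This is a minimal sketch rather than a definitive usage guide: the helper name codepoint_values is hypothetical, and it assumes Codepoint.to_u32() and String.as_string_slice() are available as in earlier releases.

```mojo
fn codepoint_values(text: String) -> List[UInt32]:
    # Collect the numeric value of each Unicode codepoint in the string.
    var values = List[UInt32]()
    for cp in text.codepoints():
        values.append(cp.to_u32())
    return values


def main():
    var text = String("Hi👋!")

    # Expected to match the "Codepoints" row above: 72, 105, 128075, 33.
    var values = codepoint_values(text)
    for i in range(len(values)):
        print(values[i])

    var view = text.as_string_slice()
    # Byte 2 is the first byte of the emoji, so it is a codepoint boundary.
    print(view.is_codepoint_boundary(2))
    # Byte 3 is in the middle of the emoji's 4-byte encoding, so it is not.
    print(view.is_codepoint_boundary(3))
```

Checking is_codepoint_boundary() before slicing is one way to avoid the error that str_slice[3:5] now raises.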
-
-
Collections:
-
A new IntervalTree data structure has been added to the standard library. This is a tree data structure that allows for efficient range queries.
-
Added an iterator to LinkedList (PR #4005):
-
LinkedList.__iter__() to create a forward iterator.
-
LinkedList.__reversed__() for a backward iterator.

    var ll = LinkedList[Int](1, 2, 3)
    for element in ll:
        print(element[])
-
List.bytecount() has been renamed to List.byte_length() for consistency with the string-like APIs.
-
The InlineArray(unsafe_uninitialized=True) constructor is now spelled InlineArray(uninitialized=True).
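The two renames above can be exercised with a short sketch like the following. The function names are hypothetical, and the example assumes that byte_length() reports the element count multiplied by the element size, which is 1 byte for UInt8.

```mojo
fn first_squares() -> InlineArray[Int, 4]:
    # New spelling of the constructor: the elements start out uninitialized,
    # so every slot must be written before it is read.
    var squares = InlineArray[Int, 4](uninitialized=True)
    for i in range(4):
        squares[i] = i * i
    return squares


fn payload_size() -> Int:
    # Renamed from bytecount(): three 1-byte elements occupy 3 bytes.
    var payload = List[UInt8](1, 2, 3)
    return payload.byte_length()


def main():
    var squares = first_squares()
    print(squares[3])
    print(payload_size())
```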
-
-
The design of the IntLiteral and FloatLiteral types has been changed to maintain their compile-time-only value as a parameter instead of a stored field. This correctly models that infinite precision literals are not representable at runtime, and eliminates a number of bugs hit in corner cases. This is made possible by enhanced dependent type support in the compiler.
-
The Buffer struct has been removed in favor of Span and NDBuffer.
-
The round() function is now fixed to perform "round half to even" (also known as "bankers' rounding") instead of "round half away from zero".
-
The UnsafePointer.alloc() method has changed to produce pointers with an empty Origin parameter, instead of with MutableAnyOrigin. This mitigates an issue with the any origin parameter extending the lifetime of unrelated local variables for this common method.
-
Several more packages are now documented:
-
Added a new sys.is_compile_time() function. This enables you to query whether code is being executed at compile time or not. For example:

    from sys import is_compile_time

    fn check_compile_time() -> String:
        if is_compile_time():
            return "compile time"
        else:
            return "runtime"

    def main():
        alias var0 = check_compile_time()
        var var1 = check_compile_time()
        # sep="" avoids the extra space print() inserts between arguments.
        print("var0 is evaluated at ", var0, ", while var1 is evaluated at ", var1, sep="")

will print:

    var0 is evaluated at compile time, while var1 is evaluated at runtime
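The round() change listed earlier in this section is easiest to see on exact ties. The sketch below is only an illustration under the "round half to even" rule stated above; the wrapper name rounded is hypothetical.

```mojo
fn rounded(x: Float64) -> Float64:
    # Thin wrapper so the tie cases below are easy to check one by one.
    return round(x)


def main():
    # Ties go to the nearest even integer rather than away from zero.
    print(rounded(0.5) == 0.0)
    print(rounded(1.5) == 2.0)
    print(rounded(2.5) == 2.0)
    print(rounded(-2.5) == -2.0)
    # Values that are not exact ties round to the nearest integer as before.
    print(rounded(2.6) == 3.0)
```

Under the previous "round half away from zero" behavior, 0.5 and 2.5 would have rounded to 1.0 and 3.0 respectively, so code that depends on tie handling may need a second look.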
Tooling changes
-
Mojo API documentation generation is now able to display function and struct parameter references inside nested parametric types using names instead of indices. For example, instead of

    sort[type: CollectionElement, //, cmp_fn: fn($1|0, $1|0) capturing -> Bool](span: Span[type, origin])

it now displays

    sort[type: CollectionElement, //, cmp_fn: fn(type, type) capturing -> Bool](span: Span[type, origin])
❌ Removed
-
Use of legacy argument conventions like inout and the use of as in named results now produces an error message instead of a warning.
-
Direct access to List.size has been removed. Use the public API instead.

Examples:

Extending a List:

    base_data = List[Byte](1, 2, 3)

    data_list = List[Byte](4, 5, 6)
    ext_data_list = base_data.copy()
    ext_data_list.extend(data_list)  # [1, 2, 3, 4, 5, 6]

    data_span = Span(List[Byte](4, 5, 6))
    ext_data_span = base_data.copy()
    ext_data_span.extend(data_span)  # [1, 2, 3, 4, 5, 6]

    data_vec = SIMD[DType.uint8, 4](4, 5, 6, 7)
    ext_data_vec_full = base_data.copy()
    ext_data_vec_full.extend(data_vec)  # [1, 2, 3, 4, 5, 6, 7]

    ext_data_vec_partial = base_data.copy()
    ext_data_vec_partial.extend(data_vec, count=3)  # [1, 2, 3, 4, 5, 6]

Slicing and extending a list efficiently:

    base_data = List[Byte](1, 2, 3, 4, 5, 6)
    n4_n5 = Span(base_data)[3:5]
    extra_data = Span(List[Byte](8, 10))
    end_result = List[Byte](capacity=len(n4_n5) + len(extra_data))
    end_result.extend(n4_n5)
    end_result.extend(extra_data)  # [4, 5, 8, 10]
-
InlinedFixedVector and InlineList have been removed. Instead, use InlineArray when the upper bound is known at compile time. If the upper bound is not known until runtime, use List with the capacity constructor to minimize allocations.
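For code affected by the removal of the legacy argument conventions listed at the top of this section, the migration is mostly a keyword substitution. The following minimal sketch shows the new spellings next to the old ones in comments; the function names scale and make_answer are hypothetical.

```mojo
# Legacy spellings, which now produce a compiler error:
#   fn scale(inout value: Int, borrowed factor: Int): ...
#   fn make_answer() -> Int as result: ...


fn scale(mut value: Int, read factor: Int):
    # mut replaces inout: the caller's variable is modified in place.
    # read replaces borrowed: the argument is an immutable reference.
    value *= factor


fn make_answer(out result: Int):
    # out replaces the "-> T as name" named-result syntax.
    result = 42


def main():
    var x = 3
    scale(x, 7)
    print(x)              # 21
    print(make_answer())  # 42
```

Since read is the default convention for fn arguments, it can usually be omitted; it is spelled out here only to mirror the old borrowed keyword.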
🛠️ Fixed
- #3976 The variance argument in random.randn_float64() and random.randn() has been renamed to standard_deviation so that values are drawn from the correct distribution.
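Callers that previously passed a variance to these functions may want to convert it, since the parameter is a standard deviation (the square root of the variance). A minimal sketch, assuming the keyword names mean and standard_deviation on randn_float64(); the helper sample_with_variance is hypothetical.

```mojo
from math import sqrt
from random import randn_float64, seed


fn sample_with_variance(mean: Float64, variance: Float64) -> Float64:
    # The renamed parameter expects a standard deviation, so take the
    # square root of the variance before passing it along.
    return randn_float64(mean=mean, standard_deviation=sqrt(variance))


def main():
    seed(0)
    # Draws from a normal distribution with mean 0 and variance 4,
    # which corresponds to a standard deviation of 2.
    print(sample_with_variance(0.0, 4.0))
```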
Special thanks
Special thanks to our community contributors: @bgreni, @fnands, @illiasheshyn, @izo0x90, @lydiandy, @martinvuyk, @msaelices, @owenhilyard, @rd4com, @yinonburgansky