Mojo nightly
This version is still a work in progress.
Language enhancements
- Types can parameterize the `out` argument modifier when they want to be bindable to alternate address spaces, e.g.:

  ```mojo
  struct MemType(Movable):
      # Can be constructed into any address space.
      def __init__[addr_space: AddressSpace](out[addr_space] self):
          ...

      # Only constructable into GLOBAL address space.
      def __init__(arg: Int, out[AddressSpace.GLOBAL] self):
          ...
  ```

- Mojo now supports building types that support implicit conversions for widening origins, allowing code like this to "just work" without `rebind`:

  ```mojo
  def origin_superset_conversion(
      a: String, b: String, c: Bool
  ) -> Pointer[String, origin_of(a, b)]:
      if c:
          # These pointers implicitly convert.
          return Pointer(to=a)
      else:
          return Pointer(to=b)
  ```
Language changes
- Support for "set-only" accessors has been removed. You need to define a `__getitem__` or `__getattr__` to use a type that defines the corresponding setter. This eliminates a class of bugs in determining the effective element type.
- Implicit `std` imports are now an error, following a period of deprecation. Imports from the standard library must now be fully qualified. As a result, the compiler no longer squats on these module names, paving the way for user modules named `algorithm`, `memory`, etc.
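A minimal sketch of a type that satisfies the new accessor rule: the setter is paired with a getter whose return type matches the setter's value type, so the compiler can determine the element type. `Counter` is a hypothetical type, and the sketch assumes `List[Int](length=..., fill=...)` is available as in current nightlies.

```mojo
# Hypothetical container with a matching getter/setter pair.
struct Counter:
    var counts: List[Int]

    def __init__(out self, size: Int):
        self.counts = List[Int](length=size, fill=0)

    # Getter: required now that set-only accessors are rejected.
    def __getitem__(self, i: Int) -> Int:
        return self.counts[i]

    # Setter: its value type matches the getter's return type.
    def __setitem__(mut self, i: Int, value: Int):
        self.counts[i] = value

def main():
    var c = Counter(3)
    c[1] = 5             # calls __setitem__
    c[1] = c[1] + 2      # reads via __getitem__, writes via __setitem__
    print(c[1])          # 7
```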
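A sketch of the fully qualified import style, assuming `sqrt` is still exported from the standard library's `math` module. Under the new rule the unqualified `from math import sqrt` is an error (a module named `math` would now have to be a user module).

```mojo
# Fully qualified standard-library import; `from math import sqrt` is now an error.
from std.math import sqrt

def main():
    print(sqrt(16.0))
```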
Library changes
- `Coord`, `coord()`, `Idx`, `ComptimeInt`, `RuntimeInt`, and related coordinate helpers now live in the standard library module `std.utils.coord`. The `layout.coord` module re-exports the same symbols for layout and kernel code; `layout` also hoists the common names to package scope for convenience.
- Python -> Mojo FFI calls registered through `PythonModuleBuilder` and `PythonTypeBuilder` have significantly reduced per-call overhead:
  - Non-kwargs callables registered with `def_function`/`def_method`/`def_staticmethod` now use CPython's `METH_FASTCALL` calling convention rather than `METH_VARARGS`. Kwargs-accepting functions still use `METH_VARARGS | METH_KEYWORDS`.
  - `PythonObject.__del__` skips the `PyGILState_Ensure`/`PyGILState_Release` round-trip when the current thread already holds the GIL (checked via `PyGILState_Check`). On the common Python -> Mojo FFI path, where CPython hands the callee an already-held GIL, the destructor pays only for the check and a direct `Py_DecRef`. The public contract is unchanged: dropping a `PythonObject` from a thread that does not hold the GIL remains safe.
  - `Int(py=obj)` and `Scalar[IntDType](py=obj)` now take a fast path for exact Python `int` objects via `PyLong_AsSsize_t`.
- Added `TileTensor.copy_from()` and `TileTensor.split()` for copying between compatible tile views and for splitting tiles into static or runtime-sized partitions.
- `String.as_bytes_mut()` has been renamed to `String.unsafe_as_bytes_mut()` to reflect that writing invalid UTF-8 to the resulting `Span[Byte]` can lead to later issues such as out-of-bounds access.
- `List[T]` no longer requires its element type to be `Copyable`; it now works with `Movable`-only types. Iteration still requires `Copyable` and emits a `comptime assert` if that requirement is not satisfied.
- `reflect[T]` is now a `comptime` alias for the `Reflected[T]` handle type rather than a function returning a zero-sized handle instance. All methods on `Reflected[T]` are `@staticmethod`s, and the type is no longer constructible. Drop the parentheses at call sites:

  ```mojo
  # Before
  comptime r = reflect[Point]()
  print(r.field_count())
  print(reflect[Point]().name())
  comptime y_handle = reflect[Point]().field_type["y"]()
  var v: y_handle.T = 3.14

  # After
  comptime r = reflect[Point]
  print(r.field_count())
  print(reflect[Point].name())
  comptime y_handle = reflect[Point].field_type["y"]
  var v: y_handle.T = 3.14
  ```

  `field_type[name]` is now a parametric `comptime` member alias that yields `Reflected[FieldT]` directly: there is no trailing `()`, and the result is fully composable (e.g. `reflect[T].field_type["x"].name()`). The previously deprecated free functions `get_type_name`, `get_base_type_name`, and the `struct_field_*` family (along with the `ReflectedType[T]` wrapper) have been removed; use the corresponding methods on `reflect[T]`:

  | Removed | Replacement |
  | --- | --- |
  | `get_type_name[T]()` | `reflect[T].name()` |
  | `get_base_type_name[T]()` | `reflect[T].base_name()` |
  | `is_struct_type[T]()` | `reflect[T].is_struct()` |
  | `struct_field_count[T]()` | `reflect[T].field_count()` |
  | `struct_field_names[T]()` | `reflect[T].field_names()` |
  | `struct_field_types[T]()` | `reflect[T].field_types()` |
  | `struct_field_index_by_name[T, name]()` | `reflect[T].field_index[name]()` |
  | `struct_field_type_by_name[T, name]()` | `reflect[T].field_type[name]` |
  | `struct_field_ref[idx, T](s)` | `reflect[T].field_ref[idx](s)` |
  | `offset_of[T, name=name]()` | `reflect[T].field_offset[name=name]()` |
  | `offset_of[T, index=index]()` | `reflect[T].field_offset[index=index]()` |
  | `ReflectedType[T]` | `Reflected[T]` |

- Added `ReflectedFn[func]`, a function-side reflection handle accessed via the `reflect_fn[func]` `comptime` alias. It exposes function introspection through static methods, paralleling the type-side `Reflected[T]` API:

  ```mojo
  from std.reflection import reflect_fn

  def my_func(x: Int) -> Int:
      return x + 1

  def main():
      print(reflect_fn[my_func].display_name())  # "my_func"
      print(reflect_fn[my_func].linkage_name())  # mangled symbol name
  ```

- Added `alloc`, `free`, and `Layout` in `memory.alloc` for layout-aware memory allocation. A `Layout[T]` bundles an element count and alignment into a single value that is passed to both `alloc` and `free`, keeping size and alignment requirements explicit and co-located at every call site:

  ```mojo
  # Standard-library imports must be fully qualified.
  from std.memory import alloc, free, Layout

  var layout = Layout[Int32](count=4)
  var ptr = alloc(layout)
  # ... initialize & use ptr ...
  free(ptr, layout)
  ```

- The default `seed` for `random.Random`, `random.NormalRandom`, and the internal `_PhiloxWrapper` has changed from `0` to `0x3D30F19CD101` (67280421310721) to match PyTorch's `at::Philox4_32_10` default. Calls that omit the `seed` argument now produce a different output stream; pass `seed=0` explicitly to keep the previous behavior.
- Added `nth()` as a default method on the `Iterator` trait. It advances the iterator by `n` elements (destroying them) and returns the next element, or `None` if the iterator runs out before reaching index `n`:

  ```mojo
  var l = [10, 20, 30, 40]
  print(iter(l).nth(0).value())  # 10
  print(iter(l).nth(3).value())  # 40
  var missing = iter(l).nth(10)  # None (Optional)
  ```

- `String` and `StringSlice` now support keyword-only indexing by Unicode codepoint offset, written `string[codepoint=...]`.
- Several `StringSlice` constructors are now deprecated:
  - `StringSlice(ptr=..., length=...)` is deprecated; use `StringSlice(unsafe_from_utf8=Span(...))` instead.
  - `StringSlice(unsafe_from_utf8_ptr=...)` (taking a raw NUL-terminated `UnsafePointer[Byte]` or `UnsafePointer[c_char]`) is deprecated; construct a `CStringSlice` from the pointer and use the new `StringSlice(unsafe_from_utf8=CStringSlice(...))` constructor instead.
- `PythonObject` convertibility has been simplified. When working with types that required custom conversions to `PythonObject`, we used to write code like this:

  ```mojo
  struct MyCustomType(ConvertibleToPython, ImplicitlyCopyable):
      def to_python_object(var self) raises -> PythonObject:
          return PythonObject( ... custom logic ...)

  def hi_python(a: Some[ImplicitlyCopyable & ConvertibleToPython]) raises:
      print(t"Hi, {a.to_python_object()}!")

  def example() raises:
      hi_python(MyCustomType())
  ```

  This approach lets custom types implement `ConvertibleToPython` to get a domain-specific encoding as a Python object. Mojo now makes all `ConvertibleToPython` types implicitly convert to `PythonObject`, so this code can be simplified to:

  ```mojo
  def hi_python(a: PythonObject) raises:
      print(t"Hi, {a}!")
  ```

- The `Intable` trait no longer inherits from `ImplicitlyDestructible`. Generic code that relied on receiving the destructor bound transitively through this trait must now spell it out explicitly, for example `T: Intable & ImplicitlyDestructible`.
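A minimal sketch of the relaxed `List[T]` requirement. `Token` is a hypothetical type that declares `Movable` but not `Copyable`, and the sketch assumes the compiler synthesizes its move constructor. Elements are moved into the list and read back by index; there is deliberately no `for` loop over the list, since iteration still requires `Copyable`.

```mojo
# Hypothetical move-only element type: Movable, but not Copyable.
struct Token(Movable):
    var id: Int

    def __init__(out self, id: Int):
        self.id = id

def main():
    var tokens = List[Token]()
    tokens.append(Token(1))  # each Token is moved into the list
    tokens.append(Token(2))
    print(len(tokens))       # 2
    print(tokens[1].id)      # 2
    # `for t in tokens:` would trigger the Copyable comptime assert.
```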
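A sketch of generic code that now needs the explicit bound. `take_as_int` is a hypothetical helper that takes ownership of its argument (`var value: T`), so the value must be destroyed when the function returns, which is exactly the requirement `Intable` no longer implies. The sketch assumes `ImplicitlyDestructible` is usable without an import, as the entry's own `T: Intable & ImplicitlyDestructible` spelling suggests.

```mojo
# Hypothetical helper: owns `value`, so it must be able to destroy it.
# Intable alone no longer provides that bound, so it is spelled out.
def take_as_int[T: Intable & ImplicitlyDestructible](var value: T) -> Int:
    return Int(value)  # `value` is destroyed after its last use here

def main():
    print(take_as_int(Int32(42)))  # 42
```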
Tooling changes
- The `mojo` compiler now prints the filename and line number in diagnostics that point to inaccessible source locations (e.g., from precompiled libraries), instead of pointing at the top of the main file:

  ```text
  # Before
  $> mojo example.mojo
  /path/to/example.mojo:33:16: error: invalid call to '__setitem__': violated constraint
  vec[base + i] = values[i].cast[dtype]()
  ~~~^~~~~~~~~~
  /path/to/example.mojo:1:1: note: constraint declared here evaluated to False, expected 'mut'
  from std.algorithm.functional import elementwise
  ^
  /path/to/example.mojo:1:1: note: function declared here
  from std.algorithm.functional import elementwise
  ^

  # After
  $> mojo example.mojo
  /path/to/example.mojo:33:16: error: invalid call to '__setitem__': violated constraint
  vec[base + i] = values[i].cast[dtype]()
  ~~~^~~~~~~~~~
  max/kernels/src/layout/layout_tensor.mojo:2092: note: constraint declared here evaluated to False, expected 'mut'
  max/kernels/src/layout/layout_tensor.mojo:2090: note: function declared here
  ```

- The `mojo package` command has been renamed to `mojo precompile`. Similarly, the `.mojopkg` file extension has been deprecated in favor of the `.mojoc` extension:

  ```sh
  # Before
  mojo package my_package -o my_package.mojopkg

  # After
  mojo precompile my_package -o my_package.mojoc
  ```
GPU programming
- Added `DeviceContextList[size]` in `std.gpu.host`: a fixed-size, `Copyable`/`ImplicitlyCopyable`/`Sized` collection of `DeviceContext` values. Multi-device custom-op `execute` methods now receive a `DeviceContextList[N]`; the graph compiler synthesizes one from the per-device contexts attached to the op via a variadic constructor. Kernels can index into it with `dev_ctxs[i]` (runtime) or `dev_ctxs.__getitem_param__[i]()` (comptime), and iterate over it using `len(dev_ctxs)`. This replaces the previous `DeviceContextPtrList` pattern.

  ```mojo
  import compiler
  from std.gpu.host import DeviceContext, DeviceContextList

  @compiler.register("mo.distributed.allreduce.sum")
  struct DistributedAllReduceSum:
      @staticmethod
      def execute[
          dtype: DType,
          rank: Int,
          target: StaticString,
          _trace_name: StaticString,
      ](
          outputs: FusedOutputVariadicTensors[dtype=dtype, rank=rank, ...],
          inputs: InputVariadicTensors[dtype=dtype, rank=rank, ...],
          signal_buffers: MutableInputVariadicTensors[dtype=DType.uint8, rank=1, ...],
          dev_ctxs: DeviceContextList,
      ) capturing raises:
          comptime num_devices = inputs.size
          # ... use dev_ctxs[i] per device ...
  ```

- `DeviceContext.enqueue_function[func]` and `DeviceContext.compile_function[func]` now accept a single kernel argument instead of requiring it to be passed twice. The previous two-argument forms `enqueue_function[func, func]` and `compile_function[func, func]` are deprecated. The transitional `enqueue_function_experimental` and `compile_function_experimental` aliases are also deprecated; switch to `enqueue_function`/`compile_function`.

  ```mojo
  # Before
  ctx.enqueue_function[my_kernel, my_kernel](grid_dim=1, block_dim=1)
  ctx.enqueue_function_experimental[my_kernel](grid_dim=1, block_dim=1)

  # After
  ctx.enqueue_function[my_kernel](grid_dim=1, block_dim=1)
  ```
Removed
- The `DeviceContextPtr` and `DeviceContextPtrList` types have been removed from `std.runtime.asyncrt`. Custom-op `execute` methods now take `DeviceContext` directly (or `Optional[DeviceContext]` where the context is genuinely optional), and multi-device ops take `DeviceContextList[N]` (see the new entry under GPU programming). The helpers `get_device_context()` and `get_optional_device_context()` are no longer needed; pass the `DeviceContext` through directly. The `CpuDeviceContext` runtime always supplies a real context for the CPU path, so the nullable wrapper is no longer required.

  ```mojo
  # Before
  import compiler
  from runtime.asyncrt import DeviceContextPtr, DeviceContextPtrList

  @compiler.register("my_op")
  struct MyOp:
      @staticmethod
      def execute[target: StaticString](
          output: OutputTensor,
          input: InputTensor,
          ctx: DeviceContextPtr,
      ) raises:
          var gpu_ctx = ctx.get_device_context()
          ...

  # After
  import compiler
  from std.gpu.host import DeviceContext

  @compiler.register("my_op")
  struct MyOp:
      @staticmethod
      def execute[target: StaticString](
          output: OutputTensor,
          input: InputTensor,
          ctx: DeviceContext,
      ) raises:
          ...
  ```

- The `use_blocking_impl` parameter has been removed from `elementwise` (in `std.algorithm.functional`), and the analogous `single_thread_blocking_override` parameter has been removed from the reduction APIs (`reduce`, `max`, `min`, `sum`, `product`, `mean` in `std.algorithm.reduction`). These operations now always dispatch work the same way, using a single worker automatically when the problem size is small, so the blocking variants are no longer needed.
- The legacy `fn` keyword now produces an error instead of a warning. Use `def` instead.
- The previously-deprecated `constrained[cond, msg]()` function has been removed. Use `comptime assert cond, msg` instead.
- The previously-deprecated `Int`-returning overload of `normalize_index` has been removed. Use the `UInt`-returning overload (or write the index arithmetic inline, e.g. `x[len(x) - 1]`).
- The previously-deprecated default `UnsafePointer()` null constructor has been removed. To model a nullable pointer, use `Optional[UnsafePointer[...]]`. For a non-null placeholder for delayed initialization, use `UnsafePointer.unsafe_dangling()`.
- The deprecated free-function reflection API in `std.reflection` has been removed. Use the unified `reflect[T]` API (an alias for the `Reflected[T]` handle type) instead. Migration table:
  - `struct_field_count[T]()` → `reflect[T].field_count()`
  - `struct_field_names[T]()` → `reflect[T].field_names()`
  - `struct_field_types[T]()` → `reflect[T].field_types()`
  - `struct_field_index_by_name[T, name]()` → `reflect[T].field_index[name]()`
  - `struct_field_type_by_name[T, name]()` → `reflect[T].field_type[name]`
  - `struct_field_ref[idx](s)` → `reflect[T].field_ref[idx](s)`
  - `is_struct_type[T]()` → `reflect[T].is_struct()`
  - `offset_of[T, name=...]()` → `reflect[T].field_offset[name=...]()`
  - `offset_of[T, index=...]()` → `reflect[T].field_offset[index=...]()`
  - `ReflectedType[T]` → `Reflected[T]`
- `String` and `StringSlice` can now be sliced by codepoints, e.g. `String("🔄🔥🔄")[codepoint=1:2]` returns `"🔥"`.
- `String` and `StringSlice` can now be indexed by graphemes, e.g. `String("👨‍🚀🧑‍🌾क्षि")[grapheme=1]` returns `"🧑‍🌾"`.
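A minimal sketch of migrating off the removed `constrained` function: the compile-time check moves into a `comptime assert` statement with the same condition and message. `checked_width` is a hypothetical helper used only for illustration.

```mojo
# Hypothetical helper: rejects non-positive widths at compile time.
def checked_width[width: Int]() -> Int:
    # Previously: constrained[width > 0, "width must be positive"]()
    comptime assert width > 0, "width must be positive"
    return width

def main():
    print(checked_width[4]())  # 4
    # checked_width[0]() would fail to compile with the assert message.
```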
Fixed
- Reduced the virtual address space reserved by every `mojo` invocation by ~1 GiB. The JIT memory mapper's reservation granularity was 1 GiB, so each fresh reservation was rounded up to that size and mmapped `PROT_READ|PROT_WRITE`, inflating `VmPeak` and counting against Linux `RLIMIT_AS`. This caused non-deterministic OOM crashes in `libKGENCompilerRTShared.so` when two `mojo` processes ran concurrently on memory-constrained CI runners (e.g. GitHub Actions free tier, 7 GiB). The granularity is now 64 MiB; large compiles still work because the mapper reserves additional slabs on demand. (Issue #6433)
- Attempting to import a source Mojo package from a broken symlink no longer crashes the compiler. (Issue #6424)
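To see where the ~1 GiB figure for the address-space fix comes from, here is a worked example. If each fresh reservation of $R$ bytes is rounded up to the granularity $G$, then a small request (10 MiB here, a hypothetical size chosen only for illustration) reserves:

$$
\text{reserved}(R) = \left\lceil \frac{R}{G} \right\rceil \cdot G
$$

$$
G = 1\,\text{GiB}: \quad \left\lceil \frac{10}{1024} \right\rceil \cdot 1024\,\text{MiB} = 1024\,\text{MiB}
\qquad
G = 64\,\text{MiB}: \quad \left\lceil \frac{10}{64} \right\rceil \cdot 64\,\text{MiB} = 64\,\text{MiB}
$$

The same 10 MiB request therefore reserves 960 MiB less address space under the new granularity, which is in line with the roughly 1 GiB per-invocation reduction.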