Mojo nightly
This version is still a work in progress.
Language enhancements
-
Added an @unavailable decorator that marks a function or method as intentionally unavailable. Unlike @deprecated (which emits a warning), referencing an @unavailable declaration is an error. Like @deprecated, it accepts either a reason message (positional or as reason=) or a use=symbol replacement. When use=symbol is given, the error includes a fix-it that renames the call to symbol.

    struct Foo:
        @unavailable("message here...")
        def foo(self) -> Int:
            ...

    @unavailable(use=new_api)
    def old_api():
        ...

    def new_api():
        pass
-
Types can parameterize the out argument modifier when they want to be bindable to alternate address spaces, e.g.:

    struct MemType(Movable):
        # Can be constructed into any address space.
        def __init__[addr_space: AddressSpace](out[addr_space] self):
            ...

        # Only constructable into GLOBAL address space.
        def __init__(arg: Int, out[AddressSpace.GLOBAL] self):
            ...
-
Mojo now supports building types that support implicit conversions for widening origins, allowing code like this to "just work" without rebind:

    def origin_superset_conversion(a: String, b: String, c: Bool) -> Pointer[String, origin_of(a, b)]:
        if c:  # These pointers implicitly convert.
            return Pointer(to=a)
        else:
            return Pointer(to=b)
-
Types may now be conditionally "ImplicitlyDestructible" with a where clause:

    @explicit_destroy("Message when implicitly destroyed")
    struct ConditionallyLinearType[T: AnyType](ImplicitlyDestructible where conforms_to(T, ImplicitlyDestructible)):
        var data: Self.T
-
Trailing where clauses are now supported on struct declarations. Constraints are part of the type and checked at every binding site:

    struct SIMD[dtype: DType, size: Int]
        where dtype != DType.invalid
        where size.is_power_of_two():
        ...
-
Trailing where clauses are now supported on comptime alias declarations:

    comptime PositiveOnly[N: Int]: AnyType where N > 0 = ...
-
Trailing where clauses can now discharge constraints from constrained types appearing anywhere in a signature. A single trailing where will simultaneously constrain the declaration and satisfy the requirements of types used within the same signature:

    struct Matrix[m: Int, n: Int] where m > 0 where n > 0: ...

    def solve_linear_system[n: Int, a: Matrix[n, n], b: Vector[n]]() -> Vector[n]
        where n > 0:
        ...

If no trailing where discharges a constraint, the compiler reports an error and suggests the missing clause.
Language changes
-
Support for "set-only" accessors has been removed. You need to define a __getitem__ or __getattr__ to use a type that defines the corresponding setter. This eliminates a class of bugs in determining the effective element type.
-
Implicit std imports are now an error, following a period of deprecation. Imports from the standard library must now be fully qualified. The compiler thus no longer squats on these module names, paving the way for user modules named algorithm, memory, etc.
-
Specifying ABI="C" in an @export decorator is now deprecated; abi("C") should be used instead.

    @export("old", ABI="C")
    def old(): pass

    @export("new")
    def new() abi("C"): pass
-
Functions marked @export must now be given an explicit abi effect, rather than relying implicitly on the default (equivalent to abi("Mojo")). The compiler will produce a warning on missing abi effects, which will become an error in a future release.
-
Fixed a bug where from . import module failed with a spurious recursive-reference error.
Library changes
-
The axis of the nn.split kernel is now a keyword-only compile-time parameter instead of a runtime argument. Pass it in the parameter list, e.g. split[..., axis=axis](input, outputs, ctx).
-
The UnsafePointer implicit constructor has been fixed. When a function took an UnsafePointer[mut=False, ...] and was passed a mutable pointer, overload resolution chose the wrong constructor, so the new origin became ImmutableAnyOrigin. This occasionally hid mutability aliasing between pointers and hid some unused variables. The constructor now correctly casts to ImmutOrigin(Self.origin).
-
The reduction axis of the std.algorithm reductions (sum, product, mean, max, min, and the underlying _reduce_generator plus the CPU and GPU backends) is now a keyword-only compile-time parameter named reduce_dim instead of a runtime argument. This covers the mo.reduce.{mean,add,mul,max,min,reduce_min_and_max} ops. Pass it in the parameter list, e.g. sum[..., reduce_dim=axis](shape, ctx).
-
The axis of the nn.cumsum kernel is now a keyword-only compile-time parameter instead of a runtime argument. Pass it in the parameter list, e.g. cumsum[dtype, exclusive, reverse, axis=axis](output, input).
-
The ImplicitlyCopyable, Intable, and Equatable traits no longer inherit from ImplicitlyDestructible. Generic code that relied on receiving the destructor bound transitively through these traits (or through Comparable, which inherits from Equatable) must now spell it out explicitly, for example T: ImplicitlyCopyable & ImplicitlyDestructible. In practice, most generic code should prefer T: Copyable instead, per the guidance in ImplicitlyCopyable's docstring.
-
Changed Idx to a comptime alias for ComptimeInt. Use Idx[value] instead of Idx[value]() for compile-time coordinates.
-
Added is_trivially_movable, is_trivially_copyable, and is_trivially_destructible to std.memory. These helper functions return whether a type's move constructor, copy constructor, or destructor is trivial (i.e., a bit-copy or a no-op).
-
Added std.gpu.host.CompletionFlag, a non-owning handle to an MLRT M::Driver::CompletionFlag (an 8-byte slot in pinned host memory mapped into a device's address space). Pairs with the new DeviceStream.wait_for_host_value(flag, value) method, which stalls the stream until the flag's 64-bit slot equals the given value. Corresponds to CUDA's cuStreamWaitValue64 and captures cleanly into a CUDA graph as a wait-value node, letting a CPU thread (or an AsyncRT worker dispatched by enqueue_host_func) gate a GPU stream on host-produced data without a second stream or a blocking host-function callback. Currently CUDA-only; other backends raise.
-
Coord, coord(), Idx, ComptimeInt, RuntimeInt, and related coordinate helpers now live in the standard library module std.utils.coord. The layout.coord module re-exports the same symbols for layout and kernel code; layout also hoists the common names at package scope for convenience.
-
Python -> Mojo FFI calls registered through PythonModuleBuilder and PythonTypeBuilder have significantly reduced per-call overhead:

  - Non-kwargs callables registered with def_function / def_method / def_staticmethod now use CPython's METH_FASTCALL calling convention rather than METH_VARARGS. Kwargs-accepting functions still use METH_VARARGS | METH_KEYWORDS.
  - PythonObject.__del__ skips the PyGILState_Ensure / PyGILState_Release round-trip when the current thread already holds the GIL (checked via PyGILState_Check). On the common Python -> Mojo FFI path (where CPython hands the callee an already-held GIL) the destructor pays just the check and a direct Py_DecRef. The public contract is unchanged: dropping a PythonObject from a thread that does not hold the GIL remains safe.
  - Int(py=obj) and Scalar[IntDType](py=obj) fast-path exact Python int via PyLong_AsSsize_t.
-
Added TileTensor.copy_from() and TileTensor.split() for copying between compatible tile views and splitting tiles into static or runtime-sized partitions.
-
String.as_bytes_mut() has been renamed to String.unsafe_as_bytes_mut(), to reflect that writing invalid UTF-8 to the resulting Span[Byte] can lead to later issues such as out-of-bounds access.
-
A new BinaryHeap collection has been added to the std.collections module. This is a list-backed binary max-heap.
-
The core collection types no longer require their element type to be Copyable; Movable & ImplicitlyDestructible is now the minimum bound. This applies to List[T], Deque[T], LinkedList[T], InlineArray[T, size], both type parameters of Dict[K, V, H] (along with SwissTable / SwissTableEntry / OwnedKwargsDict and the loosened KeyElement trait), and Set[T]. Copy-requiring methods stay gated on Copyable. Counter[V] is unchanged. Dict.setdefault and Set.add now take their argument by var T; for move-only types call them as d.setdefault(key^, default) or set.add(value^).
-
reflect[T] is now a comptime alias for the Reflected[T] handle type rather than a function returning a zero-sized handle instance. All methods on Reflected[T] are @staticmethods, and the type is no longer constructible. Drop the parens at call sites:

    # Before
    comptime r = reflect[Point]()
    print(r.field_count())
    print(reflect[Point]().name())
    comptime y_handle = reflect[Point]().field_type["y"]()
    var v: y_handle.T = 3.14

    # After
    comptime r = reflect[Point]
    print(r.field_count())
    print(reflect[Point].name())
    comptime y_handle = reflect[Point].field_type["y"]
    var v: y_handle.T = 3.14

field_type[name] is now a parametric comptime member alias that yields Reflected[FieldT] directly: there is no trailing (), and the result is fully composable (e.g. reflect[T].field_type["x"].name()). The previously deprecated free functions get_type_name, get_base_type_name, and the struct_field_* family (along with the ReflectedType[T] wrapper) have been removed; use the corresponding methods on reflect[T]:

| Removed | Replacement |
| --- | --- |
| get_type_name[T]() | reflect[T].name() |
| get_base_type_name[T]() | reflect[T].base_name() |
| is_struct_type[T]() | reflect[T].is_struct() |
| struct_field_count[T]() | reflect[T].field_count() |
| struct_field_names[T]() | reflect[T].field_names() |
| struct_field_types[T]() | reflect[T].field_types() |
| struct_field_index_by_name[T, name]() | reflect[T].field_index[name]() |
| struct_field_type_by_name[T, name]() | reflect[T].field_type[name] |
| struct_field_ref[idx, T](s) | reflect[T].field_ref[idx](s) |
| offset_of[T, name=name]() | reflect[T].field_offset[name=name]() |
| offset_of[T, index=index]() | reflect[T].field_offset[index=index]() |
| ReflectedType[T] | Reflected[T] |
-
Added ReflectedFn[func], a function-side reflection handle accessed via the reflect_fn[func] comptime alias. Exposes function introspection through static methods, paralleling the type-side Reflected[T] API:

    from std.reflection import reflect_fn

    def my_func(x: Int) -> Int:
        return x + 1

    def main():
        print(reflect_fn[my_func].display_name())  # "my_func"
        print(reflect_fn[my_func].linkage_name())  # mangled symbol name
-
Added alloc, free, and Layout in memory.alloc for layout-aware memory allocation. A Layout[T] bundles an element count and alignment into a single value that is passed to both alloc and free, keeping size and alignment requirements explicit and co-located at every call site.

    from memory import alloc, free, Layout

    var layout = Layout[Int32](count=4)
    var ptr = alloc(layout)
    # ... initialize & use ptr ...
    free(ptr, layout)
-
The default seed for random.Random, random.NormalRandom, and the internal _PhiloxWrapper has changed from 0 to 0x3D30F19CD101 (67280421310721) to match PyTorch's at::Philox4_32_10 default. Calls that omitted the seed argument will now produce a different output stream; pass seed=0 explicitly to keep the previous behavior.
-
Added nth() as a default method on the Iterator trait. It advances the iterator by n elements (destroying them) and returns the next element, or None if the iterator runs out before reaching index n.

    var l = [10, 20, 30, 40]
    print(iter(l).nth(0).value())  # 10
    print(iter(l).nth(3).value())  # 40
    var missing = iter(l).nth(10)  # None (Optional)
-
String and StringSlice now expose a bytes() method that returns a new BytesIter, an iterator over the raw UTF-8 bytes of the string. This complements the existing codepoints() and graphemes() iterators by operating at the byte level without interpreting multi-byte UTF-8 sequences.

    var s = StringSlice("é")  # Encoded in UTF-8 as 0xC3 0xA9.
    for b in s.bytes():
        print(b)  # 195, 169
-
String and StringSlice now support a keyword-only string[codepoint=...] subscript that indexes by Unicode codepoint offsets.
-
Several StringSlice constructors are now deprecated.

  - StringSlice(ptr=..., length=...) is deprecated; use StringSlice(unsafe_from_utf8=Span(...)) instead.
  - StringSlice(unsafe_from_utf8_ptr=...) (taking a raw nul-terminated UnsafePointer[Byte] or UnsafePointer[c_char]) is deprecated; construct a CStringSlice from the pointer and use the new StringSlice(unsafe_from_utf8=CStringSlice(...)) constructor instead.
-
PythonObject convertibility has been simplified and cleaned up. Types that required custom conversions to PythonObject previously needed code like this:

    struct MyCustomType(ConvertibleToPython, ImplicitlyCopyable):
        def to_python_object(var self) raises -> PythonObject:
            return PythonObject( ... custom logic ...)

    def hi_python(a: Some[ImplicitlyCopyable & ConvertibleToPython]) raises:
        print(t"Hi, {a.to_python_object()}!")

    def example():
        hi_python(MyCustomType())

This approach allows custom types to implement ConvertibleToPython to get a domain-specific encoding as a Python object. All ConvertibleToPython types now implicitly convert to PythonObject, so the function above can and should be simplified to:

    def hi_python(a: PythonObject) raises:
        print(t"Hi, {a}!")
-
The CPython FFI bindings now carry the abi("C") effect. User-written Python extension callbacks passed to def_py_c_function, def_py_c_method, or PyCapsule_New must add abi("C") to their signatures, e.g. def my_func(self: PyObjectPtr, args: PyObjectPtr) abi("C") -> PyObjectPtr:. Functions registered through the higher-level def_function, def_method, and def_staticmethod paths are unaffected.
-
Added take() and drop() iterator adapters to std.itertools. take(iter, n) yields the first n elements, and drop(iter, n) drops the first n elements. They compose naturally to select sub-ranges of any iterable:

    from std.itertools import take, drop

    var nums = [1, 2, 3, 4, 5]
    for x in take(drop(nums, 1), 3):
        print(x)  # 2, 3, 4
-
The Indexer trait no longer inherits from ImplicitlyDestructible. Generic code that relied on receiving the destructor bound transitively through this trait must now spell it out explicitly, for example T: Indexer & ImplicitlyDestructible.
-
Added the UntrackedOrigin and UnsafeAnyOrigin origin aliases (and their Mut / Immut variants) as the new names for ExternalOrigin and AnyOrigin, respectively. UntrackedOrigin is the empty origin: it aliases nothing, so the lifetime checker has nothing to track, and it remains a supported tool for interfacing with memory from outside the Mojo program. UnsafeAnyOrigin is the universal origin: it might alias anything, defeating lifetime extension and exclusivity checking, so its Unsafe prefix marks it as an escape hatch slated for deprecation and removal.
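For readers coming from Python, the nth(), take(), and drop() entries in this section behave much like itertools.islice. The sketch below is a rough Python analogue for intuition, not the Mojo implementation: the helper names nth, take, and drop are defined here only for illustration, and Python's None stands in for an empty Mojo Optional.

```python
from itertools import islice


def nth(iterable, n):
    """Skip n elements and return the next one, or None if exhausted."""
    return next(islice(iterable, n, None), None)


def take(iterable, n):
    """Yield at most the first n elements."""
    return islice(iterable, n)


def drop(iterable, n):
    """Skip the first n elements and yield the rest."""
    return islice(iterable, n, None)


nums = [10, 20, 30, 40]
print(nth(iter(nums), 0))   # 10
print(nth(iter(nums), 3))   # 40
print(nth(iter(nums), 10))  # None

print(list(take(drop([1, 2, 3, 4, 5], 1), 3)))  # [2, 3, 4]
```

The outputs match the values quoted in the nth() and take()/drop() entries, which is the only claim this sketch makes about the Mojo versions.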
Tooling changes
-
Importing a Mojo module from Python no longer fails when the module lives in a read-only directory (for example, a Mojo extension installed into a read-only site-packages). Previously the importer always tried to write its compiled artifacts to a __mojocache__ directory next to the source, which raised an OSError on a read-only file system. The importer now keeps that in-tree behavior when the source directory is writable, and otherwise redirects the cache to the Modular cache folder. That location honors the standard Modular configuration: the cache_dir key in modular.cfg, the MODULAR_CACHE_DIR and MODULAR_HOME environment variables, and the XDG base directory specification.
-
The mojo compiler will now print the filename and line number in diagnostics that point to inaccessible source locations (e.g., from precompiled libraries) instead of a location at the top of the main file:

    # Before
    $> mojo example.mojo
    /path/to/example.mojo:33:16: error: invalid call to '__setitem__': violated constraint
    vec[base + i] = values[i].cast[dtype]()
    ~~~^~~~~~~~~~
    /path/to/example.mojo:1:1: note: constraint declared here evaluated to False, expected 'mut'
    from std.algorithm.functional import elementwise
    ^
    /path/to/example.mojo:1:1: note: function declared here
    from std.algorithm.functional import elementwise
    ^

    # After
    $> mojo example.mojo
    /path/to/example.mojo:33:16: error: invalid call to '__setitem__': violated constraint
    vec[base + i] = values[i].cast[dtype]()
    ~~~^~~~~~~~~~
    max/kernels/src/layout/layout_tensor.mojo:2092: note: constraint declared here evaluated to False, expected 'mut'
    max/kernels/src/layout/layout_tensor.mojo:2090: note: function declared here
-
The mojo compiler now provides more useful diagnostics when source information is unavailable, by synthesizing a declaration and pretty-printing it.

For example, instead of the following, with no contextual information after the 'here':

    /path/to/file.mojo:2092: note: function declared here:

The user will now see:

    /path/to/file.mojo:2092: note: function declared here:
    def __setitem__[*Tys: Indexer](self, *args: *Tys.values, *, val: SIMD[dtype, Self.element_size]) where mut

The coverage and quality of diagnostics in such cases will continue to improve in subsequent releases.
-
The mojo package command has been renamed to mojo precompile. Similarly, the .mojopkg file extension has been deprecated; favor the .mojoc file extension instead.

    # Before
    mojo package my_package -o my_package.mojopkg

    # After
    mojo precompile my_package -o my_package.mojoc
-
Added mojo --print-cache-location and mojo --clear-cache for inspecting and clearing the on-disk Mojo compile cache (.mojo_cache). The resolved path honors the existing precedence (MODULAR_CACHE_DIR, MODULAR_HOME, MODULAR_DERIVED_PATH, XDG_CACHE_HOME, etc.). --clear-cache prompts for confirmation by default; pass -f (or --force) to skip the prompt for scripting use.

    $ mojo --print-cache-location
    /home/you/.cache/modular/.mojo_cache

    $ mojo --clear-cache
    This will remove the Mojo compile cache at:
    /home/you/.cache/modular/.mojo_cache
    Proceed? [y/N] y
    Removed /home/you/.cache/modular/.mojo_cache

    $ mojo --clear-cache -f  # no prompt
-
The Mojo compiler now reports call-related errors on the operand value that causes the failure, instead of on the call overall. This makes it easier to understand failures in calls with many arguments spread over multiple lines.
-
mojo format now accepts the bare move-capture form {name^} in closure capture lists. Previously only the equivalent {var name^} form round-tripped through the formatter.
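The read-only import entry earlier in this section describes a simple decision: cache next to the source when that directory is writable, otherwise fall back to a shared cache folder. A minimal Python sketch of that decision follows. The helper name choose_cache_dir and the /fallback/cache path are hypothetical, and the real importer's resolution of the Modular cache folder (modular.cfg, environment variables, XDG) is not modeled here.

```python
import os
import tempfile


def choose_cache_dir(source_dir, fallback_cache_dir):
    """Hypothetical helper: prefer an in-tree __mojocache__ directory when
    the source directory is writable, otherwise use the fallback."""
    if os.access(source_dir, os.W_OK):
        return os.path.join(source_dir, "__mojocache__")
    return fallback_cache_dir


# A writable source directory keeps the in-tree cache location.
with tempfile.TemporaryDirectory() as writable_dir:
    in_tree = choose_cache_dir(writable_dir, "/fallback/cache")
    print(in_tree.endswith("__mojocache__"))

# A path that cannot be written to (here: one that does not exist) falls back.
missing_dir = os.path.join(tempfile.gettempdir(), "no-such-dir-for-this-sketch")
redirected = choose_cache_dir(missing_dir, "/fallback/cache")
print(redirected)
```

Checking writability up front, rather than catching the OSError after a failed write, is just one possible design; the entry does not say which approach the importer takes.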
GPU programming
-
Added DeviceContextList[size] in std.gpu.host: a fixed-size, Copyable / ImplicitlyCopyable / Sized collection of DeviceContext values. Multi-device custom-op execute methods now receive a DeviceContextList[N]; the graph compiler synthesizes one from the per-device contexts attached to the op via a variadic constructor. Kernels can index into it with dev_ctxs[i] (runtime) or dev_ctxs.__getitem_param__[i]() (comptime), and iterate with len(). This replaces the previous DeviceContextPtrList pattern.

    from gpu.host import DeviceContext, DeviceContextList

    @compiler.register("mo.distributed.allreduce.sum")
    struct DistributedAllReduceSum:
        @staticmethod
        def execute[
            dtype: DType,
            rank: Int,
            target: StaticString,
            _trace_name: StaticString,
        ](
            outputs: FusedOutputVariadicTensors[dtype=dtype, rank=rank, ...],
            inputs: InputVariadicTensors[dtype=dtype, rank=rank, ...],
            signal_buffers: MutableInputVariadicTensors[dtype=DType.uint8, rank=1, ...],
            dev_ctxs: DeviceContextList,
        ) capturing raises:
            comptime num_devices = inputs.size
            # ... use dev_ctxs[i] per device ...
-
DeviceContext.enqueue_function[func] and DeviceContext.compile_function[func] now accept a single kernel argument instead of requiring it to be passed twice. The previous two-argument forms enqueue_function[func, func] and compile_function[func, func] are deprecated. The transitional enqueue_function_experimental and compile_function_experimental aliases are also deprecated; switch to enqueue_function / compile_function.

    # Before
    ctx.enqueue_function[my_kernel, my_kernel](grid_dim=1, block_dim=1)
    ctx.enqueue_function_experimental[my_kernel](grid_dim=1, block_dim=1)

    # After
    ctx.enqueue_function[my_kernel](grid_dim=1, block_dim=1)
Removed
-
The DeviceContextPtr and DeviceContextPtrList types have been removed from std.runtime.asyncrt. Custom-op execute methods now take DeviceContext directly (or Optional[DeviceContext] where the context is genuinely optional), and multi-device ops take DeviceContextList[N] (see the new entry under GPU programming). The helpers get_device_context() and get_optional_device_context() are no longer needed; pass the DeviceContext through directly. The CpuDeviceContext runtime always supplies a real context for the CPU path, so the nullable wrapper is no longer required.

    # Before
    from runtime.asyncrt import DeviceContextPtr, DeviceContextPtrList

    @compiler.register("my_op")
    struct MyOp:
        @staticmethod
        def execute[target: StaticString](
            output: OutputTensor,
            input: InputTensor,
            ctx: DeviceContextPtr,
        ) raises:
            var gpu_ctx = ctx.get_device_context()
            ...

    # After
    from gpu.host import DeviceContext

    @compiler.register("my_op")
    struct MyOp:
        @staticmethod
        def execute[target: StaticString](
            output: OutputTensor,
            input: InputTensor,
            ctx: DeviceContext,
        ) raises:
            ...
-
The use_blocking_impl parameter has been removed from elementwise (in std.algorithm.functional), and the analogous single_thread_blocking_override parameter has been removed from the reduction APIs (reduce, max, min, sum, product, mean in std.algorithm.reduction). These operations now always dispatch work the same way, with a single worker used automatically when the problem size is small, so the blocking variants are no longer needed.
-
The legacy fn keyword now produces an error instead of a warning. Please move to def.
-
The previously-deprecated constrained[cond, msg]() function has been removed. Use comptime assert cond, msg instead.
-
The previously-deprecated Int-returning overload of normalize_index has been removed. Use the UInt-returning overload (or write the index arithmetic inline, e.g. x[len(x) - 1]).
-
The previously-deprecated default UnsafePointer() null constructor has been removed. To model a nullable pointer use Optional[UnsafePointer[...]]. For a non-null placeholder for delayed initialization, use UnsafePointer.unsafe_dangling().
-
The deprecated free-function reflection API in std.reflection has been removed. Use the unified reflect[T] API (a comptime alias for Reflected[T], written without call parentheses) instead.

Migration table:

  - struct_field_count[T]() → reflect[T].field_count()
  - struct_field_names[T]() → reflect[T].field_names()
  - struct_field_types[T]() → reflect[T].field_types()
  - struct_field_index_by_name[T, name]() → reflect[T].field_index[name]()
  - struct_field_type_by_name[T, name]() → reflect[T].field_type[name]
  - struct_field_ref[idx](s) → reflect[T].field_ref[idx](s)
  - is_struct_type[T]() → reflect[T].is_struct()
  - offset_of[T, name=...]() → reflect[T].field_offset[name=...]()
  - offset_of[T, index=...]() → reflect[T].field_offset[index=...]()
  - ReflectedType[T] → Reflected[T]
-
String and StringSlice can now be sliced by codepoints, e.g. String("🔄🔥🔄")[codepoint=1:2] returns "🔥".
-
String and StringSlice can now be indexed by graphemes, e.g. String("👨‍🚀🧑‍🌾क्षि")[grapheme=1] returns "🧑‍🌾".
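The byte-level, codepoint-level, and grapheme-level views mentioned in the String entries can give different answers for the same text. As a rough illustration in Python (whose str type is indexed by codepoint and whose standard library has no grapheme indexing), the sketch below reproduces the byte values and the codepoint slice quoted in those entries. It is an analogue for intuition only, not Mojo code.

```python
# UTF-8 bytes of a single non-ASCII character.
s = "\u00e9"  # "é", encoded in UTF-8 as 0xC3 0xA9
utf8_bytes = list(s.encode("utf-8"))
print(utf8_bytes)  # [195, 169]

# Python str slicing counts codepoints, similar in spirit to [codepoint=1:2].
text = "\U0001F504\U0001F525\U0001F504"  # 🔄🔥🔄
middle = text[1:2]
print(middle == "\U0001F525")  # True (the 🔥 codepoint)

# The same three codepoints occupy 12 bytes in UTF-8 (4 bytes each).
print(len(text), len(text.encode("utf-8")))  # 3 12
```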
Fixed
-
Fixed a GPU reduction correctness bug that produced wrong results for a contiguous last-axis reduction (for example mean over the last axis) once the number of rows reached 256 * sm_count (37888 rows on a 148-SM GPU). An N-D reduction is normalized to a rank-3 (outer, reduce, inner) shape, so a last-axis reduction has a trailing inner == 1 dimension; the kernel launcher treated that as a non-contiguous reduction and, once the device was thread-saturated, dispatched a kernel whose cross-row SIMD packing is only valid when a real inner dimension supplies the adjacent rows. Contiguity is now derived from the layout (the reduce dimension is innermost whenever every dimension after it is unit-sized).
-
Reduced the virtual address space reserved by every mojo invocation by ~1 GiB. The JIT memory mapper's reservation granularity was 1 GiB, so each fresh reservation was rounded up to that size and mmapped PROT_READ|PROT_WRITE, inflating VmPeak and counting against Linux RLIMIT_AS. This caused non-deterministic OOM crashes in libKGENCompilerRTShared.so when two mojo processes ran concurrently on memory-constrained CI runners (e.g. GitHub Actions free-tier, 7 GiB). The granularity is now 64 MiB; large compiles still work because the mapper reserves additional slabs on demand. (Issue #6433)
-
Attempting to import a source Mojo package from a broken symlink will no longer result in a compiler crash. (Issue #6424)
-
MODULAR_NVPTX_COMPILER_PATH is now part of the Mojo cache location, so after switching to a different ptxas, the CUBIN cache no longer hits entries that were generated before the switch. (Issue #6540)
-
Fixed the mojo compiler incorrectly emitting AVX-512 instructions on hosts where the CPU model (e.g. znver4) advertises AVX-512 but the OS has not enabled it in XCR0 (for example, inside Docker containers on GitHub Actions). Host CPU features are now cross-checked against the runtime CPUID view, so features the kernel withholds no longer cause SIGILL at runtime. (Issue #6413)
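To make the GPU reduction fix in this section concrete, the following Python sketch shows one way to normalize an N-D reduction to the rank-3 (outer, reduce, inner) form and to derive contiguity from the shape, as that entry describes. The function names are hypothetical and the real kernel launcher is not shown; only the shape arithmetic is illustrated, along with the 256 * sm_count threshold for a 148-SM GPU.

```python
from math import prod


def normalize_reduction(shape, axis):
    """Collapse an N-D shape to (outer, reduce, inner) around `axis`."""
    outer = prod(shape[:axis])
    inner = prod(shape[axis + 1:])
    return outer, shape[axis], inner


def reduce_dim_is_innermost(shape, axis):
    """True when every dimension after `axis` is unit-sized."""
    return all(dim == 1 for dim in shape[axis + 1:])


# A last-axis reduction always normalizes to inner == 1.
print(normalize_reduction((4, 8, 16), axis=2))      # (32, 16, 1)
print(reduce_dim_is_innermost((4, 8, 16), axis=2))  # True

# A middle-axis reduction has a real inner dimension.
print(normalize_reduction((4, 8, 16), axis=1))      # (4, 8, 16)
print(reduce_dim_is_innermost((4, 8, 16), axis=1))  # False

# Row count at which the bug appeared on a 148-SM GPU.
print(256 * 148)  # 37888
```

Deriving contiguity from the trailing dimensions, rather than from whether inner happens to be 1 after normalization, also covers shapes such as (4, 8, 1) reduced over axis 1.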