Mojo v1.0.0b2
Highlights
- Collections no longer require `Copyable` elements. The core collection types (`List`, `Deque`, `LinkedList`, `InlineArray`, `Dict`, and `Set`) now accept move-only elements, with `Movable & ImplicitlyDestructible` as the new minimum bound instead of `Copyable`. This removes a longstanding source of friction where storing a value in a collection forced it, and its contents, to become copyable. Copy-requiring methods stay gated on `Copyable`. See Collections and iterators.
- Trailing `where` clauses in more places. Trailing `where` clauses are now supported on struct declarations, on `comptime` alias declarations, and to discharge constraints from constrained types used anywhere in a signature. A single trailing `where` can simultaneously constrain a declaration and satisfy the requirements of the types it uses, and the compiler suggests the missing clause when one is needed. See Language enhancements.
- `enqueue_function()` and `compile_function()` take a single kernel argument. `DeviceContext.enqueue_function[func]()` and `compile_function[func]()` now accept the kernel parameter once instead of requiring it twice, cleaning up every GPU function callsite. The old two-argument forms and the transitional `*_experimental` aliases are deprecated. See Device context and execution.
- Unicode-aware string subscripting. `String` and `StringSlice` now support keyword subscripts that index or slice by Unicode codepoint (`[codepoint=...]`) or by grapheme cluster (`[grapheme=...]`), so `String("🔄🔥🔄")[codepoint=1:2]` returns `"🔥"`. This makes correct, encoding-aware text indexing straightforward without manual byte arithmetic. See String and text.
- Faster Python → Mojo interop. Calls into Mojo from Python now carry significantly less per-call overhead: non-kwargs callables use CPython's `METH_FASTCALL` convention, `PythonObject.__del__()` skips the GIL round-trip when the calling thread already holds the GIL, and integer conversions fast-path exact Python `int` values. No code changes are required to benefit. See Python interoperability.
- New and expanded documentation. This release adds a Closures page, new `TileTensor` guides, expanded coverage of partially bound and unbound types and `rebind()`, and several new reference pages (built-in types, function overloads, closure declarations, CLI feature toggles, and docstrings), plus a downloadable Mojo basics cheat sheet. See Documentation.
- `mojo package` is now `mojo precompile`. The packaging command has been renamed and the `.mojopkg` extension deprecated in favor of `.mojoc`, affecting everyone who precompiles Mojo packages, notably custom-op authors. The new `.mojoc` packages are also significantly smaller, with faster compile and load times. See Tooling changes.
- Inspect and clear the Mojo compile cache. New `mojo --print-cache-location` and `mojo --clear-cache` flags report and purge the on-disk compile cache (`.mojo_cache`), honoring the standard cache-path precedence. `--clear-cache` prompts by default; pass `-f` to skip it for scripting. See Tooling changes.
- `fn` is now an error. Uses of the legacy `fn` keyword now produce a compilation error rather than a warning, completing the def/fn unification: `def` is Mojo's single function-declaration keyword. Move any remaining `fn` declarations to `def`. See Removed.
- Implicit `std` imports are now an error. Standard library imports must be fully qualified; the compiler no longer implicitly resolves bare `std` module names. Besides making imports explicit, this stops the compiler from squatting on names like `algorithm` and `memory`, freeing them for user modules. See Language changes.
- Reflection API restructuring. `reflect[T]` is now a `comptime` alias for the `Reflected[T]` handle type rather than a function, so call sites drop their parentheses (`reflect[T].name()`), and the deprecated free-function reflection API (`get_type_name()`, the `struct_field_*` family, and `ReflectedType[T]`) has been removed in favor of methods on `reflect[T]`. A new `reflect_fn[func]` alias adds parallel function-side reflection. See Reflection.
Documentation
- Added a Closures page to the Mojo Manual documenting the unified capture-list syntax, with runnable examples for each capture convention.
- Added new Mojo Manual documentation for `TileTensor`, including an updated tile `Layout` guide, a `TileTensor` guide with examples, and an introduction describing the relationship between `TileTensor` and `LayoutTensor`.
- Expanded the Mojo Manual coverage of partially bound and unbound types and automatic parameterization, with clearer and more discoverable guidance on `rebind()`.
- Added several new Mojo reference pages: built-in (prelude) types available without imports, function overloads and the compiler's overload-resolution rules, closure declarations, CLI feature toggles, and Mojo docstrings. Also added a Mojo basics cheat sheet, a downloadable quick-reference card for Mojo's core syntax.
Language enhancements
- Trailing `where` clauses are now supported in more declaration contexts:

  - On struct declarations. Constraints are part of the type and checked at every binding site:

        struct SIMD[dtype: DType, size: Int]
            where dtype != DType.invalid
            where size.is_power_of_two():
            ...

  - On `comptime` alias declarations:

        comptime PositiveOnly[N: Int]: AnyType where N > 0 = ...

  - To discharge constraints from constrained types appearing anywhere in a signature. A single trailing `where` simultaneously constrains the declaration and satisfies the requirements of types used within the same signature:

        struct Matrix[m: Int, n: Int] where m > 0 where n > 0: ...

        def solve_linear_system[n: Int, a: Matrix[n, n], b: Vector[n]]() -> Vector[n]
            where n > 0:
            ...

    If no trailing `where` discharges a constraint, the compiler reports an error and suggests the missing clause.

- Added an `@unavailable` decorator that marks a function or method as intentionally unavailable. Unlike `@deprecated` (which emits a warning), referencing an `@unavailable` declaration is an error. Like `@deprecated`, it accepts either a reason message (positional or as `reason=`) or a `use=symbol` replacement. When `use=symbol` is given, the error includes a fix-it that renames the call to `symbol`.

      struct Foo:
          @unavailable("message here...")
          def foo(self) -> Int:
              ...

      @unavailable(use=new_api)
      def old_api():
          ...

      def new_api():
          pass

- Types may now be conditionally `ImplicitlyDestructible` with a `where` clause:

      @explicit_destroy("Message when implicitly destroyed")
      struct ConditionallyLinearType[T: AnyType](
          ImplicitlyDestructible where conforms_to(T, ImplicitlyDestructible)
      ):
          var data: Self.T

- Types can now define implicit conversions that widen origins, allowing code like this to "just work" without `rebind`:

      def origin_superset_conversion(
          a: String, b: String, c: Bool
      ) -> Pointer[String, origin_of(a, b)]:
          if c:  # These pointers implicitly convert.
              return Pointer(to=a)
          else:
              return Pointer(to=b)

- Types can parameterize the `out` argument modifier when they want to be bindable to alternate address spaces, for example:

      struct MemType(Movable):
          # Can be constructed into any address space.
          def __init__[addr_space: AddressSpace](out[addr_space] self):
              ...

          # Only constructable into GLOBAL address space.
          def __init__(arg: Int, out[AddressSpace.GLOBAL] self):
              ...

- `ref` parameters can now use generic address spaces, for example `ref[origin, _]`. The generic address space is auto-parameterized onto the function signature.
Language changes
- Support for "set-only" accessors has been removed. You need to define a `__getitem__` or `__getattr__` to use a type that defines the corresponding setter. This eliminates a class of bugs in determining the effective element type.
- The `register_passable` effect keyword has been removed. Register passability is now computed implicitly from a type's contents, so the explicit keyword is no longer needed and is no longer accepted.
- Implicit `std` imports are now an error, following a period of deprecation. Imports from the standard library must now be fully qualified. The compiler thus no longer squats on these module names, paving the way for user modules named `algorithm`, `memory`, etc.
- The handling of the `abi` effect on `@export` functions has tightened:

  - Specifying `ABI="C"` in an `@export` decorator is now deprecated; `abi("C")` should be used instead.

        @export("old", ABI="C")
        def old(): pass

        @export("new")
        def new() abi("C"): pass

  - Functions marked `@export` must now be given an explicit `abi` effect, rather than relying implicitly on the default (equivalent to `abi("Mojo")`). The compiler will produce a warning on missing `abi` effects, which will become an error in a future release.

    Note that the `main` function is excepted from this. It is always implicitly `@export`ed, and in the case that `main` is explicitly `@export`ed, it is implicitly given the correct ABI. However, if a user both explicitly `@export`s `main` and provides an incompatible ABI (for example, `raises` and `abi("C")`), then an error is still emitted.

  - Functions marked as `raises` may no longer be given the `abi("C")` effect or be `@export`ed as such using the deprecated `ABI="C"` option.

- `where` clauses in parameter lists (param-where) are now deprecated. Move `where` clauses from the parameter list to a trailing `where` on the declaration. Note that `struct` and `comptime` declarations temporarily lose `where` support until declaration-level `where` lands.
Library changes
Type system and traits
- The `ImplicitlyCopyable`, `Intable`, `Equatable`, `Indexer`, and `Writer` traits no longer inherit from `ImplicitlyDestructible`. Generic code that relied on receiving the destructor bound transitively through these traits (or through `Comparable`, which inherits from `Equatable`) must now spell it out explicitly, for example `T: ImplicitlyCopyable & ImplicitlyDestructible` or `T: Indexer & ImplicitlyDestructible`. In practice, most generic code should prefer `T: Copyable` instead, per the guidance in `ImplicitlyCopyable`'s docstring.
- The `__init__` method required by the `Movable` trait has had its named argument changed from `take` to `move`. Explicitly calling a move initializer is now `SomeObject(move=)` instead of `SomeObject(take=)`.
- Added `is_trivially_movable()`, `is_trivially_copyable()`, and `is_trivially_destructible()` to `std.memory`. These helper functions return whether a type's move constructor, copy constructor, or destructor is trivial (that is, a bit-copy or a no-op).
Reflection
- `reflect[T]` is now a `comptime` alias for the `Reflected[T]` handle type rather than a function returning a zero-sized handle instance. All methods on `Reflected[T]` are `@staticmethod`s, and the type is no longer constructible. Drop the parens at call sites:

      # Before
      comptime r = reflect[Point]()
      print(r.field_count())
      print(reflect[Point]().name())
      comptime y_handle = reflect[Point]().field_type["y"]()
      var v: y_handle.T = 3.14

      # After
      comptime r = reflect[Point]
      print(r.field_count())
      print(reflect[Point].name())
      comptime y_handle = reflect[Point].field_type["y"]
      var v: y_handle.T = 3.14

  `field_type[name]` is now a parametric `comptime` member alias that yields `Reflected[FieldT]` directly: there is no trailing `()`, and the result is fully composable (for example `reflect[T].field_type["x"].name()`).

  The previously deprecated free functions `get_type_name()`, `get_base_type_name()`, and the `struct_field_*` family (along with the `ReflectedType[T]` wrapper) have been removed; use the corresponding methods on `reflect[T]`:

  | Removed | Replacement |
  | --- | --- |
  | `get_type_name[T]()` | `reflect[T].name()` |
  | `get_base_type_name[T]()` | `reflect[T].base_name()` |
  | `is_struct_type[T]()` | `reflect[T].is_struct()` |
  | `struct_field_count[T]()` | `reflect[T].field_count()` |
  | `struct_field_names[T]()` | `reflect[T].field_names()` |
  | `struct_field_types[T]()` | `reflect[T].field_types()` |
  | `struct_field_index_by_name[T, name]()` | `reflect[T].field_index[name]()` |
  | `struct_field_type_by_name[T, name]()` | `reflect[T].field_type[name]` |
  | `struct_field_ref[idx, T](s)` | `reflect[T].field_ref[idx](s)` |
  | `offset_of[T, name=name]()` | `reflect[T].field_offset[name=name]()` |
  | `offset_of[T, index=index]()` | `reflect[T].field_offset[index=index]()` |
  | `ReflectedType[T]` | `Reflected[T]` |

- Added `ReflectedFn[func]`, a function-side reflection handle accessed via the `reflect_fn[func]` `comptime` alias. It exposes function introspection through static methods, paralleling the type-side `Reflected[T]` API:

      from std.reflection import reflect_fn

      def my_func(x: Int) -> Int:
          return x + 1

      def main():
          print(reflect_fn[my_func].display_name())  # "my_func"
          print(reflect_fn[my_func].linkage_name())  # mangled symbol name
Pointer and memory
- Added the `UntrackedOrigin` and `UnsafeAnyOrigin` origin aliases (and their `Mut`/`Immut` variants) as the new names for `ExternalOrigin` and `AnyOrigin`, respectively. `UntrackedOrigin` is the empty origin: it aliases nothing, so the lifetime checker has nothing to track, and it remains a supported tool for interfacing with memory from outside the Mojo program. `UnsafeAnyOrigin` is the universal origin: it might alias anything, defeating lifetime extension and exclusivity checking, so its `Unsafe` prefix marks it as an escape hatch slated for deprecation and removal.

  The origin-discarding cast methods on `UnsafePointer`, `TileTensor`, and `LayoutTensor` have correspondingly been renamed from `as_any_origin()` to `as_unsafe_any_origin()`.

  The old `ExternalOrigin`, `ImmutExternalOrigin`, and `MutExternalOrigin` aliases are now deprecated and emit a deprecation warning when referenced; use `UntrackedOrigin`, `ImmutUntrackedOrigin`, and `MutUntrackedOrigin` respectively instead. The deprecated aliases still forward to the new names, so existing code keeps compiling until they are removed in a future release.

- Added the layout-aware `alloc()`/`dealloc()` allocation API in `memory.alloc`. `alloc()` returns an `Allocation[T]`, an owning handle that bundles the allocated pointer with the `Layout` it was allocated with, and `dealloc()` consumes that handle to release the storage. A `Layout[T]` bundles an element count and alignment into a single value, keeping size and alignment requirements explicit and co-located at every call site. `Allocation`, and its bare layout-less counterpart `ThinAllocation`, are `@explicit_destroy` types: the compiler forces every allocation to be released on all paths, either by passing it to `dealloc()` or by taking ownership of the raw pointer with `unsafe_leak()`. This guards against silent leaks, double-frees, and use-after-free. These APIs are intended to eventually replace the raw-pointer allocation APIs to promote memory safety.

      from std.memory import alloc, dealloc, Layout

      var layout = Layout[Int32](count=4)
      var allocation = alloc(layout)
      var ptr = allocation.unsafe_ptr()
      for i in range(layout.count()):
          (ptr + i).init_pointee_move(i)
      dealloc(allocation^)

- Added a `WeakPointer[T]` type to `std.memory.arc_pointer`, providing weak references to an `ArcPointer[T]` for building self-referential and cyclically referential data structures that can still be destroyed.
- `UnsafePointer.unsafe_from_address()` now has an overload that takes an `IntLiteral` and emits a compile-time assertion if the address is invalid (`0` or negative).
- `UnsafeUnion` now propagates the address space of its origin instead of defaulting to the `GENERIC` address space, allowing it to be used with address-space-specific memory such as GPU shared memory.
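The idea behind weak references in cyclic structures can be sketched outside Mojo. The following Python example is only an analogy, not the `WeakPointer[T]` API: the `Node` class and its fields are hypothetical, and the immediate cleanup shown relies on CPython's reference counting. The child holds a weak back-reference to its parent, so the pair forms no strong cycle and the parent can still be destroyed.

```python
import weakref


class Node:
    """Hypothetical tree node: strong links down, weak link up."""

    def __init__(self, name):
        self.name = name
        self.children = []  # strong (owning) references
        self.parent = None  # weak (non-owning) back-reference

    def add_child(self, child):
        self.children.append(child)
        child.parent = weakref.ref(self)


root = Node("root")
leaf = Node("leaf")
root.add_child(leaf)

# While the parent is alive, the weak reference can be upgraded.
parent_name = leaf.parent().name
print(parent_name)

# Dropping the only strong reference destroys the parent (CPython
# reference counting), and the weak reference reports that it is gone.
del root
parent_after_drop = leaf.parent()
print(parent_after_drop)
```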
Collections and iterators
- The core collection types no longer require their element type to be `Copyable`; `Movable & ImplicitlyDestructible` is now the minimum bound. This applies to `List[T]`, `Deque[T]`, `LinkedList[T]`, `InlineArray[T, size]`, both type parameters of `Dict[K, V, H]` (along with `SwissTable`/`SwissTableEntry`/`OwnedKwargsDict` and the loosened `KeyElement` trait), and `Set[T]`. Copy-requiring methods stay gated on `Copyable`. `Counter[V]` is unchanged. `Dict.setdefault()` and `Set.add()` now take their argument by `var T`; for move-only types call them as `d.setdefault(key^, default)` or `set.add(value^)`.
- `List[T]` now conditionally conforms to `ImplicitlyDestructible`: a `List` is implicitly destructible only when its element type `T` is. This lets a `List` hold elements that must be explicitly destroyed (dropping such a `List` would otherwise leak them), at the cost of a stricter check in generic code.

  Generic code that takes a `List` by value with only a `Movable` element bound now fails to compile for every `T`. Previously the error was deferred and only fired when `T` was instantiated with a non-`ImplicitlyDestructible` type. Add `& ImplicitlyDestructible` to the element bound:

      # Now errors for every `T` (previously only when `T` lacked a destructor):
      def foo[T: Movable, //](var list: List[T]):
          pass

      # Fix: require the destructor bound explicitly.
      def foo[T: Movable & ImplicitlyDestructible, //](var list: List[T]):
          pass

  Structs that store a `List` are affected the same way. Either constrain the element type, or better yet, propagate the conditional conformance so your type supports explicitly-destroyed elements too, forwarding cleanup through `destroy_with()`:

      # Option 1: require the element type to be implicitly destructible.
      struct Foo[T: Movable & ImplicitlyDestructible]:
          var list: List[Self.T]

      # Option 2: conditionally conform, and forward explicit destruction.
      @explicit_destroy("...")
      struct Foo[T: Movable](
          ImplicitlyDestructible where conforms_to(T, ImplicitlyDestructible),
      ):
          var list: List[Self.T]

          def destroy_with(deinit self, f: Some[def(var Self.T)]):
              self.list^.destroy_with(f)

- A new `BinaryHeap` collection has been added to the `std.collections` module. This is a list-backed binary max-heap.
- Added `nth()` as a default method on the `Iterator` trait. It advances the iterator by `n` elements (destroying them) and returns the next element, or `None` if the iterator runs out before reaching index `n`.

      var l = [10, 20, 30, 40]
      print(iter(l).nth(0).value())  # 10
      print(iter(l).nth(3).value())  # 40
      var missing = iter(l).nth(10)  # None (Optional)

- Added `take()` and `drop()` iterator adapters to `std.itertools`. `take(iter, n)` yields the first `n` elements, and `drop(iter, n)` drops the first `n` elements. They compose naturally to select sub-ranges of any iterable:

      from std.itertools import take, drop

      var nums = [1, 2, 3, 4, 5]
      for x in take(drop(nums, 1), 3):
          print(x)  # 2, 3, 4

- Added an `index()` method to `LinkedList` for finding the first occurrence of a value. Unlike Python's `list.index()`, it omits the `start`/`stop` parameters.
- `Dict` now defers its backing-buffer allocation until the first insertion. Default-constructed and `capacity=0` dictionaries no longer perform any heap allocations.
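For readers coming from Python, the semantics described above for `nth()`, `take()`, and `drop()` can be modeled with `itertools.islice`. This is a sketch of the described behavior only: the three helper functions below are hypothetical Python stand-ins, not the Mojo implementations.

```python
from itertools import islice


def nth(iterator, n):
    """Skip n elements, then return the next one (None if exhausted)."""
    return next(islice(iterator, n, None), None)


def take(iterable, n):
    """Yield at most the first n elements."""
    return islice(iterable, n)


def drop(iterable, n):
    """Skip the first n elements and yield the rest."""
    return islice(iterable, n, None)


values = [10, 20, 30, 40]
print(nth(iter(values), 0))   # 10
print(nth(iter(values), 3))   # 40
print(nth(iter(values), 10))  # None

nums = [1, 2, 3, 4, 5]
print(list(take(drop(nums, 1), 3)))  # [2, 3, 4]
```

Composing `drop` then `take` selects a window without materializing the skipped prefix, which is the same sub-range pattern the Mojo example shows.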
String and text
- `String.as_bytes_mut()` has been renamed to `String.unsafe_as_bytes_mut()`, to reflect that writing invalid UTF-8 to the resulting `Span[Byte]` can lead to later issues such as out-of-bounds access.
- Several `StringSlice` constructors are now deprecated. `StringSlice(ptr=..., length=...)` is deprecated; use `StringSlice(unsafe_from_utf8=Span(...))` instead. `StringSlice(unsafe_from_utf8_ptr=...)` (taking a raw nul-terminated `UnsafePointer[Byte]` or `UnsafePointer[c_char]`) is deprecated; construct a `CStringSlice` from the pointer and use the new `StringSlice(unsafe_from_utf8=CStringSlice(...))` constructor instead.
- `String` and `StringSlice` now expose a `bytes()` method that returns a new `BytesIter`, an iterator over the raw UTF-8 bytes of the string. This complements the existing `codepoints()` and `graphemes()` iterators by operating at the byte level without interpreting multi-byte UTF-8 sequences.

      var s = StringSlice("é")  # Encoded in UTF-8 as 0xC3 0xA9.
      for b in s.bytes():
          print(b)  # 195, 169

- `String` and `StringSlice` now support Unicode-aware subscripting:

  - A keyword-only `[codepoint=...]` subscript indexes or slices by Unicode codepoint offset, for example `String("🔄🔥🔄")[codepoint=1:2]` returns `"🔥"`.
  - A `[grapheme=...]` subscript indexes by grapheme, for example `String("👨‍🚀🧑‍🌾क्षि")[grapheme=1]` returns `"🧑‍🌾"`.
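The difference between byte offsets and codepoint offsets can be checked with a short Python sketch. This is an illustration of the encoding arithmetic only, not Mojo code: Python's `str` happens to index by codepoint, which makes it a convenient model for `[codepoint=...]`, while the encoded `bytes` object shows what manual byte arithmetic would involve. Grapheme segmentation is not covered because the Python standard library has no equivalent.

```python
# The same three-emoji string as above, written with escapes so the
# exact codepoints are unambiguous.
text = "\U0001F504\U0001F525\U0001F504"

# Python str slices by codepoint, similar in spirit to [codepoint=1:2].
middle = text[1:2]
print(middle)

# In UTF-8 each of these emoji takes four bytes, so the same character
# sits at bytes 4..8 of the encoded form.
encoded = text.encode("utf-8")
print(len(text), len(encoded))  # 3 12
middle_from_bytes = encoded[4:8].decode("utf-8")

# Byte-level iteration does not interpret multi-byte sequences.
e_acute_bytes = list("\u00e9".encode("utf-8"))
print(e_acute_bytes)  # [195, 169]
```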
Python interoperability
- The CPython FFI bindings now carry the `abi("C")` effect. User-written Python extension callbacks passed to `def_py_c_function()`, `def_py_c_method()`, or `PyCapsule_New()` must add `abi("C")` to their signatures, for example `def my_func(self: PyObjectPtr, args: PyObjectPtr) abi("C") -> PyObjectPtr:`. Functions registered through the higher-level `def_function()`, `def_method()`, and `def_staticmethod()` paths are unaffected.
- `PythonObject` convertibility has been simplified and cleaned up. Previously, code working with types that required custom conversions to `PythonObject` looked like this:

      struct MyCustomType(ConvertibleToPython, ImplicitlyCopyable):
          def to_python_object(var self) raises -> PythonObject:
              return PythonObject( ... custom logic ...)

      def hi_python(a: Some[ImplicitlyCopyable & ConvertibleToPython]) raises:
          print(t"Hi, {a.to_python_object()}!")

      def example():
          hi_python(MyCustomType())

  This approach allows custom types to implement `ConvertibleToPython` to get a domain-specific encoding as a Python object. Mojo has simplified this by making all `ConvertibleToPython` types implicitly convert to `PythonObject`, so the function above can (and should) be simplified to:

      def hi_python(a: PythonObject) raises:
          print(t"Hi, {a}!")

- Python -> Mojo FFI calls registered through `PythonModuleBuilder` and `PythonTypeBuilder` have significantly reduced per-call overhead:

  - Non-kwargs callables registered with `def_function()`/`def_method()`/`def_staticmethod()` now use CPython's `METH_FASTCALL` calling convention rather than `METH_VARARGS`. Kwargs-accepting functions still use `METH_VARARGS | METH_KEYWORDS`.
  - `PythonObject.__del__()` skips the `PyGILState_Ensure`/`PyGILState_Release` round-trip when the current thread already holds the GIL (checked via `PyGILState_Check`). On the common Python -> Mojo FFI path (where CPython hands the callee an already-held GIL) the destructor pays just the check and a direct `Py_DecRef`. The public contract is unchanged: dropping a `PythonObject` from a thread that doesn't hold the GIL remains safe.
  - `Int(py=obj)` and `Scalar[IntDType](py=obj)` fast-path exact Python `int` via `PyLong_AsSsize_t`.

- Extended `PyObjectFunction` to support 7- and 8-argument signatures, adding the corresponding type aliases and `@implicit` constructor overloads.
Other library changes
- The reduction axis of the `std.algorithm` reductions (`sum`, `product`, `mean`, `max`, `min`, and the underlying `_reduce_generator` plus the CPU and GPU backends) is now a keyword-only compile-time parameter named `reduce_dim` instead of a runtime argument. Pass it in the parameter list, for example `sum[..., reduce_dim=axis](shape, ctx)`.
- Added `dual_elementwise()` to `std.algorithm.functional`, which executes two elementwise functions over their respective shapes in a single GPU kernel launch, fusing two independent elementwise passes into one.
- The default `seed` for `random.Random`, `random.NormalRandom`, and the internal `_PhiloxWrapper` has changed from `0` to `0x3D30F19CD101` (67280421310721) to match PyTorch's `at::Philox4_32_10` default. Calls that omitted the `seed` argument will now produce a different output stream; pass `seed=0` explicitly to keep the previous behavior.
- Added `Random.step_uniform_unbiased()` and `NormalRandom.step_normal_4()` primitives to the Philox RNG. `step_uniform_unbiased()` returns four `Float32` values in (0, 1) using all 32 raw bits; `step_normal_4()` returns four normals from a single Philox step via same-step Box-Muller pairing.
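As background for the two RNG entries above, the following Python sketch checks the seed constant and shows the textbook Box-Muller transform that turns pairs of uniforms into pairs of normals. It is not the Mojo implementation: the function names are hypothetical, and the way the four uniforms are paired (first two, last two) is an assumption made for illustration.

```python
import math

# The new default seed, in the hexadecimal and decimal forms quoted above.
NEW_DEFAULT_SEED = 0x3D30F19CD101
print(NEW_DEFAULT_SEED)  # 67280421310721


def box_muller(u1, u2):
    """Map two uniforms in (0, 1) to two independent standard normals."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normals_from_four_uniforms(u0, u1, u2, u3):
    """Assumed pairing: (u0, u1) and (u2, u3) each feed one transform."""
    return box_muller(u0, u1) + box_muller(u2, u3)


# Four uniforms in, four normals out, as with one four-lane RNG step.
normals = normals_from_four_uniforms(0.25, 0.5, 0.75, 0.125)
print(normals)
```

The inputs must lie strictly inside (0, 1): a uniform of exactly 0 would make `math.log` fail, which is one reason an open-interval uniform primitive pairs well with this transform.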
GPU programming
Device context and execution
- `DeviceContext.enqueue_function[func]()` and `DeviceContext.compile_function[func]()` now accept a single kernel argument instead of requiring it to be passed twice. The previous two-argument forms `enqueue_function[func, func]()` and `compile_function[func, func]()` are deprecated. The transitional `enqueue_function_experimental()` and `compile_function_experimental()` aliases are also deprecated; switch to `enqueue_function()`/`compile_function()`.

      # Before
      ctx.enqueue_function[my_kernel, my_kernel](grid_dim=1, block_dim=1)
      ctx.enqueue_function_experimental[my_kernel](grid_dim=1, block_dim=1)

      # After
      ctx.enqueue_function[my_kernel](grid_dim=1, block_dim=1)

- Added `DeviceContextList[size]` in `std.gpu.host`: a fixed-size, `Copyable`/`ImplicitlyCopyable`/`Sized` collection of `DeviceContext` values. Multi-device custom-op `execute` methods now receive a `DeviceContextList[N]`; the graph compiler synthesizes one from the per-device contexts attached to the op via a variadic constructor. Kernels can index into it with `dev_ctxs[i]` (runtime) or `dev_ctxs.__getitem_param__[i]()` (comptime), and iterate with `len()`. This replaces the previous `DeviceContextPtrList` pattern.

      from gpu.host import DeviceContext, DeviceContextList

      @compiler.register("mo.distributed.allreduce.sum")
      struct DistributedAllReduceSum:
          @staticmethod
          def execute[
              dtype: DType,
              rank: Int,
              target: StaticString,
              _trace_name: StaticString,
          ](
              outputs: FusedOutputVariadicTensors[dtype=dtype, rank=rank, ...],
              inputs: InputVariadicTensors[dtype=dtype, rank=rank, ...],
              signal_buffers: MutableInputVariadicTensors[
                  dtype=DType.uint8, rank=1, ...
              ],
              dev_ctxs: DeviceContextList,
          ) capturing raises:
              comptime num_devices = inputs.size
              # ... use dev_ctxs[i] per device ...

- Added `std.gpu.host.CompletionFlag`, a non-owning handle to an MLRT `M::Driver::CompletionFlag` (an 8-byte slot in pinned host memory mapped into a device's address space). It pairs with the new `DeviceStream.wait_for_host_value(flag, value)` method, which stalls the stream until the flag's 64-bit slot equals the given value. This corresponds to CUDA's `cuStreamWaitValue64` and captures cleanly into a CUDA graph as a wait-value node, letting a CPU thread (or an AsyncRT worker dispatched by `enqueue_host_func()`) gate a GPU stream on host-produced data without a second stream or a blocking host-function callback. Currently CUDA-only; other backends raise.
- Added a `DevicePointer` struct, a host-side representation of a pointer to device memory that holds a reference to the owning `DeviceBuffer` and performs bounds checking.
- Added a `max_single_allocation_size` query to `DeviceContext` that reports the largest single allocation the driver will currently service; on Metal it reflects the live Metal framework limit, while CUDA/HIP report available memory.
- Added a `PDLLevel.ON` named constant as an alias for `PDLLevel(1)`, for use in place of the numeric `PDLLevel(0)`/`PDLLevel(1)` forms.
Layout and coordinates
- Changed `Idx` to a `comptime` alias for `ComptimeInt`. Use `Idx[value]` instead of `Idx[value]()` for compile-time coordinates.
- `Coord`, `coord()`, `Idx`, `ComptimeInt`, and related coordinate helpers now live in the standard library module `std.utils.coord`. The `layout.coord` module re-exports the same symbols for layout and kernel code; `layout` also hoists the common names at package scope for convenience.
- Kernel coordinate APIs now use `Coord` to preserve compile-time static shape information:

  - `elementwise()` now passes a `Coord` to its callback instead of an `IndexList[rank]`, and accepts a `Coord` shape argument, letting you pass static dimensions. Rewrite callbacks from `def func[width, rank, align](idx: IndexList[rank])` to `def func[width, align](coord: Coord)`, and calls from `elementwise(func, IndexList[2](...), ctx)` to `elementwise(func, Coord(...), ctx)`.
  - `Int` now conforms to `CoordLike`, so `Int` values can be passed directly to `Coord` constructors without wrapping them in `Idx(...)`.

- Added nested-layout support (CuTe layout algebra) to `Layout` and `TileTensor`. A single `.tile[]` API now handles both flat and nested parent layouts, and new `row_major_nested()`/`col_major_nested()` constructors (plus `RowMajorNestedLayout`/`ColMajorNestedLayout` aliases) build re-nested layouts for MFMA register tiles and `blocked_product` outputs.
- Added `TileTensor.copy_from()` and `TileTensor.split()` for copying between compatible tile views and splitting tiles into static or runtime-sized partitions.
Device targeting and hardware support
- Added `has_nvidia_gpu_accelerator[subarch]` and `has_nvidia_gpu_accelerator[subarchs]` overloads in `std.sys.info` that combine compile-time and runtime checks for whether the host has an NVIDIA GPU of a given subarchitecture or newer.
- Added a new `std.sys.machine` module providing `MachineDefinition`, along with expanded `DeviceSpec` and `DeviceRef` types, for aggregating accelerators and supplying richer static device information during compilation.
- Added support for the `fp8e4m3`, `fp8e4m3fn`, and `fp8e5m2` floating-point types on the Metal (Apple GPU) backend, and enabled native `Int` <-> `bfloat16` conversion on Apple M2 (Apple8) Metal GPUs.
- `math.log()` for `Float32` on NVIDIA GPUs now uses the `lg2.approx.ftz.f32` PTX intrinsic, which flushes subnormal inputs and outputs to zero (matching CUDA's `__logf`) and avoids the slower denormal-handling path.
Tooling changes
-
The
mojo packagecommand has renamed tomojo precompile. Similarly, the.mojopkgfile extension has been deprecated; favor the.mojocfile extension instead.# Beforemojo package my_package -o my_package.mojopkg# Aftermojo precompile my_package -o my_package.mojoc -
mojo precompilenow produces significantly smaller.mojocpackages by dropping a redundant serialized copy of each module's parser output, which also reduces package compile and load time. -
Added
mojo --print-cache-locationandmojo --clear-cachefor inspecting and clearing the on-disk Mojo compile cache (.mojo_cache). The resolved path honors the existing precedence (MODULAR_CACHE_DIR,MODULAR_HOME,MODULAR_DERIVED_PATH,XDG_CACHE_HOME, etc.).--clear-cacheprompts for confirmation by default; pass-f(or--force) to skip the prompt for scripting use.$ mojo --print-cache-location/home/you/.cache/modular/.mojo_cache$ mojo --clear-cacheThis will remove the Mojo compile cache at:/home/you/.cache/modular/.mojo_cacheProceed? [y/N] yRemoved /home/you/.cache/modular/.mojo_cache$ mojo --clear-cache -f # no prompt -
Importing a Mojo module from Python no longer fails when the module lives in a read-only directory (for example, a Mojo extension installed into a read-only
site-packages). Previously the importer always tried to write its compiled artifacts to a__mojocache__directory next to the source, which raised anOSErroron a read-only file system. The importer now keeps that in-tree behavior when the source directory is writable, and otherwise redirects the cache to the Modular cache folder. That location honors the standard Modular configuration: thecache_dirkey inmodular.cfg, theMODULAR_CACHE_DIRandMODULAR_HOMEenvironment variables, and the XDG base directory specification. -
The
mojocompiler will now print the filename and line number in diagnostics that point to inaccessible source locations (for example, from precompiled libraries) instead of a location at the top of the main file:# Before$> mojo example.mojo/path/to/example.mojo:33:16: error: invalid call to '__setitem__': violated constraintvec[base + i] = values[i].cast[dtype]()~~~^~~~~~~~~~/path/to/example.mojo:1:1: note: constraint declared here evaluated to False, expected 'mut'from std.algorithm.functional import elementwise^/path/to/example.mojo:1:1: note: function declared herefrom std.algorithm.functional import elementwise^# After$> mojo example.mojo/path/to/example.mojo:33:16: error: invalid call to '__setitem__': violated constraintvec[base + i] = values[i].cast[dtype]()~~~^~~~~~~~~~max/kernels/src/layout/layout_tensor.mojo:2092: note: constraint declared here evaluated to False, expected 'mut'max/kernels/src/layout/layout_tensor.mojo:2090: note: function declared here -
The
mojocompiler now provides more useful diagnostics in the case that source information is unavailable by synthesizing a declaration and pretty-printing it.For example, instead of the following, with no contextual information after the 'here':
/path/to/file.mojo:2092: note: function declared here:The user will now see:
/path/to/file.mojo:2092: note: function declared here:def __setitem__[*Tys: Indexer](self, *args: *Tys.values, *, val: SIMD[dtype, Self.element_size]) where mutThe coverage and quality of diagnostics in such cases will continue to improve in subsequent releases.
-
The Mojo compiler now reports call-related errors on the operand value that causes the failure, instead of on the call overall. This makes it easier to understand failures in calls with many arguments spread over multiple lines.
-
Improved the clarity and actionability of a wide range of compiler diagnostics—declaration resolution,
main(), parser, lexer, signature, and call-emission errors—explaining what is wrong and how to fix it. -
Improved diagnostics for splatting a
VariadicPackinto a fixed-arity callee: the compiler now attaches a hint pointing to the supported dispatcher pattern, and acallee(*pack)pack-unpack mismatch now reports both the actual and expected element types instead of only the pack trait. -
MODULAR_DEBUG=uninitialized-read-checkfailures now print the kernel source location, dtype, lane index, observed bit pattern, and block/thread indices of each trapping lane, instead of being silenced by the thread-(0,0,0) print gate. -
mojo formatnow accepts the bare move-capture form{name^}in closure capture lists. Previously only the equivalent{var name^}form round-tripped through the formatter. -
The Mojo language server now returns `ContentModified` instead of `InvalidRequest` for completion requests that arrive during a reparse, fixing missing completions in clients such as Neovim's built-in LSP client.
-
The LLDB debugger now provides type summaries for Mojo's `PythonObject` (showing the underlying Python type name and decoding common built-ins such as `None`, `bool`, `int`, `float`, and item counts for `list`/`tuple`/`dict`) and for `Dict[K, V]` (showing `(size N)` and exposing live entries in insertion order with their keys and values).
Removed
-
The legacy `fn` keyword now produces an error instead of a warning. Please move to `def`.
-
The `DeviceContextPtr` and `DeviceContextPtrList` types have been removed from `std.runtime.asyncrt`. Custom-op `execute` methods now take `DeviceContext` directly (or `Optional[DeviceContext]` where the context is genuinely optional), and multi-device ops take `DeviceContextList[N]` (see the new entry under Library changes). The helpers `get_device_context()` and `get_optional_device_context()` are no longer needed; pass the `DeviceContext` through directly. The `CpuDeviceContext` runtime always supplies a real context for the CPU path, so the nullable wrapper is no longer required.

```mojo
# Before
from runtime.asyncrt import DeviceContextPtr, DeviceContextPtrList

@compiler.register("my_op")
struct MyOp:
    @staticmethod
    def execute[target: StaticString](
        output: OutputTensor,
        input: InputTensor,
        ctx: DeviceContextPtr,
    ) raises:
        var gpu_ctx = ctx.get_device_context()
        ...

# After
from gpu.host import DeviceContext

@compiler.register("my_op")
struct MyOp:
    @staticmethod
    def execute[target: StaticString](
        output: OutputTensor,
        input: InputTensor,
        ctx: DeviceContext,
    ) raises:
        ...
```
-
Removed `DeviceContext.compile_function_unchecked()` and `DeviceContext.enqueue_function_unchecked()`. Use the checked `compile_function()` and `enqueue_function()` instead.
-
Several parameters and overloads were removed from `elementwise` and the reduction APIs:
-
The `use_blocking_impl` parameter has been removed from `elementwise` (in `std.algorithm.functional`), and the analogous `single_thread_blocking_override` parameter from the reduction APIs (`reduce`, `max`, `min`, `sum`, `product`, `mean` in `std.algorithm.reduction`). These operations now always dispatch work the same way, with a single worker used automatically when the problem size is small, so the blocking variants are no longer needed.
-
The `pdl_level` parameter has been removed from `elementwise`, `dual_elementwise`, and the GPU elementwise implementations. PDL level 1 is now always used.
-
The two `Optional[DeviceContext]` overloads of `elementwise` (in `std.algorithm.functional`) have been removed; callers now thread a non-optional `DeviceContext` through directly. The `CpuDeviceContext` runtime always supplies a real context for the CPU path, so the nullable wrapper is no longer needed.
-
-
The deprecated free-function reflection API in `std.reflection` has been removed. Use the unified `reflect[T]` API instead; see Reflection under Library changes for the full migration table.
-
Several previously-deprecated APIs have been removed:
-
The `constrained[cond, msg]()` function. Use `comptime assert cond, msg` instead.
-
The `Int`-returning overload of `normalize_index()`. Use the `UInt`-returning overload (or write the index arithmetic inline, for example `x[len(x) - 1]`).
-
The default `UnsafePointer()` null constructor. To model a nullable pointer, use `Optional[UnsafePointer[...]]`. For a non-null placeholder for delayed initialization, use `UnsafePointer.unsafe_dangling()`.
-
-
The `-kgenModule` flag has been removed from `mojo precompile`. It emitted a serialized KGEN module (`.mlirbc`) instead of a `.mojoc` package and was only used internally.
Fixed
-
Fixed a GPU reduction correctness bug that produced wrong results for a contiguous last-axis reduction (for example `mean` over the last axis) once the number of rows reached `256 * sm_count` (37888 rows on a 148-SM GPU). An N-D reduction is normalized to a rank-3 `(outer, reduce, inner)` shape, so a last-axis reduction has a trailing `inner == 1` dimension; the kernel launcher treated that as a non-contiguous reduction and, once the device was thread-saturated, dispatched a kernel whose cross-row SIMD packing is only valid when a real inner dimension supplies the adjacent rows. Contiguity is now derived from the layout (the reduce dimension is innermost whenever every dimension after it is unit-sized).
-
Reduced the virtual address space reserved by every `mojo` invocation by ~1 GiB. The JIT memory mapper's reservation granularity was 1 GiB, so each fresh reservation was rounded up to that size and mmapped `PROT_READ|PROT_WRITE`, inflating `VmPeak` and counting against Linux `RLIMIT_AS`. This caused non-deterministic OOM crashes in `libKGENCompilerRTShared.so` when two `mojo` processes ran concurrently on memory-constrained CI runners (for example GitHub Actions free-tier, 7 GiB). The granularity is now 64 MiB; large compiles still work because the mapper reserves additional slabs on demand. (Issue #6433)
-
Attempting to import a source Mojo package from a broken symlink will no longer result in a compiler crash. (Issue #6424)
-
Fixed a bug where `from . import module` failed with a spurious recursive-reference error.
-
`MODULAR_NVPTX_COMPILER_PATH` is now part of the Mojo cache location, so that switching to a different `ptxas` no longer reuses CUBIN cache entries that were generated before the switch. (Issue #6549)
-
Fixed a lifetime-checker bug where destroying a type that captures origins (its destructor can access those origins) failed to extend the referenced value's lifetime beyond the `__del__` call.
-
Fixed linear/no-return function interaction so that `read` arguments and other values are no longer required to be live after a no-return call (for example `abort()`), reducing code size and eliminating spurious linear-type errors.
-
Fixed several compiler crashes and miscompiles in parameter inference and trait casting, including type-value convertibility incorrectly rejected across an upcast and a `Downcast` between traits dropping the original type's trait conformance.
-
Fixed several trait-inheritance and conformance bugs: refining traits that inherited a defaulted associated alias (whose default referenced `Self`, or that was declared abstract by the refined trait) were rejected or crashed; a `where conforms_to(T, Trait)` clause did not propagate to later parameter matching; loading trait functions from bytecode could crash; a precompiled-package stub closure trait cached before its full definition produced a "closure trait missing call" crash; and passing a struct with a compatible `__call__` method to a function-trait parameter now auto-conforms or produces a proper error instead of crashing. (Issue #6354)
-
Fixed a bug where passing a function literal to a parameter typed as a sugared closure trait (for example `comptime CallbackType = def(Int) -> Int`) failed to inflate the literal to the requisite trait conformer.
-
Fixed a bug where passing a struct larger than 16 bytes to a Mojo callback decorated with `abi("C")` failed on targets like x86-64 because the callback was missing the required `byval` flags. (Issue #6511)
-
Fixed a code-generation bug where an indirect tail call to an `abi("C")` function returning a struct via `sret` could read uninitialized memory; the `tail` attribute is now correctly removed for such indirect calls.
-
The compiler now prefers a `.mojoc` module over a stale `.mojopkg` of the same name when both live side by side in a directory, avoiding errors from picking up an older build.
-
Fixed misdiagnosis when the compiler failed to synthesize an implicit copy constructor because a field is not `ImplicitlyCopyable`, and corrected the conditional-conformance backup path to check `ImplicitlyCopyable` rather than `Copyable`.
-
Re-enabled error reporting in the elaborator that had previously been disabled.
-
Splatting a non-`VariadicPack` value (for example `print(*l)` where `l` is a `List[Int]`) now emits a clear error instead of crashing the parser. (Issue #6350)
-
Control-flow statements (`if`, `for`, `while`, `try`, `with`, and their `comptime` forms) placed directly in a struct, trait, or extension body, or at module scope, now emit a parse error instead of crashing the compiler.
-
Fixed a crash in `mojo doc` when emitting diagnostics for declarations without valid source locations (for example, from bytecode packages).
-
Fixed `TileTensor.raw_store()` not forwarding its `width` parameter to the underlying `UnsafePointer.store()` call.
-
Fixed a POP-to-LLVM lowering failure ("existing function with conflicting signature") that occurred when a graph composed both an external cubin/PTX launch via `enqueue_function()` and a matmul; the launch path now casts grid/block dimensions, shared-memory size, and attribute count to `UInt32` to match the C ABI.
-
`MODULAR_DEBUG=uninitialized-read-check` no longer produces false positives for fp8 dtypes, whose legitimate saturate-to-max values were bit-identical to the poison sentinel; the fill and load-site check are now skipped for all fp8 formats (fp16, bf16, fp32, and fp64 are unaffected).
-
Fixed `isinf()` for finite-only float8 dtypes (`float8_e4m3fn`, `float8_e8m0fnu`), which previously fell through to an `llvm.is.fpclass` intrinsic with no `i8` overload and failed during LLVM lowering; `isinf()` now correctly returns `False` for these formats at compile time.
-
Fixed several Apple GPU (Metal) backend code-generation bugs: illegal generic-address-space accesses when unpacking an `OptionalReg[T]` containing pointer fields, `bfloat16` arithmetic on M1 (Apple7) and M2 (Apple8) GPUs that lack native support, and an unlowerable `max()` intrinsic on `SIMD` float vectors.
-
Fixed AMD RDNA GPU architecture detection at compile time by removing the `amdgpu:` prefix from the AMD RDNA GPU architecture patterns, which had caused compilation to fail on AMD RDNA GPUs.
-
The `UnsafePointer` implicit constructor has been fixed. When a function took an `UnsafePointer[mut=False, ...]` and was passed a mutable pointer, overload resolution chose the wrong constructor, so the new origin became `ImmutableAnyOrigin`. This was a problem because it occasionally hid mutability aliasing between pointers and hid some unused variables. The constructor now correctly casts to `ImmutOrigin(Self.origin)`.
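
As a rough illustration of the last-axis reduction fix listed earlier in this section, the shape normalization and the layout-derived contiguity rule can be modeled in plain Python. This is only a sketch of the arithmetic described in that entry, not the Mojo kernel launcher: the function names `normalize_to_rank3`, `reduce_dim_is_innermost`, and `saturation_threshold` are hypothetical, and the 4096-element row length is an arbitrary choice.

```python
from math import prod


def normalize_to_rank3(shape, axis):
    """Collapse an N-D shape into (outer, reduce, inner) around `axis`.

    Hypothetical helper: models the normalization described in the entry.
    """
    outer = prod(shape[:axis])       # product of dims before the reduce axis
    inner = prod(shape[axis + 1:])   # product of dims after it (1 if none)
    return outer, shape[axis], inner


def reduce_dim_is_innermost(shape, axis):
    """Layout-derived contiguity: every dimension after `axis` is unit-sized."""
    return all(dim == 1 for dim in shape[axis + 1:])


def saturation_threshold(sm_count, rows_per_sm=256):
    """Row count at which the entry says the bug appeared (256 * sm_count)."""
    return rows_per_sm * sm_count


# A mean over the last axis of a 2-D tensor (row length chosen arbitrarily).
shape = (37888, 4096)
rank3 = normalize_to_rank3(shape, axis=1)
contiguous = reduce_dim_is_innermost(shape, axis=1)
threshold = saturation_threshold(148)

print(rank3, contiguous, threshold)
```

The point of the second helper is that a trailing `inner == 1` on its own says nothing about contiguity; checking that all later dimensions are unit-sized classifies the last-axis case as contiguous, while a reduction over a middle axis with a real inner dimension is not.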
Special thanks
Special thanks to our community contributors:
Adam Kruger (@lightofbaldr), Deftera (@Deftera186), Dylan Stark (@dylan-stark), Gabriel de Marmiesse (@gabrieldemarmiesse), Giorgos Smyridis (@gsmyridis), Jongmin Park (@GzuPark), Keven Villeneuve (@kevenv), Mahendra Singh Rathore (@mahendrarathore1742), Manuel Saelices (@msaelices), martinvuyk (@martinvuyk), Mose Schmiedel (@moseschmiedel), Olcmyk (@Olcmyk), Piper (@piperchester)