copy_dram_to_sram_async
def copy_dram_to_sram_async[thread_layout: Layout[thread_layout.shape_types, thread_layout.stride_types], *, swizzle: Optional[Swizzle] = None, masked: Bool = False, eviction_policy: CacheEviction = CacheEviction.EVICT_NORMAL, num_threads: Int = thread_layout.size(), thread_scope: ThreadScope = ThreadScope.BLOCK, element_size: Int](dst: TileTensor[Storage=dst.Storage, address_space=AddressSpace.SHARED, linear_idx_type=dst.linear_idx_type, element_size=element_size], src: TileTensor[Storage=src.Storage, linear_idx_type=src.linear_idx_type, element_size=element_size], src_num_valid_rows: OptionalReg[Int] = None)
Asynchronously copies a tile from DRAM (generic memory) to SRAM (shared).
Delegates to GenericToSharedAsyncTileCopier, which issues NVIDIA
cp.async instructions (falling back to synchronous loads/stores on AMD
and Apple GPUs). The copy is asynchronous: callers must commit it via
async_copy_commit_group() and synchronize via async_copy_wait_all()
or async_copy_wait_group() before reading the destination tile.
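The commit-then-wait requirement above can be illustrated with a small host-side model. This is only a sketch in plain Python, not Mojo GPU code: the `AsyncCopyModel` class and its method names are hypothetical, and it assumes the Mojo wrappers follow the PTX `cp.async` group semantics (`wait_group n` returns once at most `n` of the most recently committed groups are still pending, and `wait_all` behaves like a commit followed by `wait_group 0`).

```python
class AsyncCopyModel:
    """Toy single-thread model of cp.async-style group tracking (hypothetical)."""

    def __init__(self):
        self.uncommitted = []  # copies issued since the last commit
        self.pending = []      # committed groups, oldest first
        self.shared = {}       # destination contents guaranteed visible

    def issue(self, dst_index, value):
        # Issuing a copy does not make the destination readable yet.
        self.uncommitted.append((dst_index, value))

    def commit_group(self):
        # Bundle everything issued so far into one pending group.
        self.pending.append(self.uncommitted)
        self.uncommitted = []

    def wait_group(self, n):
        # Retire the oldest groups until at most n groups remain pending.
        # Real hardware may finish copies earlier; only the wait guarantees it.
        while len(self.pending) > n:
            for dst_index, value in self.pending.pop(0):
                self.shared[dst_index] = value

    def wait_all(self):
        # Modeled as: commit whatever is outstanding, then wait for everything.
        self.commit_group()
        self.wait_group(0)


model = AsyncCopyModel()
model.issue(0, 1.0)
model.commit_group()
model.issue(1, 2.0)
model.commit_group()

model.wait_group(1)  # the newest group may still be in flight
visible_after_wait_group = dict(model.shared)

model.wait_all()     # now every issued copy is guaranteed to have landed
visible_after_wait_all = dict(model.shared)
```

In this model, reading `model.shared` before a wait corresponds to reading the destination tile too early: the entry may simply not be there yet.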
Unlike the legacy free function, whose swizzle: Bool auto-derived an
ldmatrix swizzle, this wrapper takes an Optional[Swizzle] directly and
does not replicate that auto-derivation; pass an explicit swizzle (for
example from make_swizzle[..., access_size=element_size]) when one is
required.
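To see why readers must apply the same swizzle as the writer, the following plain-Python sketch models an XOR swizzle over flat tile offsets. It is an illustration under stated assumptions, not the library's implementation: `make_xor_swizzle` and its `bits`, `base`, and `shift` parameters are hypothetical names following the common CuTe-style formulation, and the 512-element tile is arbitrary. The actual `Swizzle` type may be parameterized differently.

```python
def make_xor_swizzle(bits, base, shift):
    """Build f(offset) = offset ^ ((offset & src_mask) >> shift).

    Assumed CuTe-style formulation: `bits` bits starting at position
    base + shift are XORed into the `bits` bits starting at `base`.
    """
    src_mask = ((1 << bits) - 1) << (base + shift)

    def swizzle(offset):
        return offset ^ ((offset & src_mask) >> shift)

    return swizzle


swz = make_xor_swizzle(bits=3, base=3, shift=3)

# Hypothetical 512-element tile; the values stand in for source data.
source = [10 * i for i in range(512)]

# Writer: element i of the source lands at the swizzled shared offset.
shared = [None] * len(source)
for i, value in enumerate(source):
    shared[swz(i)] = value

# Reader using the same swizzle recovers the logical order.
read_with_swizzle = [shared[swz(i)] for i in range(len(source))]

# Reader that ignores the swizzle sees a permuted tile.
read_without_swizzle = list(shared)
```

Because the mapping only permutes offsets, no data is lost; a reader that skips the swizzle still sees every element, just in the wrong positions.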
Parameters:
- thread_layout (Layout[thread_layout.shape_types, thread_layout.stride_types]): Layout describing how threads are organized over the copy.
- swizzle (Optional[Swizzle]): Optional swizzle applied to the shared-memory destination for bank-conflict mitigation. None produces a straight copy. Subsequent readers of the tile must use the same swizzle.
- masked (Bool): When True, performs per-vector bounds-checking; vectors past the bound issue zero-filling cp.async operations.
- eviction_policy (CacheEviction): Cache eviction policy for the source data.
- num_threads (Int): Total number of threads in the thread block. Threads beyond thread_layout.size() do not participate.
- thread_scope (ThreadScope): Scope at which thread operations are performed.
- element_size (Int): Number of scalar elements per logical element; inferred from the source and destination tiles.
Args:
- dst (TileTensor[Storage=dst.Storage, address_space=AddressSpace.SHARED, linear_idx_type=dst.linear_idx_type, element_size=element_size]): Destination tile in shared memory.
- src (TileTensor[Storage=src.Storage, linear_idx_type=src.linear_idx_type, element_size=element_size]): Source tile in generic memory.
- src_num_valid_rows (OptionalReg[Int]): Explicit valid-row count for the masked bound. When None (default), the bound is derived from src.dim[0]() (byte-identical to legacy). When provided, it overrides src.dim[0]() so that a src with a static row dim (e.g. a TileTensor.tile sub-view) can still drive a correct partial-tile zero-fill. Only consulted when masked is True.
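The interaction between masked and src_num_valid_rows can be sketched with a host-side model. This is a simplified illustration in plain Python, not the library's code: `masked_tile_copy` is a hypothetical helper, each row is treated as a single vector, and `len(src_rows)` stands in for `src.dim[0]()` on a tile whose row dimension is static.

```python
def masked_tile_copy(src_rows, masked=False, src_num_valid_rows=None):
    """Model the row bound used for zero-fill (hypothetical helper).

    src_rows: list of rows; its length plays the role of src.dim[0]().
    """
    bound = len(src_rows)  # default bound, derived from the source tile
    if masked and src_num_valid_rows is not None:
        bound = src_num_valid_rows  # explicit override, masked copies only

    dst = []
    for row_index, row in enumerate(src_rows):
        if masked and row_index >= bound:
            dst.append([0] * len(row))  # past the bound: zero-fill
        else:
            dst.append(list(row))       # within the bound: plain copy
    return dst


# A 4x2 sub-view whose static row dim is 4, but only 3 rows hold valid data.
tile = [[1, 2], [3, 4], [5, 6], [99, 99]]

default_bound = masked_tile_copy(tile, masked=True)
explicit_bound = masked_tile_copy(tile, masked=True, src_num_valid_rows=3)
unmasked = masked_tile_copy(tile, masked=False, src_num_valid_rows=3)
```

With the default bound the stale fourth row is copied as-is, because the static row dim reports 4 valid rows; the explicit count is what turns it into zeros. In the unmasked call the count is ignored, matching the note that it is only consulted when masked is True.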