Version: 1.0.0b2


copy_local_to_shared

def copy_local_to_shared[
    thread_layout: Layout[thread_layout.shape_types, thread_layout.stride_types],
    *,
    swizzle: Optional[Swizzle] = None,
    num_threads: Int = thread_layout.size(),
    thread_scope: ThreadScope = ThreadScope.BLOCK,
    element_size: Int,
](
    dst: TileTensor[address_space=AddressSpace.SHARED, linear_idx_type=dst.linear_idx_type, element_size=element_size],
    src: TileTensor[address_space=AddressSpace.LOCAL, linear_idx_type=src.linear_idx_type, element_size=element_size],
)

Synchronously copies a tile from registers (AddressSpace.LOCAL) to shared memory (AddressSpace.SHARED).

Delegates to LocalToSharedTileCopier. Unlike the legacy free function, this function supports neither the AMD row_major prefetch pattern nor the fp32 -> half-precision downcast.

Parameters:

thread_layout (Layout[thread_layout.shape_types, thread_layout.stride_types]): Layout describing how threads are organized over the copy.
swizzle (Optional[Swizzle]): Optional swizzle applied to the shared-memory destination; the same swizzle must be used by any subsequent reader of the tile.
num_threads (Int): Total number of threads in the thread block. Threads beyond thread_layout.size() do not participate.
thread_scope (ThreadScope): Scope at which thread operations are performed.
element_size (Int): Number of scalar elements per logical element; inferred from the source and destination tiles.

Args:

dst (TileTensor[address_space=AddressSpace.SHARED, linear_idx_type=dst.linear_idx_type, element_size=element_size]): Destination tile in shared memory.
src (TileTensor[address_space=AddressSpace.LOCAL, linear_idx_type=src.linear_idx_type, element_size=element_size]): Source tile in local memory.
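
Example (illustrative model, not the library API):

The interaction between thread_layout.size(), num_threads, and swizzle described under Parameters can be modeled on the host in plain Python. This is only a sketch under stated assumptions: xor_swizzle is a hypothetical CuTe-style XOR swizzle (the real Swizzle parameters are not given on this page), the strided assignment of elements to threads is assumed for illustration, and copy_local_to_shared_model and read_shared_model are made-up helper names. It does not call any Mojo API.

```python
from typing import Callable, List, Optional

SwizzleFn = Callable[[int], int]


def xor_swizzle(offset: int, bits: int = 2, base: int = 0, shift: int = 2) -> int:
    """Hypothetical XOR swizzle: XOR `bits` bits located at base+shift into the bits at base."""
    mask = ((1 << bits) - 1) << (base + shift)
    return offset ^ ((offset & mask) >> shift)


def copy_local_to_shared_model(
    shared: List[Optional[int]],
    fragments: List[List[int]],
    layout_size: int,
    num_threads: int,
    swizzle: Optional[SwizzleFn] = None,
) -> None:
    """Model of the copy: each participating thread writes its register fragment."""
    for tid in range(num_threads):
        # Threads beyond the thread layout's size do not participate.
        if tid >= layout_size:
            continue
        for k, value in enumerate(fragments[tid]):
            # Assumed distribution: thread `tid` owns elements tid, tid + T, tid + 2T, ...
            logical = tid + k * layout_size
            physical = swizzle(logical) if swizzle else logical
            shared[physical] = value


def read_shared_model(
    shared: List[Optional[int]], swizzle: Optional[SwizzleFn] = None
) -> List[Optional[int]]:
    """Model of a later reader walking the tile in logical order."""
    return [shared[swizzle(i) if swizzle else i] for i in range(len(shared))]


tile = list(range(16))
layout_size, num_threads = 4, 8  # 8 threads in the block, only 4 take part

# Per-thread register fragments; non-participating threads hold sentinel values.
fragments = [
    [tile[i] for i in range(tid, len(tile), layout_size)] if tid < layout_size else [-1] * 4
    for tid in range(num_threads)
]

shared: List[Optional[int]] = [None] * len(tile)
copy_local_to_shared_model(shared, fragments, layout_size, num_threads, xor_swizzle)

matched = read_shared_model(shared, xor_swizzle)  # reader applies the same swizzle
mismatched = read_shared_model(shared)            # reader ignores the swizzle
```

In this model, matched reproduces the original tile, while mismatched returns the same values in a permuted order, which is why any subsequent reader has to apply the swizzle that was used for the write. The sentinel values held by threads 4 through 7 never reach shared memory.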