IMPORTANT: To view this page as Markdown, append `.md` to the URL (e.g. /docs/manual/basics.md). For the complete Mojo documentation index, see llms.txt.
Version: 1.0.0b1

async_copy

async_copy[dtype: DType, //, size: Int, *, fill: Optional[Scalar[dtype]] = None, bypass_L1_16B: Bool = True, l2_prefetch: Optional[Int] = None, eviction_policy: CacheEviction = CacheEviction.EVICT_NORMAL](src: UnsafePointer[Scalar[dtype], address_space=AddressSpace.GLOBAL], dst: UnsafePointer[Scalar[dtype], address_space=AddressSpace.SHARED], src_size: Int32 = size, predicate: Bool = False)

Asynchronously copies data from global memory to shared memory.

This function provides a high-performance asynchronous memory copy operation with configurable caching behavior, prefetching, and fill values. It maps directly to the PTX cp.async instruction on NVIDIA GPUs.

Constraints:

  • A fill value is supported only for dtypes of 32 bits or fewer.
  • Size must be 4, 8, or 16 bytes.
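  • L2 prefetch (l2_prefetch) and L1 bypass (bypass_L1_16B) cannot both be enabled. Because bypass_L1_16B defaults to True, set it to False when passing l2_prefetch.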
  • L2 prefetch size must be 64, 128, or 256 bytes.
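
The constraints above can be sketched as a small checker. This is a plain Python model for illustration only, not Mojo code and not part of the Mojo API: the function name check_async_copy_params and its arguments (dtype_bits, fill_given, bypass_l1_16b) are hypothetical, and the L1-bypass rule is read literally from the list (any l2_prefetch value combined with bypass enabled is treated as a violation).

```python
from typing import List, Optional

# Values taken from the constraint list above.
VALID_COPY_SIZES = (4, 8, 16)         # bytes
VALID_L2_PREFETCH = (64, 128, 256)    # bytes


def check_async_copy_params(
    dtype_bits: int,
    size: int,
    fill_given: bool = False,
    bypass_l1_16b: bool = True,        # mirrors the documented default
    l2_prefetch: Optional[int] = None,
) -> List[str]:
    """Return a list of violated constraints (empty if the combination is valid).

    Hypothetical helper: it models the documented constraints only and does
    not call into Mojo or the GPU.
    """
    problems = []
    if fill_given and dtype_bits > 32:
        problems.append("fill value requires a dtype of 32 bits or fewer")
    if size not in VALID_COPY_SIZES:
        problems.append("size must be 4, 8, or 16 bytes")
    if l2_prefetch is not None:
        # Literal reading of "cannot enable both L2 prefetch and L1 bypass".
        if bypass_l1_16b:
            problems.append("L2 prefetch and L1 bypass cannot both be enabled")
        if l2_prefetch not in VALID_L2_PREFETCH:
            problems.append("L2 prefetch size must be 64, 128, or 256 bytes")
    return problems
```

For example, under these assumptions check_async_copy_params(32, 16, l2_prefetch=128) reports one violation because bypass is left at its default of True, while passing bypass_l1_16b=False as well returns an empty list.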

Parameters:

  • dtype (DType): The data type to copy (e.g. float32, int32).
  • size (Int): Number of bytes to copy (must be 4, 8, or 16).
  • fill (Optional[Scalar[dtype]]): Optional fill value for uncopied bytes when src_size < size.
  • bypass_L1_16B (Bool): If True, bypasses L1 cache for 16-byte copies.
  • l2_prefetch (Optional[Int]): Optional L2 prefetch size (64, 128, or 256 bytes).
  • eviction_policy (CacheEviction): Cache eviction policy for the copy operation.

Args: