NamedBarrierSemaphore
struct NamedBarrierSemaphore[thread_count: Int32, id_offset: Int32, max_num_barriers: Int32]
A device-wide semaphore implementation for NVIDIA GPUs with named barriers.
It uses acquire-release logic instead of atomic instructions for inter-CTA synchronization through a shared lock variable. Note that the memory barrier synchronizes warp groups within a CTA. CUTLASS reference implementation: https://github.com/NVIDIA/cutlass/blob/a1aaf2300a8fc3a8106a05436e1a2abad0930443/include/cutlass/arch/barrier.h.
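The acquire-release handoff on a shared lock variable can be illustrated with a small host-side C++ analogue. This is only a sketch under stated assumptions: `LockSemaphore`, `wait_eq`, `release`, and `run_ordered_handoff` are hypothetical names invented for this illustration, not the Mojo or CUTLASS API, and CPU threads stand in for CTAs. Each worker spins with an acquire load until the lock holds its own id, does its work, then passes the turn on with a release store.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical host-side model of a lock-variable semaphore.
// Not the Mojo or CUTLASS API; names are for illustration only.
struct LockSemaphore {
    std::atomic<int>* lock;  // shared lock variable

    // Spin until the lock holds `expected`. The acquire load makes all
    // writes published before the matching release store visible here.
    void wait_eq(int expected) const {
        while (lock->load(std::memory_order_acquire) != expected) {
            std::this_thread::yield();
        }
    }

    // Publish this worker's writes and hand the turn to the next waiter.
    void release(int next) const {
        lock->store(next, std::memory_order_release);
    }
};

// Each worker stands in for one CTA: it waits for its turn, appends its
// id to a shared log, then passes the turn on.
std::vector<int> run_ordered_handoff(int num_workers) {
    std::atomic<int> lock{0};
    std::vector<int> order;
    order.reserve(num_workers);

    std::vector<std::thread> workers;
    // Launch in reverse order so the result depends on the protocol,
    // not on thread start order.
    for (int id = num_workers - 1; id >= 0; --id) {
        workers.emplace_back([&lock, &order, id] {
            LockSemaphore sem{&lock};
            sem.wait_eq(id);
            order.push_back(id);  // only the current turn holder writes
            sem.release(id + 1);
        });
    }
    for (auto& t : workers) {
        t.join();
    }
    return order;
}
```

Calling `run_ordered_handoff(4)` should yield the ids in ascending order even though the workers start in reverse, because each `push_back` happens only after the previous worker's release store has been observed. On a GPU the same idea applies across CTAs with the lock held in global memory, while threads within a CTA are synchronized separately by the named barrier.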
Parameters
- thread_count (Int32): Number of threads participating in the barrier.
- id_offset (Int32): Offset for the barrier ID.
- max_num_barriers (Int32): Maximum number of named barriers to use.
Implemented traits
AnyType, Copyable, ImplicitlyCopyable, ImplicitlyDestructible, Movable, RegisterPassable, TrivialRegisterPassable