cluster_arrive
cluster_arrive()
Signals that the calling thread block has arrived at a cluster synchronization point, with memory ordering guarantees.
This function ensures that all memory operations issued by this thread block before the call are visible to the other thread blocks in the cluster once the synchronization completes. Only supported on NVIDIA GPUs with compute capability 9.0 (SM90) or later.
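To illustrate the arrive side of a split cluster barrier, here is a minimal CUDA sketch of the same pattern. It is not this library's implementation: the helper names `cluster_arrive_sketch` and `cluster_wait_sketch`, the kernel, and the two-block cluster size are all hypothetical, and the sketch assumes the operation corresponds to the PTX `barrier.cluster.arrive` instruction with release semantics, paired with a matching `barrier.cluster.wait`. It needs an SM90 device and a build such as `nvcc -arch=sm_90`.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical helper: arrive at the cluster barrier with release semantics.
// Does not block; the caller may continue with independent work.
__device__ __forceinline__ void cluster_arrive_sketch() {
    asm volatile("barrier.cluster.arrive.release.aligned;\n" ::: "memory");
}

// Hypothetical helper: block until every thread in the cluster has arrived.
__device__ __forceinline__ void cluster_wait_sketch() {
    asm volatile("barrier.cluster.wait.acquire.aligned;\n" ::: "memory");
}

// Each block publishes a value in its shared memory, then reads its peer's
// value through distributed shared memory. Cluster size of 2 is an assumption.
__global__ void __cluster_dims__(2, 1, 1) exchange_kernel(int *out) {
    __shared__ int slot;
    cg::cluster_group cluster = cg::this_cluster();
    unsigned int rank = cluster.block_rank();

    if (threadIdx.x == 0) {
        slot = 100 + static_cast<int>(rank);  // write that must become visible
    }

    cluster_arrive_sketch();  // signal arrival; prior writes are released
    // Work that does not touch peer shared memory could overlap here.
    cluster_wait_sketch();    // all blocks have arrived; peer writes visible

    if (threadIdx.x == 0) {
        int *peer_slot = cluster.map_shared_rank(&slot, rank ^ 1u);
        out[blockIdx.x] = *peer_slot;
    }

    // Keep this block's shared memory alive until the peer has finished reading.
    cluster.sync();
}

int main() {
    int *d_out = nullptr;
    int h_out[2] = {0, 0};
    cudaMalloc(&d_out, sizeof(h_out));

    exchange_kernel<<<2, 32>>>(d_out);  // grid size is a multiple of the cluster size
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    std::printf("block 0 read %d, block 1 read %d\n", h_out[0], h_out[1]);
    cudaFree(d_out);
    return 0;
}
```

The point of splitting the barrier is that the arrive step alone does not block, so a thread block can do unrelated work between arriving and waiting. Reading a peer block's data is only safe after the matching wait has returned.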