cluster_sync
cluster_sync()
Performs a full cluster synchronization with memory ordering guarantees.
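This convenience function calls cluster_arrive() followed by cluster_wait(), forming a full barrier across all thread blocks in the cluster: no thread block continues past the call until every thread block in the cluster has reached it. The barrier also enforces memory ordering between thread blocks, so writes made before the call are visible to the other thread blocks in the cluster after it returns. Only supported on NVIDIA SM90+ GPUs (compute capability 9.0 and later).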
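The barrier behavior described above can be illustrated with a minimal sketch. It is written in CUDA C++ using cooperative groups rather than this library's own API, on the assumption that `cluster_sync()` corresponds to a cluster-wide barrier such as `cluster.sync()`; the kernel name `exchange_kernel`, the buffer `out`, and the cluster shape of two thread blocks are hypothetical choices made for illustration. Each thread block publishes a value in its shared memory, synchronizes with the cluster, and then reads the value published by its peer.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical kernel: two thread blocks per cluster (compile-time cluster shape).
__global__ void __cluster_dims__(2, 1, 1) exchange_kernel(int *out)
{
    __shared__ int slot;                       // one value per thread block
    cg::cluster_group cluster = cg::this_cluster();
    unsigned int rank = cluster.block_rank();  // 0 or 1 within the cluster

    if (threadIdx.x == 0) {
        slot = 100 + static_cast<int>(rank);   // publish this block's value
    }

    // Full cluster barrier: once it returns, every block in the cluster
    // has written its slot and that write is visible to its peers.
    cluster.sync();

    if (threadIdx.x == 0) {
        unsigned int peer = rank ^ 1u;
        // Obtain a pointer to the peer block's copy of `slot`.
        int *peer_slot = cluster.map_shared_rank(&slot, peer);
        out[blockIdx.x] = *peer_slot;
    }

    // Second barrier: keep this block's shared memory alive until the
    // peer has finished reading it.
    cluster.sync();
}

int main()
{
    int *d_out = nullptr;
    cudaMalloc(&d_out, 2 * sizeof(int));

    // A grid of 2 blocks forms exactly one cluster of 2 blocks.
    exchange_kernel<<<2, 32>>>(d_out);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        // Expected on GPUs older than SM90, which do not support clusters.
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    int h_out[2];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    std::printf("block 0 read %d, block 1 read %d\n", h_out[0], h_out[1]);

    cudaFree(d_out);
    return 0;
}
```

The sketch needs an SM90-class GPU and a build such as `nvcc -arch=sm_90`. Without the first barrier, a block could read its peer's slot before the peer had written it; the second barrier is there so that neither block exits while its shared memory is still being read.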