Hi Ankur, Catalin,

How about the following series, based on a refactor of arm64's delay()? Does it address your earlier concerns?

delay() already implements wfet() and falls back to wfe() with the event stream, or to a cpu_relax() loop. I refactored it to poll an address, and wrapped it in a new platform-agnostic smp_vcond_load_relaxed() macro. More details are in the following commit log.

Regards,
Haris Okanovic
AWS Graviton Software
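
For readers without the patches at hand, the idea described above (poll an address until a condition holds or a timeout expires, choosing the wait primitive in the order wfet, then wfe with the event stream, then a cpu_relax loop) can be sketched roughly as below. This is a user-space illustration only, not the code in the series: the names pick_wait_mode(), wait_once() and poll_load_relaxed(), the masked-equality condition, and the argument order are all assumptions made for the example, and the wait primitives are replaced by a no-op stand-in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical wait strategies, in the fallback order given in the mail. */
enum wait_mode { WAIT_WFET, WAIT_WFE_EVSTREAM, WAIT_CPU_RELAX };

/* Monotonic clock in nanoseconds (stand-in for an arch timer read). */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Prefer wfet, then wfe with the event stream, then plain spinning. */
static enum wait_mode pick_wait_mode(bool has_wfet, bool has_evstream)
{
    if (has_wfet)
        return WAIT_WFET;
    if (has_evstream)
        return WAIT_WFE_EVSTREAM;
    return WAIT_CPU_RELAX;
}

/*
 * Stand-in for one wait step. Real code would issue wfet/wfe or
 * cpu_relax() here; this sketch does nothing in every mode, so the
 * loop below simply spins.
 */
static void wait_once(enum wait_mode mode)
{
    (void)mode;
}

/*
 * Poll *addr with relaxed loads until (value & mask) == expected or
 * timeout_ns has elapsed. Returns the last value read, so the caller
 * can tell a match from a timeout by re-checking the condition.
 */
static uint32_t poll_load_relaxed(_Atomic uint32_t *addr, uint32_t mask,
                                  uint32_t expected, uint64_t timeout_ns,
                                  enum wait_mode mode)
{
    uint64_t start = now_ns();
    uint32_t val;

    assert(addr != NULL);
    for (;;) {
        val = atomic_load_explicit(addr, memory_order_relaxed);
        if ((val & mask) == expected)
            break;                      /* condition met */
        if (now_ns() - start >= timeout_ns)
            break;                      /* timed out */
        wait_once(mode);
    }
    return val;
}
```

A caller would pick the mode once, for example pick_wait_mode(has_wfet, has_evstream), and then test the returned value against the same mask to distinguish success from timeout.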