On Wed, Jan 29, 2025 at 7:56 PM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
> What architecture are you running? I don't get why the reads
> are expensive while it's relaxed and there shouldn't even be
> any contention. It doesn't even need to be atomics, we still
> should be able to convert int back to plain ints.

I measured on an AMD Epyc 9654P.

As you see in my numbers, around 40% of the CPU time was wasted on
spinlock contention. Dozens of io-wq threads are trampling on each
other's feet all the time.

I don't think this is about memory accesses being exceptionally
expensive; it's just about wringing every cycle from the code section
that's under the heavy-contention spinlock.