Re: [PATCH 4/8] io_uring/io-wq: cache work->flags in variable

Max Kellermann <max.kellermann@xxxxxxxxx> · Wed, 29 Jan 2025 20:11:17 +0100

On Wed, Jan 29, 2025 at 7:56 PM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
> What architecture are you running? I don't get why the reads
> are expensive while it's relaxed and there shouldn't even be
> any contention. It doesn't even need to be atomics, we still
> should be able to convert it back to plain ints.

I measured on an AMD Epyc 9654P.
As you can see in my numbers, around 40% of the CPU time was wasted on
spinlock contention. Dozens of io-wq threads are stepping on each
other's toes all the time.
I don't think this is about memory accesses being exceptionally
expensive; it's just about wringing every cycle from the code section
that's under the heavy-contention spinlock.
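
To illustrate the idea, here is a minimal user-space sketch, not the
actual patch. Everything in it is an assumption made for illustration:
the struct layouts, the flag names, the counters, and the use of a
pthread mutex in place of the io-wq spinlock. It relies on the fact
that the compiler may not merge repeated atomic loads, so each flag
test in the first variant is a separate load executed while the lock
is held; the second variant loads once and tests a local copy.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical flag bits, only loosely modeled on io-wq work flags. */
enum {
	WORK_HASHED  = 1u << 0,
	WORK_UNBOUND = 1u << 1,
	WORK_CANCEL  = 1u << 2,
};

struct work {
	atomic_uint flags;
};

struct queue {
	pthread_mutex_t lock;	/* stands in for the contended spinlock */
	unsigned nr_hashed;
	unsigned nr_unbound;
	unsigned nr_cancelled;
};

/* Before: every flag test is its own atomic load under the lock. */
static void account_reload(struct queue *q, struct work *w)
{
	assert(q && w);

	pthread_mutex_lock(&q->lock);
	if (atomic_load_explicit(&w->flags, memory_order_relaxed) & WORK_HASHED)
		q->nr_hashed++;
	if (atomic_load_explicit(&w->flags, memory_order_relaxed) & WORK_UNBOUND)
		q->nr_unbound++;
	if (atomic_load_explicit(&w->flags, memory_order_relaxed) & WORK_CANCEL)
		q->nr_cancelled++;
	pthread_mutex_unlock(&q->lock);
}

/* After: load once, outside the lock, and test the cached value. */
static void account_cached(struct queue *q, struct work *w)
{
	unsigned flags;

	assert(q && w);

	flags = atomic_load_explicit(&w->flags, memory_order_relaxed);

	pthread_mutex_lock(&q->lock);
	if (flags & WORK_HASHED)
		q->nr_hashed++;
	if (flags & WORK_UNBOUND)
		q->nr_unbound++;
	if (flags & WORK_CANCEL)
		q->nr_cancelled++;
	pthread_mutex_unlock(&q->lock);
}
```

Both variants produce the same counts as long as the flags do not
change concurrently; the second one just does less work while holding
the lock. Whether the saving is measurable depends on how contended
the lock is.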