On 1/29/25 4:41 PM, Pavel Begunkov wrote:
> On 1/29/25 19:11, Max Kellermann wrote:
>> On Wed, Jan 29, 2025 at 7:56 PM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>>> What architecture are you running? I don't get why the reads
>>> are expensive while it's relaxed and there shouldn't even be
>>> any contention. It doesn't even need to be atomics, we still
>>> should be able to convert int back to plain ints.
>>
>> I measured on an AMD Epyc 9654P.
>> As you see in my numbers, around 40% of the CPU time was wasted on
>> spinlock contention. Dozens of io-wq threads are trampling on each
>> other's feet all the time.
>> I don't think this is about memory accesses being exceptionally
>> expensive; it's just about wringing every cycle from the code section
>> that's under the heavy-contention spinlock.
>
> Ok, then it's an architectural problem and needs more serious
> reengineering, e.g. of how work items are stored and grabbed, and it
> might even get some more use cases for io_uring. FWIW, I'm not saying
> smaller optimisations shouldn't have place especially when they're
> clean.

Totally agree - io-wq would need some improvements on where to queue
and pull work to make it scale better, which may indeed be a good idea
to do and would open it up to more use cases that currently don't make
much sense.

That said, I also agree that the minor optimizations still have a place;
it's not like they will stand in the way of general improvements.

-- 
Jens Axboe