Re: [PATCH 0/8] Various io_uring micro-optimizations (reducing lock contention)

Max Kellermann <max.kellermann@xxxxxxxxx> · Wed, 29 Jan 2025 20:43:44 +0100

On Wed, Jan 29, 2025 at 8:30 PM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
> It's great to see iowq getting some optimisations, but note that
> it wouldn't be fair comparing it to single threaded peers when
> you have a lot of iowq activity as it might be occupying multiple
> CPUs.

True. Fully loaded with the benchmark, my process shows 400%-600% CPU
usage, 30-40% of which is spent in spinlock contention.
I wanted to explore how far I could get with a single (userspace)
thread and leave the dirty thread-sync work to the kernel.
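For a rough sense of scale, assuming the 30-40% figure is a share of
the process's total CPU time, the spinning alone amounts to roughly
1.2 to 2.4 cores' worth of work. The helper below is hypothetical and
only does that multiplication:

```c
/* Hypothetical helper: convert a process CPU figure (as reported by
 * top, where 100% == one fully busy core) and the fraction of it
 * spent spinning into a number of cores burned on spinlocks. */
double spin_cores(double process_cpu_pct, double spin_fraction)
{
    return process_cpu_pct / 100.0 * spin_fraction;
}
```

spin_cores(400.0, 0.30) gives about 1.2 and spin_cores(600.0, 0.40)
about 2.4, which is why a comparison against single-threaded peers is
skewed.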

> It's wasteful unless you saturate it close to 100%, and then you
> usually have SQPOLL on a separate CPU than the user task submitting
> requests, and so it'd take some cache bouncing. It's not a silver
> bullet.

Of course, memory latency always bites us in the end. But this isn't
the endgame just yet; there is still plenty of room for
optimization.
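For reference, a minimal sketch of the SQPOLL arrangement Pavel
describes, with the kernel's submission thread pinned to its own CPU.
It only fills in struct io_uring_params from the UAPI header
<linux/io_uring.h>; actually creating the ring with io_uring_setup(2)
or liburing's io_uring_queue_init_params() is left out, and
make_sqpoll_params() is a hypothetical helper name. The submitting
user task would then run on a different CPU, which is the setup where
Pavel expects cache bouncing on the shared SQ ring.

```c
#include <string.h>
#include <linux/io_uring.h>

/* Hypothetical helper: build setup parameters for an SQPOLL ring whose
 * kernel-side submission thread is pinned to `sq_cpu`. */
struct io_uring_params make_sqpoll_params(unsigned sq_cpu, unsigned idle_ms)
{
    struct io_uring_params p;

    memset(&p, 0, sizeof(p));         /* unused/reserved fields must be zero */
    p.flags = IORING_SETUP_SQPOLL     /* kernel thread polls the SQ ring */
            | IORING_SETUP_SQ_AFF;    /* honor sq_thread_cpu below */
    p.sq_thread_cpu = sq_cpu;         /* CPU the SQPOLL thread is bound to */
    p.sq_thread_idle = idle_ms;       /* ms without work before it sleeps */
    return p;
}
```

Whether this pays off depends on keeping that SQPOLL CPU close to
saturated; otherwise it mostly burns a core polling.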