On Wed, Jan 29, 2025 at 8:30 PM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
> It's great to see iowq getting some optimisations, but note that
> it wouldn't be fair comparing it to single threaded peers when
> you have a lot of iowq activity as it might be occupying multiple
> CPUs.

True. Fully loaded with the benchmark, I see 400%-600% CPU usage on my
process (30-40% of that being spinlock contention).

I wanted to explore how far I can get with a single (userspace)
thread, and leave the dirty thread-sync work to the kernel.

> It's wasteful unless you saturate it close to 100%, and then you
> usually have SQPOLL on a separate CPU than the user task submitting
> requests, and so it'd take some cache bouncing. It's not a silver
> bullet.

Of course, memory latency always bites us in the end. But this isn't
the endgame just yet, we still have a lot of potential for
optimizations.