With some tricks, we can avoid refcounting in most cases and so save on
atomics. 1-2 are simple preparations and 3-4 are the meat. 5/5 is a hint
to the compiler, which has stopped optimising that path in a similar way
on its own.

Jens tried out a prototype before, and apparently it gave a ~3% win for
the default read test. Not much has changed since then, so I'd expect
the same result, and I also hope it will be of even greater benefit to
multithreaded workloads.

The previous version had a flaw, so it was decided to move all
completions out of IRQ context and to base this series on that
assumption. On top of the io_uring-irq branch.

v2: Rebased onto the IRQ branch and updated descriptions.
    Removed prep patches.
    Split the main part in two: dealing with submission refs, and
    completion.
    Added 5/5.

Pavel Begunkov (5):
  io_uring: move req_ref_get() and friends
  io_uring: remove req_ref_sub_and_test()
  io_uring: remove submission references
  io_uring: skip request refcounting
  io_uring: optimise hot path of ltimeout prep

 fs/io_uring.c | 173 +++++++++++++++++++++++++++-----------------------
 1 file changed, 94 insertions(+), 79 deletions(-)

-- 
2.32.0