With some tricks, we can avoid refcounting in most of the cases and so
save on atomics. The series implements this optimisation. Patches 1-4 are
simple enough preparations, the biggest part is 5/5. It would be great to
have an extra pair of eyes on it.

Jens tried out a prototype before; apparently it gave a ~3% win for the
default read test. Not much has changed since then, so I'd expect the same
result, and I also hope it should be of even greater benefit to
multithreaded workloads.

Pavel Begunkov (5):
  io_uring: move req_ref_get() and friends
  io_uring: delay freeing ->async_data
  io_uring: protect rsrc dealloc by uring_lock
  io_uring: remove req_ref_sub_and_test()
  io_uring: request refcounting skipping

 fs/io_uring.c | 176 +++++++++++++++++++++++++++++---------------------
 1 file changed, 101 insertions(+), 75 deletions(-)

--
2.32.0
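
For readers unfamiliar with the idea, below is a minimal, self-contained
userspace sketch of the general refcount-skipping scheme described in the
first paragraph: the atomic refcount is only touched when a request is
actually shared, while the common single-owner path does no atomics at
all. All names here (struct req, REQ_F_REFCOUNT, req_set_refcount(), ...)
are illustrative assumptions and are not taken from the patches
themselves.

/*
 * Illustrative sketch only, not kernel code: skip the atomic refcount
 * for requests that never become shared, and lazily enable it for the
 * rare requests that do (e.g. poll or a linked timeout holding a ref).
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define REQ_F_REFCOUNT	(1u << 0)	/* set only when refs are needed */

struct req {
	unsigned int	flags;
	atomic_int	refs;
};

/* Lazily switch a request to refcounted mode before handing it off */
static void req_set_refcount(struct req *r, int nr)
{
	if (!(r->flags & REQ_F_REFCOUNT)) {
		r->flags |= REQ_F_REFCOUNT;
		atomic_init(&r->refs, nr);
	}
}

/* Only valid on requests already switched to refcounted mode */
static void req_ref_get(struct req *r)
{
	atomic_fetch_add_explicit(&r->refs, 1, memory_order_relaxed);
}

/* Returns true when the caller holds the last reference and must free */
static bool req_ref_put_and_test(struct req *r)
{
	if (!(r->flags & REQ_F_REFCOUNT))
		return true;	/* single owner: no atomic op at all */
	return atomic_fetch_sub_explicit(&r->refs, 1,
					 memory_order_acq_rel) == 1;
}

int main(void)
{
	struct req fast = { .flags = 0 };

	/* Common path: never shared, completion frees it without atomics */
	if (req_ref_put_and_test(&fast))
		printf("fast request freed, no atomics used\n");

	/* Rare path: request is shared, so real refcounting kicks in */
	struct req shared = { .flags = 0 };
	req_set_refcount(&shared, 2);	/* e.g. submit ref + poll ref */
	req_ref_get(&shared);		/* e.g. linked timeout grabs one */
	(void)req_ref_put_and_test(&shared);
	(void)req_ref_put_and_test(&shared);
	if (req_ref_put_and_test(&shared))
		printf("shared request freed after last ref dropped\n");
	return 0;
}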