The patchset optimises registered file and buffer updates / removals.
The rsrc-update-bench test shows an 11x improvement (1040K -> 11468K
updates / sec). It also improves latency by eliminating the RCU grace
period wait and the bounce to another worker, and reduces memory
footprint by removing percpu refs.

That's quite important for apps updating files/buffers with medium or
higher frequency, as updates are slow and expensive, and it currently
takes quite a number of IO requests per update to make using fixed
files/buffers worthwhile.

Another upside is that it makes the code simpler: patch 9 removes very
convoluted synchronisation via flush_delayed_work() from the quiesce
path.

v2: rebase, add patches 12 and 13 to remove the last pair of atomics
    from the path and to limit caching.

Pavel Begunkov (13):
  io_uring/rsrc: use non-pcpu refcounts for nodes
  io_uring/rsrc: keep cached refs per node
  io_uring: don't put nodes under spinlocks
  io_uring: io_free_req() via tw
  io_uring/rsrc: protect node refs with uring_lock
  io_uring/rsrc: kill rsrc_ref_lock
  io_uring/rsrc: rename rsrc_list
  io_uring/rsrc: optimise io_rsrc_put allocation
  io_uring/rsrc: don't offload node free
  io_uring/rsrc: cache struct io_rsrc_node
  io_uring/rsrc: add lockdep sanity checks
  io_uring/rsrc: optimise io_rsrc_data refcounting
  io_uring/rsrc: add custom limit for node caching

 include/linux/io_uring_types.h |   8 +-
 io_uring/alloc_cache.h         |   6 +-
 io_uring/io_uring.c            |  54 ++++++----
 io_uring/rsrc.c                | 176 ++++++++++++---------------------
 io_uring/rsrc.h                |  58 +++++------
 5 files changed, 136 insertions(+), 166 deletions(-)

-- 
2.39.1