Optimise percpu_ref_tryget() by batching it instead of calling it for
each request. This gave a measurable performance boost, though with a
somewhat unconventional (perhaps unrealistic) workload. There is still
one step to add, which is not implemented in this patchset and which
will amortise the effect across calls to io_uring_enter().

rebased on top of for-5.6/io_uring

Pavel Begunkov (2):
  pcpu_ref: add percpu_ref_tryget_many()
  io_uring: batch getting pcpu references

 fs/io_uring.c                   | 11 ++++++++---
 include/linux/percpu-refcount.h | 24 ++++++++++++++++++++----
 2 files changed, 28 insertions(+), 7 deletions(-)

-- 
2.24.0
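
For illustration only, below is a minimal, self-contained user-space
sketch of the batching idea: grab references for a whole submission
batch up front and return the unused ones afterwards, instead of a
tryget per request. The plain atomic counter and the ref_tryget_many /
submit_batch helpers are simplifications invented for this sketch; they
are not the kernel's percpu_ref implementation or the actual io_uring
code from the patches.

/*
 * Sketch of batched reference grabbing (user-space simplification).
 * A real percpu_ref has a per-cpu fast path; here a single atomic
 * counter stands in for it.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct ref {
	atomic_long counter;	/* stand-in for the per-cpu fast path */
	bool dying;		/* set once the ref is being torn down */
};

/* Try to take @nr references in one operation; fail if ref is dying. */
static bool ref_tryget_many(struct ref *r, unsigned long nr)
{
	if (r->dying)
		return false;
	atomic_fetch_add_explicit(&r->counter, nr, memory_order_relaxed);
	return true;
}

static void ref_put_many(struct ref *r, unsigned long nr)
{
	atomic_fetch_sub_explicit(&r->counter, nr, memory_order_release);
}

/* Submit a batch of @to_submit requests with a single tryget. */
static long submit_batch(struct ref *ctx_refs, unsigned long to_submit)
{
	unsigned long i, submitted = 0;

	if (!ref_tryget_many(ctx_refs, to_submit))
		return -1;

	for (i = 0; i < to_submit; i++) {
		/* ... prepare and issue one request ... */
		submitted++;
	}

	/* Drop the references we grabbed but did not end up using. */
	if (submitted != to_submit)
		ref_put_many(ctx_refs, to_submit - submitted);

	return submitted;
}

int main(void)
{
	struct ref ctx_refs = { .counter = 0, .dying = false };

	printf("submitted %ld requests\n", submit_batch(&ctx_refs, 8));
	return 0;
}

The point of the second patch is the shape of submit_batch() above: one
reference operation per io_uring_enter() submission loop rather than one
per request, with the unused references handed back after the loop.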