Optimise the use of percpu_ref_tryget() by not calling it for each
request, but batching the calls. This gave a measurable ~5% performance
boost for large queue depths (QD).

v2: fix uncommitted plug (Jens Axboe)
    better comments for percpu_ref_tryget_many (Dennis Zhou)
    amortise across io_uring_enter() boundary

v3: drop "batching across syscalls"
    remove error handling in io_submit_sqes() from common path

v4: fix error handling

Pavel Begunkov (2):
  pcpu_ref: add percpu_ref_tryget_many()
  io_uring: batch getting pcpu references

 fs/io_uring.c                   | 26 +++++++++++++++++---------
 include/linux/percpu-refcount.h | 26 +++++++++++++++++++++-----
 2 files changed, 38 insertions(+), 14 deletions(-)

-- 
2.24.0
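
For readers unfamiliar with the idea, here is a rough userspace sketch of
the batching pattern described above. It is not the kernel code from this
series: struct model_ref and the model_* helpers are hypothetical stand-ins
built on C11 atomics, and the fail_at parameter exists only to simulate a
submission that stops early. The point is that one "tryget many" call
covers the whole batch, and the references for requests that were never
submitted are returned in a single "put many".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a percpu_ref: one shared counter and a flag. */
struct model_ref {
	atomic_ulong count;
	atomic_bool dying;
};

static void model_ref_init(struct model_ref *ref)
{
	atomic_init(&ref->count, 1);	/* initial reference held by the owner */
	atomic_init(&ref->dying, false);
}

/*
 * Take nr references with one atomic add instead of nr separate ones.
 * Simplification: the check and the add are not atomic together, so this
 * model is not safe against a concurrent kill the way the kernel API is.
 */
static bool model_ref_tryget_many(struct model_ref *ref, unsigned long nr)
{
	if (atomic_load(&ref->dying))
		return false;
	atomic_fetch_add(&ref->count, nr);
	return true;
}

/* Drop nr references at once. */
static void model_ref_put_many(struct model_ref *ref, unsigned long nr)
{
	unsigned long old = atomic_fetch_sub(&ref->count, nr);

	assert(old >= nr);
}

/*
 * Submit up to nr requests, stopping at index fail_at (pass fail_at >= nr
 * for no failure). Returns the number of requests actually submitted;
 * each submitted request keeps one reference.
 */
static unsigned long model_submit(struct model_ref *ref, unsigned long nr,
				  unsigned long fail_at)
{
	unsigned long i, submitted = 0;

	if (!model_ref_tryget_many(ref, nr))
		return 0;

	for (i = 0; i < nr; i++) {
		if (i == fail_at)
			break;
		submitted++;
	}

	/* Give back the references that no request ended up owning. */
	if (submitted != nr)
		model_ref_put_many(ref, nr - submitted);

	return submitted;
}
```

In this model the common path (all nr requests submitted) touches the
counter exactly once, and the cleanup for unused references only runs on
the early-exit path.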