Optimise percpu_ref_tryget() usage by taking references in batches
instead of calling it once per request. This gave a measurable ~5%
performance boost at large queue depth (QD).

v2: fix uncommitted plug (Jens Axboe)
    better comments for percpu_ref_tryget_many (Dennis Zhou)
    amortise across io_uring_enter() boundary

Pavel Begunkov (3):
  pcpu_ref: add percpu_ref_tryget_many()
  io_uring: batch getting pcpu references
  io_uring: batch get(ctx->ref) across submits

 fs/io_uring.c                   | 29 ++++++++++++++++++++++++++---
 include/linux/percpu-refcount.h | 26 +++++++++++++++++++++-----
 2 files changed, 47 insertions(+), 8 deletions(-)

-- 
2.24.0
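
For illustration only, the batching idea above can be sketched in
userspace C. This is not the code from the patches: struct demo_ref and
every demo_* and submit_* name below are hypothetical, and a plain C11
atomic counter stands in for the kernel's percpu_ref. It only shows why
taking nr references in one call costs fewer atomic operations than nr
separate calls.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcount; NOT the kernel's percpu_ref. */
struct demo_ref {
	atomic_ulong count;	/* 0 means the object is already dead */
};

static void demo_ref_init(struct demo_ref *ref)
{
	atomic_init(&ref->count, 1);	/* initial reference held by the owner */
}

/* Take nr references in one atomic update; fail if the count hit zero. */
static bool demo_ref_tryget_many(struct demo_ref *ref, unsigned long nr)
{
	unsigned long old = atomic_load(&ref->count);

	do {
		if (old == 0)
			return false;
		/* on CAS failure, old is reloaded with the current value */
	} while (!atomic_compare_exchange_weak(&ref->count, &old, old + nr));

	return true;
}

/* Drop nr references in one atomic update. */
static void demo_ref_put_many(struct demo_ref *ref, unsigned long nr)
{
	unsigned long old = atomic_fetch_sub(&ref->count, nr);

	assert(old >= nr);	/* dropping more than was taken is a bug */
}

/* Unbatched: one atomic update per request. Returns references taken. */
static unsigned long submit_one_by_one(struct demo_ref *ref,
				       unsigned long nr_reqs)
{
	unsigned long taken = 0;

	while (taken < nr_reqs && demo_ref_tryget_many(ref, 1))
		taken++;

	return taken;
}

/* Batched: one atomic update for the whole submission. */
static unsigned long submit_batched(struct demo_ref *ref,
				    unsigned long nr_reqs)
{
	return demo_ref_tryget_many(ref, nr_reqs) ? nr_reqs : 0;
}
```

In this sketch a batched submitter that ends up needing fewer
references than it took would have to return the surplus with
demo_ref_put_many(); how the actual patches handle leftover references
is not shown here.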