On 12/28/19 4:15 AM, Pavel Begunkov wrote:
> On 28/12/2019 14:13, Pavel Begunkov wrote:
>> percpu_ref_tryget() has its own overhead. Instead getting a reference
>> for each request, grab a bunch once per io_submit_sqes().
>>
>> ~5% throughput boost for a "submit and wait 128 nops" benchmark.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
>> ---
>>  fs/io_uring.c | 26 +++++++++++++++++---------
>>  1 file changed, 17 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 7fc1158bf9a4..404946080e86 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -1080,9 +1080,6 @@ static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx,
>>  	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
>>  	struct io_kiocb *req;
>>
>> -	if (!percpu_ref_tryget(&ctx->refs))
>> -		return NULL;
>> -
>>  	if (!state) {
>>  		req = kmem_cache_alloc(req_cachep, gfp);
>>  		if (unlikely(!req))
>> @@ -1141,6 +1138,14 @@ static void io_free_req_many(struct io_ring_ctx *ctx, void **reqs, int *nr)
>>  	}
>>  }
>>
>> +static void __io_req_free_empty(struct io_kiocb *req)
>
> If anybody have better naming (or a better approach at all), I'm all ears.

__io_req_do_free()?

I think that's better than the empty, not quite sure what that means.
If you're fine with that, I can just make that edit when applying.

The rest looks fine to me now.

-- 
Jens Axboe
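
For readers following the thread, the batching idea in the quoted commit
message can be sketched in userspace roughly as below. This is only an
illustration under stated assumptions, not the patch itself: struct
ctx_refs, refs_tryget_many(), refs_put_many(), submit_batch() and
complete_request() are hypothetical names, a plain counter stands in for
the kernel's per-CPU reference, and returning the unused references on a
short submission is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded stand-in for the ring context's refcount. */
struct ctx_refs {
	long count;      /* references currently held by in-flight requests */
	long get_calls;  /* number of "get" operations performed */
	bool dying;      /* set once teardown has started */
};

/* Take nr references with a single operation; fail if teardown began. */
static bool refs_tryget_many(struct ctx_refs *r, long nr)
{
	r->get_calls++;
	if (r->dying)
		return false;
	r->count += nr;
	return true;
}

/* Drop nr references. */
static void refs_put_many(struct ctx_refs *r, long nr)
{
	assert(r->count >= nr);
	r->count -= nr;
}

/*
 * Submit up to nr requests, of which only the first `ok` succeed (this
 * models an allocation failure or an empty queue). References are taken
 * once for the whole batch instead of once per request; the ones that
 * were not handed to a request are returned before leaving.
 */
static long submit_batch(struct ctx_refs *r, long nr, long ok)
{
	long submitted;

	if (!refs_tryget_many(r, nr))
		return 0;

	submitted = ok < nr ? ok : nr;
	/* Each submitted request now owns one of the batch's references. */
	if (submitted < nr)
		refs_put_many(r, nr - submitted);
	return submitted;
}

/* A completed request gives back the single reference it owned. */
static void complete_request(struct ctx_refs *r)
{
	refs_put_many(r, 1);
}
```

In this model a batch of 128 costs one get operation rather than 128,
which is the overhead the commit message is aiming at; each completion
still drops exactly one reference.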