Delay reading the next node in io_free_batch_list(). This lets the
compiler load the value a bit later, which reduces register spilling in
some cases. With gcc 11.1 it helped move the @task_refs variable from
the stack to a register and optimised out a couple of per-request
instructions.

Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
---
 fs/io_uring.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 10112ea73e77..50312ac4537d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2280,9 +2280,10 @@ static void io_free_batch_list(struct io_ring_ctx *ctx,
 		struct io_kiocb *req = container_of(node, struct io_kiocb,
 						    comp_list);
 
-		node = req->comp_list.next;
-		if (!req_ref_put_and_test(req))
+		if (!req_ref_put_and_test(req)) {
+			node = req->comp_list.next;
 			continue;
+		}
 
 		io_queue_next(req);
 		io_dismantle_req(req);
@@ -2294,6 +2295,7 @@ static void io_free_batch_list(struct io_ring_ctx *ctx,
 			task_refs = 0;
 		}
 		task_refs++;
+		node = req->comp_list.next;
 		wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
 	} while (node);
 
-- 
2.33.0
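
For readers without the kernel tree at hand, the loop shape after this change can be sketched in userspace C. This is only an illustration under stated assumptions: `struct node`, `struct request`, `node_to_request()`, `request_put_and_test()` and `free_batch()` are hypothetical stand-ins, not the io_uring types or helpers, and the refcount is a plain int rather than an atomic. The point it shows is that the next pointer is loaded on each path separately, and on the recycle path just before the push onto the free list overwrites it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the io_uring types. */
struct node {
	struct node *next;
};

struct request {
	struct node link;	/* plays the role of comp_list */
	int refs;		/* plays the role of the request refcount */
};

/* Userspace equivalent of container_of() for the one member used here. */
static struct request *node_to_request(struct node *n)
{
	return (struct request *)((char *)n - offsetof(struct request, link));
}

/* Drop one reference; return nonzero when it was the last one. */
static int request_put_and_test(struct request *req)
{
	return --req->refs == 0;
}

/*
 * Walk a non-empty batch. Requests whose last reference is dropped are
 * pushed onto *free_list; the others are skipped. Returns the number of
 * requests recycled.
 */
static int free_batch(struct node *node, struct node **free_list)
{
	int freed = 0;

	assert(node != NULL);
	do {
		struct request *req = node_to_request(node);

		if (!request_put_and_test(req)) {
			/* Skip path: load the next pointer only here. */
			node = req->link.next;
			continue;
		}

		freed++;
		/*
		 * Recycle path: load the next pointer as late as possible,
		 * but before the push below overwrites link.next.
		 */
		node = req->link.next;
		req->link.next = *free_list;
		*free_list = &req->link;
	} while (node);

	return freed;
}
```

Calling `free_batch()` on a list of three requests where only the middle one still has a second reference recycles the first and last and leaves the middle one untouched. Whether a given compiler keeps a counter such as `freed` in a register with this ordering depends on the compiler version and target, so the sketch makes no claim about generated code.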