There is no reason not to use __io_cq_unlock_post_flush for intermediate aux CQE flushing, all ->task_complete should apply there, i.e. if set it should be the submitter task. Combine them, get rid of of __io_cq_unlock_post() and rename the left function. This place was also taking a couple percents of CPU according to profiles for max throughput net benchmarks due to multishot recv flooding it with completions. Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx> --- io_uring/io_uring.c | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 70fffed83e95..1b53a2ab0a27 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -637,18 +637,7 @@ static inline void io_cq_lock(struct io_ring_ctx *ctx) spin_lock(&ctx->completion_lock); } -/* keep it inlined for io_submit_flush_completions() */ static inline void __io_cq_unlock_post(struct io_ring_ctx *ctx) -{ - io_commit_cqring(ctx); - if (!ctx->task_complete) - spin_unlock(&ctx->completion_lock); - - io_commit_cqring_flush(ctx); - io_cqring_wake(ctx); -} - -static void __io_cq_unlock_post_flush(struct io_ring_ctx *ctx) { io_commit_cqring(ctx); @@ -1568,7 +1557,7 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx) } } } - __io_cq_unlock_post_flush(ctx); + __io_cq_unlock_post(ctx); if (!wq_list_empty(&ctx->submit_state.compl_reqs)) { io_free_batch_list(ctx, state->compl_reqs.first); -- 2.40.0