There is an unlikely but possible race using a freed context. That's
because req->task_work.func() can free a request, but we won't
necessarily find a completion in submit_state.comp and so all ctx refs
may be put by the time we do mutex_lock(&ctx->uring_lock);

There are several reasons why it can miss going through
submit_state.comp:

1) req->task_work.func() didn't complete it itself, but punted to iowq
   (e.g. reissue) and it got freed later, or a similar situation with it
   overflowing and getting flushed by someone else, or being submitted to
   IRQ completion,

2) As we don't hold the uring_lock, someone else can do
   io_submit_flush_completions() and put our ref.

3) Bugs and code obscurities, e.g. failing to propagate issue_flags
   properly.

One example is as follows

  CPU1                                  |  CPU2
=======================================================================
@req->task_work.func()                  |
 -> @req overflowed,                    |
    so submit_state.comp.nr == 0        |
                                        | flush overflows, and free @req
                                        | ctx refs == 0, free it
ctx is dead, but we do                  |
lock + flush + unlock                   |

So take a ctx reference for each new ctx we see in __tctx_task_work(),
and don't release it until we do all our flushing.

Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
---
 fs/io_uring.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index d0ca0b819f1c..365e75b53a78 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1822,6 +1822,9 @@ static bool __tctx_task_work(struct io_uring_task *tctx)
 
 		req = container_of(node, struct io_kiocb, io_task_work.node);
 		this_ctx = req->ctx;
+		if (this_ctx != ctx)
+			percpu_ref_get(&this_ctx->refs);
+
 		req->task_work.func(&req->task_work);
 		node = next;
 
@@ -1831,14 +1834,18 @@ static bool __tctx_task_work(struct io_uring_task *tctx)
 			mutex_lock(&ctx->uring_lock);
 			io_submit_flush_completions(&ctx->submit_state.comp, ctx);
 			mutex_unlock(&ctx->uring_lock);
+			percpu_ref_put(&ctx->refs);
 			ctx = node ? this_ctx : NULL;
 		}
 	}
 
-	if (ctx && ctx->submit_state.comp.nr) {
-		mutex_lock(&ctx->uring_lock);
-		io_submit_flush_completions(&ctx->submit_state.comp, ctx);
-		mutex_unlock(&ctx->uring_lock);
+	if (ctx) {
+		if (ctx->submit_state.comp.nr) {
+			mutex_lock(&ctx->uring_lock);
+			io_submit_flush_completions(&ctx->submit_state.comp, ctx);
+			mutex_unlock(&ctx->uring_lock);
+		}
+		percpu_ref_put(&ctx->refs);
 	}
 
 	return list.first != NULL;
-- 
2.24.0