[ 736.982891] INFO: task iou-sqp-4294:4295 blocked for more than 122 seconds.
[ 736.982897] Call Trace:
[ 736.982901]  schedule+0x68/0xe0
[ 736.982903]  io_uring_cancel_sqpoll+0xdb/0x110
[ 736.982908]  io_sqpoll_cancel_cb+0x24/0x30
[ 736.982911]  io_run_task_work_head+0x28/0x50
[ 736.982913]  io_sq_thread+0x4e3/0x720

We call io_uring_cancel_sqpoll() one by one for each ctx, either from
io_sq_thread() itself or via task works, and it's intended to cancel all
requests of the specified context. However, the function uses per-task
counters to track the number of inflight requests, so it counts more
requests than are inflight on the current io_uring ctx and goes to sleep
waiting for completions (e.g. from IRQ) that will never happen.

Reported-by: Joakim Hassila <joj@xxxxxxx>
Reported-by: Jens Axboe <axboe@xxxxxxxxx>
Fixes: 37d1e2e3642e2 ("io_uring: move SQPOLL thread io-wq forked worker")
Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
---
 fs/io_uring.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index dff34975d86b..c1c843b044c0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -9000,10 +9000,11 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx)
 
 	WARN_ON_ONCE(!sqd || ctx->sq_data->thread != current);
 
+	percpu_ref_switch_to_atomic_sync(&ctx->refs);
 	atomic_inc(&tctx->in_idle);
 	do {
 		/* read completions before cancelations */
-		inflight = tctx_inflight(tctx);
+		inflight = percpu_ref_atomic_count(&ctx->refs);
 		if (!inflight)
 			break;
 		io_uring_try_cancel_requests(ctx, current, NULL);
@@ -9014,7 +9015,7 @@
 	 * avoids a race where a completion comes in before we did
 	 * prepare_to_wait().
 	 */
-	if (inflight == tctx_inflight(tctx))
+	if (inflight == percpu_ref_atomic_count(&ctx->refs))
 		schedule();
 	finish_wait(&tctx->wait, &wait);
 	} while (1);
--
2.24.0
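
For illustration, here is a minimal userspace C sketch of the accounting
mismatch described above. The types and helpers (task_model, ctx_model,
cancel_ctx_requests) are hypothetical simplifications, not kernel code; it
only models why waiting on a per-task inflight count can never finish when
the loop is able to cancel only one ctx's requests:

#include <stdio.h>

/* Hypothetical model: a task-wide inflight counter shared by every ctx
 * the task submits to, plus a per-ctx counter. */
struct task_model {
	int task_inflight;	/* what tctx_inflight() conceptually reports */
};

struct ctx_model {
	struct task_model *task;
	int ctx_inflight;	/* requests belonging to this ctx only */
};

/* Cancel all requests of one ctx; both counters drop accordingly. */
static void cancel_ctx_requests(struct ctx_model *ctx)
{
	ctx->task->task_inflight -= ctx->ctx_inflight;
	ctx->ctx_inflight = 0;
}

int main(void)
{
	struct task_model task = { .task_inflight = 5 };
	struct ctx_model a = { .task = &task, .ctx_inflight = 2 };
	struct ctx_model b = { .task = &task, .ctx_inflight = 3 };

	/* Buggy pattern: cancel only ctx 'a', then wait for the task-wide
	 * counter to hit zero -- it still includes ctx 'b' requests. */
	cancel_ctx_requests(&a);
	if (task.task_inflight != 0)
		printf("per-task count stuck at %d -> would sleep forever\n",
		       task.task_inflight);

	/* Fixed pattern: wait on a per-ctx count instead. */
	if (a.ctx_inflight == 0)
		printf("per-ctx count is 0 -> cancellation completes\n");
	(void)b;
	return 0;
}

The patch takes the second route: it switches ctx->refs to atomic mode and
waits on the per-ctx reference count, so the wait condition covers only
requests that io_uring_try_cancel_requests() can actually cancel.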