From: Jens Axboe <axboe@xxxxxxxxx>

commit 04beb6e0e08c30c6f845f50afb7d7953603d7a6f upstream.

If some part of the kernel adds task_work that needs executing, in terms
of signaling it'll generally use TWA_SIGNAL or TWA_RESUME. Those two
directly translate to TIF_NOTIFY_SIGNAL or TIF_NOTIFY_RESUME, and can
be used for a variety of use cases outside of task_work.

However, io_cqring_wait_schedule() only tests explicitly for
TIF_NOTIFY_SIGNAL. This means it can miss task_work that was added for
the task but used a different kind of signaling mechanism (or none at
all). Normally this doesn't matter as any task_work will be run once
the task exits to userspace, except if:

1) The ring is set up with DEFER_TASKRUN
2) The local work item may generate normal task_work

For condition 2, this can happen when closing a file and it's the final
put of that file, for example. This can cause stalls where a task is
waiting to make progress inside io_cqring_wait(), but there's nothing
else that will wake it up. Hence change the "should we schedule or loop
around" check to check for the presence of task_work explicitly, rather
than just TIF_NOTIFY_SIGNAL as the mechanism. While in there, also
change the order in which the two types of task_work are run, both to
make it consistent with other task_work runs in io_uring and to better
handle the case of deferred task_work generating normal task_work, like
in the above example.

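The failure mode described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: every name prefixed
with model_ or MODEL_ is hypothetical and is not a kernel definition, the
counters stand in for the real task_work and local work lists, and the model
assumes that each local work item queues exactly one normal task_work item
with resume-style notification. It compares a check that looks only at the
signal-style flag with a check that looks at pending work, and shows what is
left over for each run ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the kernel's definitions. */
enum model_notify { MODEL_TWA_NONE, MODEL_TWA_RESUME, MODEL_TWA_SIGNAL };

struct model_task {
	int  task_work;      /* queued normal task_work items */
	int  local_work;     /* queued deferred (local) work items */
	bool notify_signal;  /* stands in for TIF_NOTIFY_SIGNAL */
	bool notify_resume;  /* stands in for TIF_NOTIFY_RESUME */
};

/* Queue one normal task_work item using the given notification mode. */
static void model_task_work_add(struct model_task *t, enum model_notify mode)
{
	assert(t != NULL);
	t->task_work++;
	if (mode == MODEL_TWA_SIGNAL)
		t->notify_signal = true;
	else if (mode == MODEL_TWA_RESUME)
		t->notify_resume = true;
}

/* Old-style check: loop around only if the signal-style flag is set. */
static bool model_old_should_loop(const struct model_task *t)
{
	return t->notify_signal;
}

/* New-style check: loop around whenever any task_work is queued. */
static bool model_new_should_loop(const struct model_task *t)
{
	return t->task_work > 0;
}

/* Run (drain) all normal task_work and clear both notification flags. */
static void model_run_task_work(struct model_task *t)
{
	t->task_work = 0;
	t->notify_signal = false;
	t->notify_resume = false;
}

/*
 * Run all local work. Assumption for this model: each local item queues
 * one normal task_work item with resume-style notification.
 */
static void model_run_local_work(struct model_task *t)
{
	while (t->local_work > 0) {
		t->local_work--;
		model_task_work_add(t, MODEL_TWA_RESUME);
	}
}

/* Return the normal task_work left queued after one pass in the given order. */
static int model_leftover(bool local_first)
{
	struct model_task t = { .local_work = 1 };

	if (local_first) {
		model_run_local_work(&t);
		model_run_task_work(&t);
	} else {
		model_run_task_work(&t);
		model_run_local_work(&t);
	}
	return t.task_work;
}
```

In this model, work queued with MODEL_TWA_RESUME is invisible to
model_old_should_loop() but visible to model_new_should_loop(), and running
local work before normal task_work leaves nothing queued, whereas the
reverse order leaves one item behind.
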
Reported-by: Jan Hendrik Farr <kernel@xxxxxxxx>
Link: https://github.com/axboe/liburing/issues/1235
Cc: stable@xxxxxxxxxxxxxxx
Fixes: 846072f16eed ("io_uring: mimimise io_cqring_wait_schedule")
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
---
 io_uring/io_uring.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2360,7 +2360,7 @@ static inline int io_cqring_wait_schedul
 		return 1;
 	if (unlikely(!llist_empty(&ctx->work_llist)))
 		return 1;
-	if (unlikely(test_thread_flag(TIF_NOTIFY_SIGNAL)))
+	if (unlikely(task_work_pending(current)))
 		return 1;
 	if (unlikely(task_sigpending(current)))
 		return -EINTR;
@@ -2463,9 +2463,9 @@ static int io_cqring_wait(struct io_ring
 		 * If we got woken because of task_work being processed, run it
 		 * now rather than let the caller do another wait loop.
 		 */
-		io_run_task_work();
 		if (!llist_empty(&ctx->work_llist))
 			io_run_local_work(ctx, nr_wait);
+		io_run_task_work();
 
 		/*
 		 * Non-local task_work will be run on exit to userspace, but


Patches currently in stable-queue which might be from axboe@xxxxxxxxx are

queue-6.11/nbd-correct-the-maximum-value-for-discard-sectors.patch
queue-6.11/block-bfq-fix-possible-uaf-for-bfqq-bic-with-merge-c.patch
queue-6.11/io_uring-rw-treat-eopnotsupp-for-iocb_nowait-like-eagain.patch
queue-6.11/io_uring-io-wq-do-not-allow-pinning-outside-of-cpuse.patch
queue-6.11/nbd-fix-race-between-timeout-and-normal-completion.patch
queue-6.11/block-bfq-fix-uaf-for-accessing-waker_bfqq-after-spl.patch
queue-6.11/block-bfq-choose-the-last-bfqq-from-merge-chain-in-b.patch
queue-6.11/ublk-move-zone-report-data-out-of-request-pdu.patch
queue-6.11/lib-sbitmap-define-swap_lock-as-raw_spinlock_t.patch
queue-6.11/block-bfq-don-t-break-merge-chain-in-bfq_split_bfqq.patch
queue-6.11/block-fix-potential-invalid-pointer-dereference-in-b.patch
queue-6.11/io_uring-check-for-presence-of-task_work-rather-than-tif_notify_signal.patch
queue-6.11/io_uring-sqpoll-do-not-allow-pinning-outside-of-cpuset.patch
queue-6.11/block-bfq-fix-procress-reference-leakage-for-bfqq-in.patch
queue-6.11/io_uring-io-wq-inherit-cpuset-of-cgroup-in-io-worker.patch