On 9/18/24 19:03, Jens Axboe wrote:
io_cqring_wait() doesn't run normal task_work after the local work, and
it's the only location to do it in that order. Normally this doesn't
matter, except if:
1) The ring is set up with DEFER_TASKRUN
2) The local work item may generate normal task_work
For condition 2, this can happen when closing a file and it's the final
put of that file, for example. This can cause stalls where a task is
waiting to make progress, but there's nothing else that will wake it up.
TIF_NOTIFY_SIGNAL from normal task_work should prevent the task
from sleeping until it processes the task work. That should make
the waiting loop do another iteration and get to the task work
execution again (if it continues to sleep). I don't understand how
the patch works, but if it's legit, it sounds like we have a bigger
problem, e.g. what if someone else queues up a work item right after
that tw execution block?
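
To make the question concrete, here is a toy userspace model of the two
orderings. It is only a sketch, not kernel code: every name in it
(toy_task, toy_wait_pass, toy_may_sleep, ...) is made up for
illustration, and it assumes that queueing normal task_work always sets
a notification flag standing in for TIF_NOTIFY_SIGNAL.

```c
#include <stdbool.h>

/* Toy model only; none of these names are kernel APIs. */
struct toy_task {
    int local_pending;   /* stands in for entries on ctx->work_llist */
    int normal_pending;  /* stands in for queued normal task_work */
    bool notify_signal;  /* stands in for TIF_NOTIFY_SIGNAL */
};

/* Assumption: queueing normal task_work also sets the notify flag. */
static void toy_queue_normal(struct toy_task *t)
{
    t->normal_pending++;
    t->notify_signal = true;
}

/* Each local item queues one normal item, like a final put of a file. */
static void toy_run_local(struct toy_task *t)
{
    while (t->local_pending > 0) {
        t->local_pending--;
        toy_queue_normal(t);
    }
}

/* Running normal task_work drains it and clears the notify flag. */
static void toy_run_normal(struct toy_task *t)
{
    t->normal_pending = 0;
    t->notify_signal = false;
}

/* One pass over the "run work" block; returns normal items left over. */
static int toy_wait_pass(struct toy_task *t, bool local_first)
{
    if (local_first) {
        toy_run_local(t);
        toy_run_normal(t);
    } else {
        toy_run_normal(t);
        toy_run_local(t);
    }
    return t->normal_pending;
}

/* The waiter is only allowed to sleep when no notification is pending. */
static bool toy_may_sleep(const struct toy_task *t)
{
    return !t->notify_signal;
}
```

Starting with one local item, toy_wait_pass(&t, false) (the old order)
leaves one normal item pending, but toy_may_sleep() is false, so in this
model the waiter would loop again instead of stalling. With
toy_wait_pass(&t, true) (the order in the patch) nothing is left pending.
If the real code stalls anyway, the model's assumption about the flag is
what doesn't hold somewhere, and reordering alone wouldn't cover work
queued after the block.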
Link: https://github.com/axboe/liburing/issues/1235
Cc: stable@xxxxxxxxxxxxxxx
Fixes: 846072f16eed ("io_uring: mimimise io_cqring_wait_schedule")
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 1aca501efaf6..d6a2cd351525 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2568,9 +2568,9 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
* If we got woken because of task_work being processed, run it
* now rather than let the caller do another wait loop.
*/
- io_run_task_work();
if (!llist_empty(&ctx->work_llist))
io_run_local_work(ctx, nr_wait);
+ io_run_task_work();
/*
* Non-local task_work will be run on exit to userspace, but
--
Pavel Begunkov