On 9/26/21 10:48 AM, Hao Xu wrote:
> On 9/15/21 6:48 PM, Hao Xu wrote:
>> On 9/15/21 5:44 PM, Pavel Begunkov wrote:
>>> On 9/12/21 5:23 PM, Hao Xu wrote:
>>>> For multishot mode, there may be cases like:
>>>> io_poll_task_func()
>>>> -> add_wait_queue()
>>>>                       async_wake()
>>>>                       ->io_req_task_work_add()
>>>> this one messes up the running task_work list,
>>>> since req->io_task_work.node is in use.
>>>>
>>>> A similar situation exists for req->io_task_work.fallback_node.
>>>> Fix it by setting node->next = NULL before we run the tw, so that when we
>>>> add req back to the wait queue in the middle of tw running, we can safely
>>>> re-add it to the tw list.
>>>
>>> It may get screwed before we get to "node->next = NULL;",
>>>
>>> -> async_wake()
>>>     -> io_req_task_work_add()
>>> -> async_wake()
>>>     -> io_req_task_work_add()
>>> tctx_task_work()
>> True, this may happen if there is a second poll wait entry.
>> This patch is for the single wait entry case only.
>> I'm thinking about the second poll entry issue; that would be in a separate
>> patch.
> Hmm, I reviewed this email again and now I think I got what you were
> saying. Do you mean the second async_wake() triggered before we removed
> the wait entry in the first async_wake(), like
>
> async_wake
>                  async_wake
> ->del wait entry

Looks like we had different problems in mind; let's move the conversation
to the new thread with the resent patches.

>>>> Fixes: 7cbf1722d5fc ("io_uring: provide FIFO ordering for task_work")
>>>> Signed-off-by: Hao Xu <haoxu@xxxxxxxxxxxxxxxxx>
>>>> ---
>>>>
>>>>  fs/io_uring.c | 11 ++++++++---
>>>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 30d959416eba..c16f6be3d46b 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -1216,13 +1216,17 @@ static void io_fallback_req_func(struct work_struct *work)
>>>>      struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
>>>>                                              fallback_work.work);
>>>>      struct llist_node *node = llist_del_all(&ctx->fallback_llist);
>>>> -    struct io_kiocb *req, *tmp;
>>>> +    struct io_kiocb *req;
>>>>      bool locked = false;
>>>>
>>>>      percpu_ref_get(&ctx->refs);
>>>> -    llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>>>> +    req = llist_entry(node, struct io_kiocb, io_task_work.fallback_node);
>>>> +    while (member_address_is_nonnull(req, io_task_work.fallback_node)) {
>>>> +        node = req->io_task_work.fallback_node.next;
>>>> +        req->io_task_work.fallback_node.next = NULL;
>>>>          req->io_task_work.func(req, &locked);
>>>> -
>>>> +        req = llist_entry(node, struct io_kiocb, io_task_work.fallback_node);
>>>> +    }
>>>>      if (locked) {
>>>>          if (ctx->submit_state.compl_nr)
>>>>              io_submit_flush_completions(ctx);
>>>> @@ -2126,6 +2130,7 @@ static void tctx_task_work(struct callback_head *cb)
>>>>              locked = mutex_trylock(&ctx->uring_lock);
>>>>              percpu_ref_get(&ctx->refs);
>>>>          }
>>>> +        node->next = NULL;
>>>>          req->io_task_work.func(req, &locked);
>>>>          node = next;
>>>>      } while (node);
>>>>
>>>
>

-- 
Pavel Begunkov
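
For readers following along: below is a minimal, self-contained sketch of the
pattern the patch applies in tctx_task_work() and io_fallback_req_func(). It is
not the kernel code itself; struct tw_node, tw_run_list() and the other names
are made up for illustration. The point it shows is the one from the commit
message: clear node->next before invoking the callback, so that the callback
(or a waker firing while it runs) may safely re-queue the same node onto
another list without corrupting the list still being walked.

    #include <stddef.h>

    struct tw_node {
            struct tw_node *next;
            void (*func)(struct tw_node *node);
    };

    /* Run a detached, singly-linked list of work items.  Clearing
     * node->next before calling func() marks the node as no longer
     * part of this list, so func() (or a concurrent waker it re-arms)
     * can re-add it elsewhere while we keep walking via 'next'.
     */
    static void tw_run_list(struct tw_node *node)
    {
            while (node) {
                    struct tw_node *next = node->next;

                    node->next = NULL;   /* detach before running */
                    node->func(node);    /* may re-queue 'node'    */
                    node = next;
            }
    }

As Pavel's reply points out, this only covers a re-queue that happens after
the next pointer has been cleared; a second wake arriving before that point is
the separate problem taken to the resent patches.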