On 5/26/21 5:36 PM, Marco Elver wrote: > On Wed, 26 May 2021 at 18:29, Pavel Begunkov <asml.silence@xxxxxxxxx> wrote: >> On 5/26/21 4:52 PM, Marco Elver wrote: >>> Due to some moving around of code, the patch lost the actual fix (using >>> atomically read io_wq) -- so here it is again ... hopefully as intended. >>> :-) >> >> "fortify" damn it... It was synchronised with &ctx->uring_lock >> before, see io_uring_try_cancel_iowq() and io_uring_del_tctx_node(), >> so should not clear before *del_tctx_node() > > Ah, so if I understand right, the property stated by the comment in > io_uring_try_cancel_iowq() was broken, and your patch below would fix > that, right? "io_uring: fortify tctx/io_wq cleanup" broke it and the diff should fix it. >> The fix should just move it after this sync point. Will you send >> it out as a patch? > > Do you mean your move of write to io_wq goes on top of the patch I > proposed? (If so, please also leave your Signed-of-by so I can squash > it.) No, only my diff, but you hinted on what has happened, so I would prefer you to take care of patching. If you want of course. To be entirely fair, assuming that aligned ptr reads can't be torn, I don't see any _real_ problem. But surely the report is very helpful and the current state is too wonky, so should be patched. TL;DR; The synchronisation goes as this: it's usually used by the owner task, and the owner task deletes it, so is mostly naturally synchronised. An exception is a worker (not only) that accesses it for cancellation purpose, but it uses it only under ->uring_lock, so if removal is also taking the lock it should be fine. see io_uring_del_tctx_node() locking. > > So if I understand right, we do in fact have 2 problems: > 1. the data race as I noted in my patch, and Yes, and it deals with it > 2. the fact that io_wq does not live long enough. Nope, io_wq outlives them fine. > Did I get it right? > >> diff --git a/fs/io_uring.c b/fs/io_uring.c >> index 7db6aaf31080..b76ba26b4c6c 100644 >> --- a/fs/io_uring.c >> +++ b/fs/io_uring.c >> @@ -9075,11 +9075,12 @@ static void io_uring_clean_tctx(struct io_uring_task *tctx) >> struct io_tctx_node *node; >> unsigned long index; >> >> - tctx->io_wq = NULL; >> xa_for_each(&tctx->xa, index, node) >> io_uring_del_tctx_node(index); >> - if (wq) >> + if (wq) { >> + tctx->io_wq = NULL; >> io_wq_put_and_exit(wq); >> + } >> } >> >> static s64 tctx_inflight(struct io_uring_task *tctx, bool tracked) -- Pavel Begunkov