On Wed, 26 May 2021 at 18:29, Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
> On 5/26/21 4:52 PM, Marco Elver wrote:
> > Due to some moving around of code, the patch lost the actual fix (using
> > atomically read io_wq) -- so here it is again ... hopefully as intended.
> > :-)
>
> "fortify" damn it... It was synchronised with &ctx->uring_lock
> before, see io_uring_try_cancel_iowq() and io_uring_del_tctx_node(),
> so should not clear before *del_tctx_node()

Ah, so if I understand right, the property stated by the comment in
io_uring_try_cancel_iowq() was broken, and your patch below would fix
that, right?

> The fix should just move it after this sync point. Will you send
> it out as a patch?

Do you mean that your change moving the write to io_wq goes on top of
the patch I proposed? (If so, please also leave your Signed-off-by so
I can squash it.)

So if I understand right, we do in fact have 2 problems:

1. the data race on tctx->io_wq, as I noted in my patch, and
2. the fact that io_wq does not live long enough.

Did I get it right?

Thanks,
-- Marco

> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 7db6aaf31080..b76ba26b4c6c 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -9075,11 +9075,12 @@ static void io_uring_clean_tctx(struct io_uring_task *tctx)
>  	struct io_tctx_node *node;
>  	unsigned long index;
>  
> -	tctx->io_wq = NULL;
>  	xa_for_each(&tctx->xa, index, node)
>  		io_uring_del_tctx_node(index);
> -	if (wq)
> +	if (wq) {
> +		tctx->io_wq = NULL;
>  		io_wq_put_and_exit(wq);
> +	}
>  }
>  
>  static s64 tctx_inflight(struct io_uring_task *tctx, bool tracked)
>
> --
> Pavel Begunkov
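
P.S. In case a standalone illustration helps: below is a purely
userspace analogue of the ordering argument as I understand it. This
is my own sketch, not io_uring code -- a pthread mutex stands in for
ctx->uring_lock, the "registered" flag for the tctx node being on
ctx->tctx_list, and free() for io_wq_put_and_exit():

/*
 * Illustrative only (all names hypothetical). The reader may reach
 * io_wq solely via the registration, and only under the lock; the
 * writer must therefore unregister under the lock *before* clearing
 * and releasing the wq.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct wq { int dummy; };

static pthread_mutex_t uring_lock = PTHREAD_MUTEX_INITIALIZER;
static int registered = 1;   /* like the tctx node on ctx->tctx_list */
static struct wq *io_wq;     /* like tctx->io_wq */

/* Role of io_uring_try_cancel_iowq(): only look at io_wq while
 * holding the lock and while still registered. */
static void *try_cancel(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&uring_lock);
        if (registered && io_wq)
                printf("cancelling work on %p\n", (void *)io_wq);
        pthread_mutex_unlock(&uring_lock);
        return NULL;
}

/* Role of io_uring_clean_tctx(): unregister under the lock first,
 * only then clear and release the wq. */
static void clean_tctx(void)
{
        struct wq *wq = io_wq;

        pthread_mutex_lock(&uring_lock);
        registered = 0;      /* like io_uring_del_tctx_node() */
        pthread_mutex_unlock(&uring_lock);

        /*
         * Any try_cancel() that saw registered == 1 completed its
         * critical section before we set registered = 0, so from here
         * on no lock-holder can reach io_wq, and clearing/freeing it
         * cannot race. Doing this *before* the locked section is
         * exactly the bug discussed above.
         */
        if (wq) {
                io_wq = NULL;
                free(wq);    /* like io_wq_put_and_exit(wq) */
        }
}

int main(void)
{
        pthread_t t;

        io_wq = calloc(1, sizeof(*io_wq));
        pthread_create(&t, NULL, try_cancel, NULL);
        clean_tctx();
        pthread_join(&t, NULL);
        return 0;
}

(Builds with "gcc -pthread"; the mutex acquire/release provides the
happens-before ordering, which is why the late clear of io_wq is safe
in the sketch, and by analogy in the patch above.)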