> I'd really appreciate if you can try one more. I want to know why
> the final cleanup doesn't cope with it.

yeah sure, which kernel version? it seems that this patch doesn't apply
to io_uring-5.11 or io_uring-5.10

On Sun, 20 Dec 2020 at 15:22, Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>
> On 20/12/2020 13:00, Pavel Begunkov wrote:
> > On 20/12/2020 07:13, Josef wrote:
> >>> Guys, do you share rings between processes? Explicitly like sending
> >>> io_uring fd over a socket, or implicitly e.g. sharing fd tables
> >>> (threads), or cloning with copying fd tables (and so taking a ref
> >>> to a ring).
> >>
> >> no, in netty we don't share rings between processes
> >>
> >>> In other words, if you kill all your io_uring applications, does it
> >>> go back to normal?
> >>
> >> not at all, the io-wq worker thread is still running; I literally have
> >> to restart the VM to go back to normal (as far as I know it's not
> >> possible to kill kernel threads, right?)
> >>
> >>> Josef, can you test the patch below instead? Following Jens' idea it
> >>> cancels more aggressively when a task is killed or exits. It's based
> >>> on [1] but would probably apply fine to for-next.
> >>
> >> it works, I ran several tests with the eventfd read op async flag
> >> enabled, thanks a lot :) you are awesome guys :)
> >
> > Thanks for testing and confirming! Either we forgot something in
> > io_ring_ctx_wait_and_kill() and it just can't cancel some requests,
> > or we have a dependency that prevents release from happening.
> >
> > BTW, apparently that patch causes hangs for unrelated but known
> > reasons, so better not to use it; we'll merge something more stable.
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 941fe9b64fd9..d38fc819648e 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -8614,6 +8614,10 @@ static int io_remove_personalities(int id, void *p, void *data)
>  	return 0;
>  }
>
> +static void io_cancel_defer_files(struct io_ring_ctx *ctx,
> +				  struct task_struct *task,
> +				  struct files_struct *files);
> +
>  static void io_ring_exit_work(struct work_struct *work)
>  {
>  	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
> @@ -8627,6 +8631,8 @@ static void io_ring_exit_work(struct work_struct *work)
>  	 */
>  	do {
>  		io_iopoll_try_reap_events(ctx);
> +		io_poll_remove_all(ctx, NULL, NULL);
> +		io_kill_timeouts(ctx, NULL, NULL);
>  	} while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20));
>  	io_ring_ctx_free(ctx);
>  }
> @@ -8641,6 +8647,7 @@ static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
>  	io_cqring_overflow_flush(ctx, true, NULL, NULL);
>  	mutex_unlock(&ctx->uring_lock);
>
> +	io_cancel_defer_files(ctx, NULL, NULL);
>  	io_kill_timeouts(ctx, NULL, NULL);
>  	io_poll_remove_all(ctx, NULL, NULL);
>
> --
> Pavel Begunkov

--
Josef