On 12/2/21 9:56 AM, Florian Fischer wrote:
> Hello,
>
> I experienced stuck tasks during a process' exit when using multiple
> io_uring instances on a 48/96-core system in a multi-threaded environment,
> where we use an io_uring per thread and a single pipe(2) to pass messages
> between the threads.
>
> When the program calls exit(2) without joining the threads or unmapping/closing
> the io_urings, the program gets stuck in the zombie state - sometimes leaving
> behind multiple <cpu>:<n>-events kernel-threads using a considerable amount of CPU.
>
> I can reproduce this behavior on Debian running Linux 5.15.6 with the
> reproducer below compiled with Debian's gcc (10.2.1-6):

Thanks for the bug report, and I really appreciate including a reproducer.
Makes everything so much easier to debug. Are you able to compile your own
kernels? Would be great if you can try and apply this one on top of 5.15.6.


diff --git a/fs/io-wq.c b/fs/io-wq.c
index 8c6131565754..e8f77903d775 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -711,6 +711,13 @@ static bool io_wq_work_match_all(struct io_wq_work *work, void *data)
 
 static inline bool io_should_retry_thread(long err)
 {
+	/*
+	 * Prevent perpetual task_work retry, if the task (or its group) is
+	 * exiting.
+	 */
+	if (fatal_signal_pending(current))
+		return false;
+
 	switch (err) {
 	case -EAGAIN:
 	case -ERESTARTSYS:

-- 
Jens Axboe