Re: Tasks stuck on exit(2) with 5.15.6

On 12/2/21 9:56 AM, Florian Fischer wrote:
> Hello,
> 
> I experienced stuck tasks during a process' exit when using multiple
> io_uring instances on a 48/96-core system in a multi-threaded environment,
> where we use an io_uring per thread and a single pipe(2) to pass messages
> between the threads.
> 
> When the program calls exit(2) without joining the threads or unmapping/closing
> the io_urings, the program gets stuck in the zombie state - sometimes leaving
> behind multiple <cpu>:<n>-events kernel-threads using a considerable amount of CPU.
> 
> I can reproduce this behavior on Debian running Linux 5.15.6 with the
> reproducer below compiled with Debian's gcc (10.2.1-6):

Thanks for the bug report, and I really appreciate you including a
reproducer. It makes everything so much easier to debug.

Are you able to compile your own kernels? It would be great if you could
try applying this patch on top of 5.15.6.


diff --git a/fs/io-wq.c b/fs/io-wq.c
index 8c6131565754..e8f77903d775 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -711,6 +711,13 @@ static bool io_wq_work_match_all(struct io_wq_work *work, void *data)
 
 static inline bool io_should_retry_thread(long err)
 {
+	/*
+	 * Prevent perpetual task_work retry, if the task (or its group) is
+	 * exiting.
+	 */
+	if (fatal_signal_pending(current))
+		return false;
+
 	switch (err) {
 	case -EAGAIN:
 	case -ERESTARTSYS:
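
For illustration only, the effect of the check can be modelled in user
space. This is a rough sketch, not the kernel code: the flag
sketch_task_exiting is a hypothetical stand-in for
fatal_signal_pending(current), SKETCH_ERESTARTSYS stands in for the
kernel-internal ERESTARTSYS (which <errno.h> does not export), and the
loop helper is invented to make the retry behaviour visible. Only the two
error codes visible in the hunk above are modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel-internal ERESTARTSYS; sketch only. */
#define SKETCH_ERESTARTSYS 512

/* Hypothetical stand-in for fatal_signal_pending(current). */
static bool sketch_task_exiting;

/* Models the patched predicate: never retry once the task is exiting. */
static bool sketch_should_retry_thread(long err)
{
	if (sketch_task_exiting)
		return false;

	switch (err) {
	case -EAGAIN:
	case -SKETCH_ERESTARTSYS:
		return true;
	default:
		return false;
	}
}

/*
 * Counts how many creation attempts are made when every attempt fails
 * with 'err'. 'cap' bounds the loop so the sketch always terminates;
 * it is an assumption of this model, not something the patch adds.
 */
static int sketch_attempts_until_giveup(long err, int cap)
{
	int attempts = 0;

	assert(cap > 0);
	do {
		attempts++;
	} while (attempts < cap && sketch_should_retry_thread(err));

	return attempts;
}
```

With sketch_task_exiting clear, a persistent -EAGAIN keeps retrying until
the cap is hit; with it set, the first failure is final, which is the
behaviour the patch is meant to give an exiting task.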

-- 
Jens Axboe



