On 12/3/21 4:52 AM, Florian Fischer wrote:
> Hi Jens,
>
>> Thanks for the bug report, and I really appreciate including a reproducer.
>> Makes everything so much easier to debug.
>
> Glad I could help :)
>
>> Are you able to compile your own kernels? Would be great if you can try
>> and apply this one on top of 5.15.6.
>>
>>
>> diff --git a/fs/io-wq.c b/fs/io-wq.c
>> index 8c6131565754..e8f77903d775 100644
>> --- a/fs/io-wq.c
>> +++ b/fs/io-wq.c
>> @@ -711,6 +711,13 @@ static bool io_wq_work_match_all(struct io_wq_work *work, void *data)
>>
>>  static inline bool io_should_retry_thread(long err)
>>  {
>> +	/*
>> +	 * Prevent perpetual task_work retry, if the task (or its group) is
>> +	 * exiting.
>> +	 */
>> +	if (fatal_signal_pending(current))
>> +		return false;
>> +
>>  	switch (err) {
>>  	case -EAGAIN:
>>  	case -ERESTARTSYS:
>
> With your patch on top of 5.15.6 I can no longer reproduce stuck processes.
> Neither with our software nor with the reproducer.
> I ran both a hundred times and both terminated immediately without unexpected CPU usage.
>
> Tested-by: Florian Fischer <florian.fl.fischer@xxxxxx>

Great, thanks for testing!

-- 
Jens Axboe