On 3/26/21 8:43 AM, Stefan Metzmacher wrote:
> On 26.03.21 at 15:38, Jens Axboe wrote:
>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>> The KILL after STOP deadlock still exists.
>>>>
>>>> In which tree? Sounds like you're still on the old one with that
>>>> incremental you sent, which wasn't complete.
>>>>
>>>>> Does io_wq_manager() exit without cleaning up on SIGKILL?
>>>>
>>>> No, it should clean up in all cases. I'll try your stop + kill; I just
>>>> tested both of them separately and didn't observe anything. I also ran
>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>> pushed), fwiw.
>>>
>>> I can reproduce this one! I'll take a closer look.
>>
>> OK, that one is actually pretty straightforward - we rely on cleaning
>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>> and never return. So we might need a special case in there to deal with
>> that, or some other way of ensuring that a fatal signal gets processed
>> correctly for IO threads.
>
> And if (fatal_signal_pending(current)) doesn't prevent get_signal()
> from being called?

Usually yes, but this case first does a SIGSTOP, so we're waiting in
get_signal() -> do_signal_stop() when the SIGKILL arrives. Hence there's
no way to catch it in the workers themselves.

-- 
Jens Axboe