On 3/26/21 8:45 AM, Stefan Metzmacher wrote:
> On 26.03.21 at 15:43, Stefan Metzmacher wrote:
>> On 26.03.21 at 15:38, Jens Axboe wrote:
>>> On 3/26/21 7:59 AM, Jens Axboe wrote:
>>>> On 3/26/21 7:54 AM, Jens Axboe wrote:
>>>>>> The KILL after STOP deadlock still exists.
>>>>>
>>>>> In which tree? Sounds like you're still on the old one with that
>>>>> incremental you sent, which wasn't complete.
>>>>>
>>>>>> Does io_wq_manager() exit without cleaning up on SIGKILL?
>>>>>
>>>>> No, it should clean up in all cases. I'll try your stop + kill, I just
>>>>> tested both of them separately and didn't observe anything. I also ran
>>>>> your io_uring-cp example (and found a bug in the example, fixed and
>>>>> pushed), fwiw.
>>>>
>>>> I can reproduce this one! I'll take a closer look.
>>>
>>> OK, that one is actually pretty straightforward - we rely on cleaning
>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us
>>> and never return. So we might need a special case in there to deal with
>>> that, or some other way of ensuring that fatal signal gets processed
>>> correctly for IO threads.
>>
>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from being called?
>
> Ah, we're still in the first get_signal() from SIGSTOP, correct?

Yes exactly, we're waiting in there being stopped. So we either need to
add a check, something like:

relock:
+	if (current->flags & PF_IO_WORKER && fatal_signal_pending(current))
+		return false;

to catch it upfront and from the relock case, or add:

	fatal:
+		if (current->flags & PF_IO_WORKER)
+			return false;

to catch it in the fatal section.

-- 
Jens Axboe