On 3/26/21 9:04 AM, Stefan Metzmacher wrote: > > Am 26.03.21 um 15:53 schrieb Jens Axboe: >> On 3/26/21 8:45 AM, Stefan Metzmacher wrote: >>> Am 26.03.21 um 15:43 schrieb Stefan Metzmacher: >>>> Am 26.03.21 um 15:38 schrieb Jens Axboe: >>>>> On 3/26/21 7:59 AM, Jens Axboe wrote: >>>>>> On 3/26/21 7:54 AM, Jens Axboe wrote: >>>>>>>> The KILL after STOP deadlock still exists. >>>>>>> >>>>>>> In which tree? Sounds like you're still on the old one with that >>>>>>> incremental you sent, which wasn't complete. >>>>>>> >>>>>>>> Does io_wq_manager() exits without cleaning up on SIGKILL? >>>>>>> >>>>>>> No, it should kill up in all cases. I'll try your stop + kill, I just >>>>>>> tested both of them separately and didn't observe anything. I also ran >>>>>>> your io_uring-cp example (and found a bug in the example, fixed and >>>>>>> pushed), fwiw. >>>>>> >>>>>> I can reproduce this one! I'll take a closer look. >>>>> >>>>> OK, that one is actually pretty straight forward - we rely on cleaning >>>>> up on exit, but for fatal cases, get_signal() will call do_exit() for us >>>>> and never return. So we might need a special case in there to deal with >>>>> that, or some other way of ensuring that fatal signal gets processed >>>>> correctly for IO threads. >>>> >>>> And if (fatal_signal_pending(current)) doesn't prevent get_signal() from being called? >>> >>> Ah, we're still in the first get_signal() from SIGSTOP, correct? >> >> Yes exactly, we're waiting in there being stopped. So we either need to >> check to something ala: >> >> relock: >> + if (current->flags & PF_IO_WORKER && fatal_signal_pending(current)) >> + return false; >> >> to catch it upfront and from the relock case, or add: >> >> fatal: >> + if (current->flags & PF_IO_WORKER) >> + return false; >> >> to catch it in the fatal section. >> > > Or something like io_uring_files_cancel() > > Maybe change current->pf_io_worker with a generic current->io_thread > structure which, has exit hooks, as well as > io_wq_worker_sleeping() and io_wq_worker_running(). > > Maybe create_io_thread would take such an structure > as argument instead of a single function pointer. > > struct io_thread_description { > const char *name; > int (*thread_fn)(struct io_thread_description *); > void (*sleeping_fn)((struct io_thread_description *); > void (*running_fn)((struct io_thread_description *); > void (*exit_fn)((struct io_thread_description *); > }; > > And then > struct io_wq_manager { > struct io_thread_description description; > ... manager specific stuff... > }; I did consider something like that, but seems a bit over-engineered just for catching this case. And any kind of logic for PF_EXITING ends up being a bit tricky for cancelations. We can look into doing that for 5.13 potentially. -- Jens Axboe