On 9/27/21 7:51 AM, Eric W. Biederman wrote: > Jens Axboe <axboe@xxxxxxxxx> writes: > >> On 9/25/21 5:05 PM, Linus Torvalds wrote: >>> On Sat, Sep 25, 2021 at 1:32 PM Jens Axboe <axboe@xxxxxxxxx> wrote: >>>> >>>> - io-wq core dump exit fix (me) >>> >>> Hmm. >>> >>> That one strikes me as odd. >>> >>> I get the feeling that if the io_uring thread needs to have that >>> signal_group_exit() test, something is wrong in signal-land. >>> >>> It's basically a "fatal signal has been sent to another thread", and I >>> really get the feeling that "fatal_signal_pending()" should just be >>> modified to handle that case too. >> >> It did surprise me as well, which is why that previous change ended up >> being broken for the coredump case... You could argue that the io-wq >> thread should just exit on signal_pending(), which is what we did >> before, but that really ends up sucking for workloads that do use >> signals for communication purposes. postgres was the reporter here. > > The primary function get_signal is to make signals not pending. So I > don't understand any use of testing signal_pending after a call to > get_signal. > > My confusion doubles when I consider the fact io_uring threads should > only be dequeuing SIGSTOP and SIGKILL. > > I am concerned that an io_uring thread that dequeues SIGKILL won't call > signal_group_exit and thus kill the other threads in the thread group. > > What motivated removing the break and adding the fatal_signal_pending > test? I played with this a bit this morning, and I agree it doesn't seem to be needed at all. The original issue was with postgres, I'll give that a whirl as well and see if we run into any unwarranted exits. My simpler test case did not. -- Jens Axboe