On Mon, 2021-06-07 at 22:49 -0500, Eric W. Biederman wrote: > We added the check just a little while ago. I am surprised it shows up > in any book. What is the Bovett & Cesati book? It is an O'Reilly book named 'Understanding the Linux Kernel'. It is quite old (2005) and covers kernel 2.6 but surprisingly it did age very well and it is still very useful today. For instance, the discussed feature about exit_mm() sync mecanism, when the book has been published, the struct core_state was not existing and its members were straight into mm_struct. I would think that beside this detail, not much has changed since then in the general concepts... > > The flag PF_SIGNALED today is set in exactly one place, and that > is in get_signal. The meaning of PF_SIGNALED is that do_group_exit > was called from get_signal. AKA your task was killed by a signal. > > The check in exit_mm() that tests PF_SIGNALED is empirically testing > to see if all of the necessary state is saved on the kernel stack. > That state is the state accessed by fs/binfmt_elf.c:fill_note_info. > > The very good description from the original change can be found in > the commit 123cbec460db ("signal: Remove the helper > signal_group_exit"). thx for sharing the link. Help to improve my kernel understanding is always very appreciated. However, I am clueless about where to look to retrieve it: $ git show 123cbec460db -- fatal: bad revision '123cbec460db' is it supposed to be found in the master branch or this a commit prior 2005? > > For alpha it is has the assembly function do_switch_stack been called > before your code path was called in the kernel. > > Since io_uring does not have a userspace I don't know if testing > for PF_SIGNALED is at all meaningful to detect values saved on the > stack. > > I suspect io_uring is simply broken on architectures that need > extra state saved on the stack, but I haven't looked yet. > > > > So I am not sure if the synchronizatin MUST be applied to io_workers > > or not but the proposed patch is making sure that it is applied in > > all cases if it is needed. > > That patch is definitely wrong. If anything the check in exit_mm > should be updated. with your explanation, if the only purpose of the synchronization is to make sure that the task stack register is pointing on its kernel stack to be able to dump its user hardware context, it is not needed for io- worker tasks since they never exit the kernel. I think that this is the confirmation that I wanted to get from the patch code review because this point wasn't 100% clear to me (and I was having issues getting a core dump generated properly when using io_uring. I found the culprit of that issue. This isn't the problem described in this patch but I thought that the current problem despite not being the one responsible for my coredump issue could still have some merit). the doubt was coming from the fact that io-worker tasks are almost userspace tasks except that they never escape the kernel and the observation that it is the first type of task that could theoritically exit from a group_exit without having their PF_SIGNALED bit set. > > Can you share which code paths in io_uring exit with a > fatal_signal_pending and don't bother to call get_signal? Yes. One was provided as part of the description of this patch: io_wqe_worker() (fs/io-wq.c) and io_sq_thread() (fs/io_uring.c) is another example and possibly all the other future threads created with create_io_thread() will share the same pattern. They all enter an while loop and contain 1 or many other break points beside getting a signal. imho, it is quite exceptional situation but it could happen that those threads exit with a pending signal. Before 5.13, when io-wq had a second type of io thread (io-mgr), there was a real chance of that happening by a race condition between the io- mgr setting the wq state bit IO_WQ_BIT_EXIT upon receiving a SIGKILL and its io-workers threads exiting their while loop. but with io-mgr removal in 5.13, this risk is now gone... I have first offered a patch to io_uring to call get_signal() one last time before exiting: https://lore.kernel.org/io-uring/6b67bd40815f779059f7f3d3ad22f638789452b1.camel@xxxxxxxxxxxxxx/T/#t but discussing with Jens made me realize that it wasn't the right place to do this because: 1. Interrupts aren't disabled and the task can be preemted even after this last get_signal() call. 2. If that is something needed, it was the bad place to do it because you would have to repeat it for all the current existing io_threads and the future ones too. but again, after your explanations, this might not be required at all after all... Greetings,