Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Mon, May 15, 2023 at 3:23 PM Mike Christie
> <michael.christie@xxxxxxxxxx> wrote:
>>
>> The vhost layer really doesn't want any signals and wants to work like kthreads
>> for that case. To make it really simple can we do something like this where it
>> separates user and io worker behavior where the major diff is how they handle
>> signals and exit. I also included a fix for the freeze case:
>
> I don't love the SIGKILL special case, but I also don't find this
> deeply offensive. So if this is what it takes, I'm ok with it.
>
> I wonder if we could make that special case simply check for "is
> SIGKILL blocked" instead? No normal case will cause that, and it means
> that a PF_USER_WORKER thread could decide per-thread what it wants to
> do wrt SIGKILL.

A kernel thread can block SIGKILL and that is supported.

A thread that is part of a user mode process can't block SIGKILL.

There is this bit in complete_signal when SIGKILL is delivered to any
thread in the process.

	/*
	 * Start a group exit and wake everybody up.
	 * This way we don't have other threads
	 * running and doing things after a slower
	 * thread has the fatal signal pending.
	 */
	signal->flags = SIGNAL_GROUP_EXIT;
	signal->group_exit_code = sig;
	signal->group_stop_count = 0;
	t = p;
	do {
		task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
		sigaddset(&t->pending.signal, SIGKILL);
		signal_wake_up(t, 1);
	} while_each_thread(p, t);

For clarity, that sigaddset(&t->pending.signal, SIGKILL) really isn't
setting SIGKILL pending; it is part of the short-circuit delivery logic,
and that sigaddset(SIGKILL) is just setting a flag to tell the process
it needs to die.

The important part of that code is that SIGNAL_GROUP_EXIT gets set.
That indicates the entire process is being torn down.

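Both points can be observed from userspace. The following is a rough
sketch, not code from the kernel tree, and the function names
blocked_after_block_all() and kill_child_that_blocks_all() are made up
for illustration. It assumes a Linux/POSIX system and a single-threaded
caller. The first helper asks the kernel to block every signal and then
reads the mask back. The second forks a child that tries to block
everything, sends it SIGKILL, and reports what the parent sees in the
wait status.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to block every signal, then report whether
 * 'sig' really ended up in the blocked mask.
 * Returns 1 if blocked, 0 if not, -1 on error.
 */
int blocked_after_block_all(int sig)
{
	sigset_t all, old, cur;
	int blocked;

	assert(sig > 0);
	if (sigfillset(&all) != 0 || sigprocmask(SIG_BLOCK, &all, &old) != 0)
		return -1;
	/* A NULL 'set' only queries the current mask. */
	if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0)
		return -1;
	blocked = sigismember(&cur, sig);
	/* Put the caller's original mask back. */
	sigprocmask(SIG_SETMASK, &old, NULL);
	return blocked;
}

/*
 * Hypothetical helper: fork a child that tries to block all signals,
 * kill it with SIGKILL, and return the signal number the parent sees
 * in the wait status (0 if the child exited normally, -1 on error).
 */
int kill_child_that_blocks_all(void)
{
	int status = 0;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		sigset_t all;

		sigfillset(&all);
		/* The request succeeds, but SIGKILL stays deliverable. */
		sigprocmask(SIG_BLOCK, &all, NULL);
		for (;;)
			pause();
	}
	if (kill(pid, SIGKILL) != 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling blocked_after_block_all(SIGTERM) should report the signal as
blocked, while SIGKILL and SIGSTOP should not be, because the kernel
silently drops them from the requested mask. The value returned by
kill_child_that_blocks_all() is what a parent gets from WTERMSIG(),
which corresponds to the group_exit_code = sig assignment quoted above.
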
Where SIGNAL_GROUP_EXIT becomes important is that exit_notify and
release_task work together to ensure that the first thread in the
process (a user space thread that cannot block SIGKILL) will not send
SIGCHLD to its parent process until every thread in the process has
exited.

The delay_group_leader logic in wait_consider_task, part of wait(2),
has the same logic.

Having been through this with io_uring, the threads really need to call
get_signal to handle that case.

This is pretty much why I said at the outset that they needed to decide
whether they were going to implement a thread or a process.

Changing the decision to be a thread from a process is fine, but in that
case the vhost logic needs to act like a process, just like io_uring
does.

> Christian? And I guess we should Cc: Oleg too, since the signal parts
> are an area he's familiar with and has worked on.. Eric Biederman has
> already been on the list and has also been involved
>
> Oleg: see
>
>     https://lore.kernel.org/lkml/122b597e-a5fa-daf7-27bb-6f04fa98d496@xxxxxxxxxx/
>
> for the context here.

Eric

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization