On 5/18/23 11:16 PM, Eric W. Biederman wrote:
> Mike Christie <michael.christie@xxxxxxxxxx> writes:
>
>> On 5/18/23 1:28 PM, Eric W. Biederman wrote:
>>> Still the big issue seems to be the way get_signal is connected into
>>> these threads so that it keeps getting called.  Calling get_signal after
>>> a fatal signal has been returned happens nowhere else, and even if we
>>> fix it today it is likely to lead to bugs in the future, because whoever
>>> is testing and updating the code is unlikely to have a vhost test case
>>> they care about.
>>>
>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>> index 8f6330f0e9ca..4d54718cad36 100644
>>> --- a/kernel/signal.c
>>> +++ b/kernel/signal.c
>>> @@ -181,7 +181,9 @@ void recalc_sigpending_and_wake(struct task_struct *t)
>>>
>>>  void recalc_sigpending(void)
>>>  {
>>> -	if (!recalc_sigpending_tsk(current) && !freezing(current))
>>> +	if ((!recalc_sigpending_tsk(current) && !freezing(current)) ||
>>> +	    ((current->signal->flags & SIGNAL_GROUP_EXIT) &&
>>> +	     !__fatal_signal_pending(current)))
>>>  		clear_thread_flag(TIF_SIGPENDING);
>>>
>>>  }
>>> @@ -1043,6 +1045,13 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
>>>  	 * This signal will be fatal to the whole group.
>>>  	 */
>>>  	if (!sig_kernel_coredump(sig)) {
>>> +		/*
>>> +		 * The signal is being short-circuit delivered;
>>> +		 * don't leave it pending.
>>> +		 */
>>> +		if (type != PIDTYPE_PID) {
>>> +			sigdelset(&t->signal->shared_pending, sig);
>>> +
>>>  		/*
>>>  		 * Start a group exit and wake everybody up.
>>>  		 * This way we don't have other threads
>>
>> If I change up your patch so the last part is moved down a bit, to when
>> we set t, like this:
>>
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index 0ac48c96ab04..c976a80650db 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -181,9 +181,10 @@ void recalc_sigpending_and_wake(struct task_struct *t)
>>
>>  void recalc_sigpending(void)
>>  {
>> -	if (!recalc_sigpending_tsk(current) && !freezing(current))
>> +	if ((!recalc_sigpending_tsk(current) && !freezing(current)) ||
>> +	    ((current->signal->flags & SIGNAL_GROUP_EXIT) &&
>> +	     !__fatal_signal_pending(current)))
>>  		clear_thread_flag(TIF_SIGPENDING);
>> -

> Can we get rid of this suggestion to recalc_sigpending.  The more I look
> at it the more I am convinced it is not safe.  In particular I believe
> it is incompatible with dump_interrupted() in fs/coredump.c

With your clear_thread_flag call in vhost_worker suggestion I don't need
the above chunk.

> The code in fs/coredump.c is the closest code we have to what you are
> trying to do with vhost_worker after the session is killed.  It also
> struggles with TIF_SIGPENDING getting set.

>>  }
>>  EXPORT_SYMBOL(recalc_sigpending);
>>
>> @@ -1053,6 +1054,17 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
>>  		signal->group_exit_code = sig;
>>  		signal->group_stop_count = 0;
>>  		t = p;
>> +		/*
>> +		 * The signal is being short-circuit delivered;
>> +		 * don't leave it pending.
>> +		 */
>> +		if (type != PIDTYPE_PID) {
>> +			struct sigpending *pending;
>> +
>> +			pending = &t->signal->shared_pending;
>> +			sigdelset(&pending->signal, sig);
>> +		}
>> +
>>  		do {
>>  			task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
>>  			sigaddset(&t->pending.signal, SIGKILL);
>>
>> Then get_signal() works like how Oleg mentioned it should earlier.
>
> I am puzzled it makes a difference, as t->signal and p->signal should
> point to the same thing, and in fact the code would read more clearly as
> sigdelset(&signal->shared_pending, sig);

Yeah, either should work.  The original patch had used t before it was
set, so my patch just moved it down to after we set it.  I just used
signal like you wrote and it works fine.

>
> But all of that seems minor.
>
>> For vhost I just need the code below, which is just Linus's patch plus a
>> call to get_signal() in vhost_worker() and the
>> PF_IO_WORKER->PF_USER_WORKER change.
>>
>> Note that when we get SIGKILL, the vhost file_operations->release
>> function is called via
>>
>> do_exit -> exit_files -> put_files_struct -> close_files
>>
>> and so the vhost release function starts to flush IO and stop the
>> worker/vhost task.  In vhost_worker() we then just handle those last
>> completions for already running IO.  When the vhost release function
>> detects they are done, it calls vhost_task_stop(), vhost_worker()
>> returns, and then vhost_task_fn() does do_exit().  So we don't return
>> immediately when get_signal() returns non-zero.
>>
>> So it works, but it sounds like you don't like vhost relying on that
>> behavior, and it's non-standard to use get_signal() like we are.  So I'm
>> not sure how we want to proceed.
>
> Let me clarify my concern.
>
> Your code modifies get_signal as:
>
> 	/*
> -	 * PF_IO_WORKER threads will catch and exit on fatal signals
> +	 * PF_USER_WORKER threads will catch and exit on fatal signals
> 	 * themselves. They have cleanup that must be performed, so
> 	 * we cannot call do_exit() on their behalf.
> 	 */
> -	if (current->flags & PF_IO_WORKER)
> +	if (current->flags & PF_USER_WORKER)
> 		goto out;
>
> 	/*
> 	 * Death signals, no core dump.
> 	 */
> 	do_group_exit(ksig->info.si_signo);
> 	/* NOTREACHED */
>
> Which means by modifying get_signal you are logically deleting the
> do_group_exit from get_signal.  As far as that goes, that is a perfectly
> reasonable change.
> The problem is you wind up calling get_signal again after that.  That
> does not make sense.
>
> I would suggest doing something like:

I see.  I've run some tests today with what you suggested for
vhost_worker plus your signal change, and it works for SIGKILL/STOP/CONT
and freeze.

>
> What is the diff below?  It does not appear to be a revert diff.

It was just the simplest patch needed, on top of your signal changes (and
the PF_IO_WORKER -> PF_USER_WORKER signal change), to fix the 2
regressions reported.  I wanted to give the vhost devs an idea of what
was needed with your signal changes.

Let me do some more testing over the weekend and I'll post an RFC with
your signal change and the minimal changes needed to vhost to handle the
2 regressions that were reported.  The vhost developers can get a better
idea of what needs to be done and can better decide how they want to
proceed.

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization