Jens Axboe <axboe@xxxxxxxxx> writes: > On 3/20/21 4:08 PM, Eric W. Biederman wrote: >> >> Added criu because I just realized that io_uring (which can open files >> from an io worker thread) looks to require some special handling for >> stopping and freezing processes. If not in the SIGSTOP case in the >> related cgroup freezer case. >> >> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes: >> >>> On Sat, Mar 20, 2021 at 10:51 AM Linus Torvalds >>> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: >>>> >>>> Alternatively, make it not use >>>> CLONE_SIGHAND|CLONE_THREAD at all, but that would make it >>>> unnecessarily allocate its own signal state, so that's "cleaner" but >>>> not great either. >>> >>> Thinking some more about that, it would be problematic for things like >>> the resource counters too. They'd be much better shared. >>> >>> Not adding it to the thread list etc might be clever, but feels a bit too scary. >>> >>> So on the whole I think Jens' minor patches to just not have IO helper >>> threads accept signals are probably the right thing to do. >> >> The way I see it we have two options: >> >> 1) Don't ask PF_IO_WORKERs to stop do_signal_stop and in >> task_join_group_stop. >> >> The easiest comprehensive implementation looks like just >> updating task_set_jobctl_pending to treat PF_IO_WORKER >> as it treats PF_EXITING. >> >> 2) Have the main loop of the kernel thread test for JOBCTL_STOP_PENDING >> and call into do_signal_stop. >> >> It is a wee bit trickier to modify the io_workers to stop, but it does >> not look prohibitively difficult. >> >> All of the work performed by the io worker is work scheduled via >> io_uring by the process being stopped. >> >> - Is the amount of work performed by the io worker thread sufficiently >> negligible that we don't care? >> >> - Or is the amount of work performed by the io worker so great that it >> becomes a way for an errant process to escape SIGSTOP? >> >> As the code is all intermingled with the cgroup_freezer. I am also >> wondering creating checkpoints needs additional stopping guarantees. > > The work done is the same a syscall, basically. So it could be long > running and essentially not doing anything (eg read from a socket, no > data is there), or it's pretty short lived (eg read from a file, just > waiting on DMA). > > This is outside of my domain of expertise, which is exactly why I added > you and Linus to make some calls on what the best approach here would > be. My two patches obviously go route #1 in terms of STOP. And fwiw, > I tested this: > >> To solve the issue that SIGSTOP is simply broken right now I am totally >> fine with something like: >> >> diff --git a/kernel/signal.c b/kernel/signal.c >> index ba4d1ef39a9e..cb9acdfb32fa 100644 >> --- a/kernel/signal.c >> +++ b/kernel/signal.c >> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask) >> JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING)); >> BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK)); >> >> - if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING))) >> + if (unlikely(fatal_signal_pending(task) || >> + (task->flags & (PF_EXITING | PF_IO_WORKER)))) >> return false; >> >> if (mask & JOBCTL_STOP_SIGMASK) > > and can confirm it works fine for me with 2/2 reverted and this applied > instead. > >> Which just keeps from creating unstoppable processes today. I am just >> not convinced that is what we want as a long term solution. > > How about we go with either my 2/2 or yours above to at least ensure we > don't leave workers looping as schedule() is a nop with sigpending? If > there's a longer timeline concern that "evading" SIGSTOP is a concern, I > have absolutely no qualms with making the IO threads participate. But > since it seems conceptually simple but with potentially lurking minor > issues, probably not the ideal approach for right now. Here is the signoff for mine. Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> Yours misses the joining of group stop during fork. So we better use mine. As far as I can see that fixes the outstanding bugs. Jens can you make a proper patch out of it and send it to Linus for -rc4? I unfortunately have other commitments and this is all I can do for today. Eric