On 3/21/21 9:18 AM, Eric W. Biederman wrote: > Jens Axboe <axboe@xxxxxxxxx> writes: > >> On 3/20/21 4:08 PM, Eric W. Biederman wrote: >>> >>> Added criu because I just realized that io_uring (which can open files >>> from an io worker thread) looks to require some special handling for >>> stopping and freezing processes. If not in the SIGSTOP case in the >>> related cgroup freezer case. >>> >>> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes: >>> >>>> On Sat, Mar 20, 2021 at 10:51 AM Linus Torvalds >>>> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: >>>>> >>>>> Alternatively, make it not use >>>>> CLONE_SIGHAND|CLONE_THREAD at all, but that would make it >>>>> unnecessarily allocate its own signal state, so that's "cleaner" but >>>>> not great either. >>>> >>>> Thinking some more about that, it would be problematic for things like >>>> the resource counters too. They'd be much better shared. >>>> >>>> Not adding it to the thread list etc might be clever, but feels a bit too scary. >>>> >>>> So on the whole I think Jens' minor patches to just not have IO helper >>>> threads accept signals are probably the right thing to do. >>> >>> The way I see it we have two options: >>> >>> 1) Don't ask PF_IO_WORKERs to stop do_signal_stop and in >>> task_join_group_stop. >>> >>> The easiest comprehensive implementation looks like just >>> updating task_set_jobctl_pending to treat PF_IO_WORKER >>> as it treats PF_EXITING. >>> >>> 2) Have the main loop of the kernel thread test for JOBCTL_STOP_PENDING >>> and call into do_signal_stop. >>> >>> It is a wee bit trickier to modify the io_workers to stop, but it does >>> not look prohibitively difficult. >>> >>> All of the work performed by the io worker is work scheduled via >>> io_uring by the process being stopped. >>> >>> - Is the amount of work performed by the io worker thread sufficiently >>> negligible that we don't care? >>> >>> - Or is the amount of work performed by the io worker so great that it >>> becomes a way for an errant process to escape SIGSTOP? >>> >>> As the code is all intermingled with the cgroup_freezer. I am also >>> wondering creating checkpoints needs additional stopping guarantees. >> >> The work done is the same a syscall, basically. So it could be long >> running and essentially not doing anything (eg read from a socket, no >> data is there), or it's pretty short lived (eg read from a file, just >> waiting on DMA). >> >> This is outside of my domain of expertise, which is exactly why I added >> you and Linus to make some calls on what the best approach here would >> be. My two patches obviously go route #1 in terms of STOP. And fwiw, >> I tested this: >> >>> To solve the issue that SIGSTOP is simply broken right now I am totally >>> fine with something like: >>> >>> diff --git a/kernel/signal.c b/kernel/signal.c >>> index ba4d1ef39a9e..cb9acdfb32fa 100644 >>> --- a/kernel/signal.c >>> +++ b/kernel/signal.c >>> @@ -288,7 +288,8 @@ bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask) >>> JOBCTL_STOP_SIGMASK | JOBCTL_TRAPPING)); >>> BUG_ON((mask & JOBCTL_TRAPPING) && !(mask & JOBCTL_PENDING_MASK)); >>> >>> - if (unlikely(fatal_signal_pending(task) || (task->flags & PF_EXITING))) >>> + if (unlikely(fatal_signal_pending(task) || >>> + (task->flags & (PF_EXITING | PF_IO_WORKER)))) >>> return false; >>> >>> if (mask & JOBCTL_STOP_SIGMASK) >> >> and can confirm it works fine for me with 2/2 reverted and this applied >> instead. >> >>> Which just keeps from creating unstoppable processes today. I am just >>> not convinced that is what we want as a long term solution. >> >> How about we go with either my 2/2 or yours above to at least ensure we >> don't leave workers looping as schedule() is a nop with sigpending? If >> there's a longer timeline concern that "evading" SIGSTOP is a concern, I >> have absolutely no qualms with making the IO threads participate. But >> since it seems conceptually simple but with potentially lurking minor >> issues, probably not the ideal approach for right now. > > > Here is the signoff for mine. > > Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> > > Yours misses the joining of group stop during fork. So we better use > mine. I've updated it and attributed it to you, here is is: https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.12&id=4db4b1a0d1779dc159f7b87feb97030ec0b12597 > As far as I can see that fixes the outstanding bugs. Great! > Jens can you make a proper patch out of it and send it to Linus for > -rc4? I unfortunately have other commitments and this is all I can do > for today. Will do - I'm going to sanity run the current branch and do a followup pull request for Linus once I've verified everything is still sane. -- Jens Axboe