On Thu, Jan 26, 2023 at 09:45:39AM -0800, Andrei Vagin wrote:
> On Thu, Jan 26, 2023 at 7:07 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> >
> > On 01/25, Andrei Vagin wrote:
> > >
> > > On Wed, Jan 25, 2023 at 4:30 PM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> > > >
> > > > On 01/24, Gregory Price wrote:
> > > > >
> > > > > Adds PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH to ptrace options, and
> > > > > modify Syscall User Dispatch to suspend interception when enabled.
> > > > >
> > > > > This is modeled after the SUSPEND_SECCOMP feature, which suspends
> > > > > SECCOMP interposition. Without doing this, software like CRIU will
> > > > > inject system calls into a process and be intercepted by Syscall
> > > > > User Dispatch, either causing a crash (due to blocked signals) or
> > > > > the delivery of those signals to a ptracer (not the intended behavior).
> > > >
> > > > Cough... Gregory, I am sorry ;)
> > > >
> > > > but can't we drop this patch to ?
> > > >
> > > > CRIU needs to do PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG and check
> > > > config->mode anyway as we discussed.
> > > >
> > > > Then it can simply set *config->selector = SYSCALL_DISPATCH_FILTER_ALLOW
> > > > with the same effect, no?
> > >
> > > Oleg,
> > >
> > > PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH is automatically cleared when
> > > a tracer detaches. It is critical when tracers detach due to unexpected
> > > reasons
> >
> > IIUC, PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH is needed to run the injected
> > code, and this also needs to change the state of the traced process. If
> > the tracer (CRIU) dies while the tracee runs this code, I guess the tracee
> > will have other problems?
>
> Our injected code can reheal itself if something goes wrong. The hack
> here is that we inject
> the code with a signal frame and it calls rt_segreturn to resume the process.
>
> We want to have this functionality for most cases. I don't expect that
> the syscall user dispatch
> is used by many applications, so I don't strongly insist on
> PTRACE_O_SUSPEND_SYSCALL_USER_DISPATCH. In addition, if we know a user dispatch
> memory region, it can be enough to inject our code out of this region
> without disabling SUD.
>
> Thanks,
> Andrei

The region is exclusive: it is the range exempted from dispatch, so it is
syscalls issued from *outside* [offset, offset+len) that get dispatched.
Injecting the code outside the region therefore does not avoid SUD; you
would have to inject into that region, and that is what makes injection
problematic. Even the rt_sigreturn issued by the injected code will be
intercepted if it runs outside the region while the selector is set to
block.
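To make the region semantics concrete, here is a minimal C sketch of the per-syscall decision, assuming the behaviour documented for prctl(PR_SET_SYSCALL_USER_DISPATCH): syscalls issued from inside [offset, offset+len) run directly, and syscalls from outside it are dispatched unless the selector byte reads SYSCALL_DISPATCH_FILTER_ALLOW (0). The function sud_would_dispatch(), the SUD_* names and the addresses are hypothetical; this is a model for discussion, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Selector values as documented for PR_SET_SYSCALL_USER_DISPATCH:
 * SYSCALL_DISPATCH_FILTER_ALLOW is 0, SYSCALL_DISPATCH_FILTER_BLOCK is 1.
 * Local (hypothetical) names are used so no kernel headers are needed. */
enum { SUD_ALLOW = 0, SUD_BLOCK = 1 };

/* Hypothetical model of the SUD decision for a single syscall.
 * ip:          address the syscall was issued from
 * offset, len: the region passed to prctl(); calls from inside run directly
 * selector:    the user-space selector byte, or NULL if none is configured
 * Returns true if the syscall would be dispatched (delivered as SIGSYS). */
bool sud_would_dispatch(uintptr_t ip, uintptr_t offset, uintptr_t len,
                        const uint8_t *selector)
{
    /* Half-open check: ip in [offset, offset + len). Unsigned wrap-around
     * makes any ip below offset fail this test as well. */
    if (ip - offset < len)
        return false;

    /* Outside the region with no selector: always dispatched. */
    if (selector == NULL)
        return true;

    /* Outside the region, the selector decides. This sketch treats every
     * value other than ALLOW as blocking. */
    return *selector != SUD_ALLOW;
}

/* The injection scenario from the thread, assuming a hypothetical region
 * at [0x1000, 0x2000) and injected code placed at 0x7000. */
void sud_examples(void)
{
    uint8_t sel = SUD_BLOCK;

    assert(!sud_would_dispatch(0x1800, 0x1000, 0x1000, &sel)); /* inside region */
    assert(sud_would_dispatch(0x7000, 0x1000, 0x1000, &sel));  /* injected code */
    assert(sud_would_dispatch(0x2000, 0x1000, 0x1000, &sel));  /* end excluded  */

    /* With the selector flipped to ALLOW, the injected code, including its
     * own rt_sigreturn, is no longer dispatched. */
    sel = SUD_ALLOW;
    assert(!sud_would_dispatch(0x7000, 0x1000, 0x1000, &sel));
}
```

In this model, moving the injected code outside the region never helps while the selector blocks; only placing it inside the region or flipping the selector to ALLOW avoids the dispatch.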