Re: [PATCH 1/3] ptrace,syscall_user_dispatch: Implement Syscall User Dispatch Suspension

Gregory Price <gregory.price@xxxxxxxxxxxx> · Wed, 18 Jan 2023 14:49:31 -0500

On Wed, Jan 18, 2023 at 02:41:00PM -0500, Gregory Price wrote:
> ---------- Forwarded message ---------
> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Date: Wed, Jan 18, 2023 at 12:16 PM
> Subject: Re: [PATCH 1/3] ptrace,syscall_user_dispatch: Implement Syscall User Dispatch Suspension
> To: Gregory Price <gourry.memverge@xxxxxxxxx>
> 
> 
> On Mon, Jan 09, 2023 at 10:33:46AM -0500, Gregory Price wrote:
> > @@ -36,6 +37,10 @@ bool syscall_user_dispatch(struct pt_regs *regs)
> >       struct syscall_user_dispatch *sd = &current->syscall_dispatch;
> >       char state;
> >
> > +     if (IS_ENABLED(CONFIG_CHECKPOINT_RESTORE) &&
> > +                     unlikely(current->ptrace & PT_SUSPEND_SYSCALL_USER_DISPATCH))
> > +             return false;
> > +
> >       if (likely(instruction_pointer(regs) - sd->offset < sd->len))
> >               return false;
> >
> 
> So by making syscall_user_dispatch() return false, we'll make
> syscall_trace_enter() continue to handle things, and supposedly you want
> to land in ptrace_report_syscall_entry(), right?
>
> ... snip ...
> 
> Should setting this then not also depend on having
> SYSCALL_WORK_SYSCALL_TRACE set? Because without that, you get 'funny'
> things.

Hm, this is an interesting question.  My thinking is that I want the
process to handle the syscall as if syscall user dispatch were not
present at all, regardless of SYSCALL_TRACE.

This is because some software, like CRIU, actually injects syscalls that
run in the context of the target process in order to collect its
resources.

So I actually *want* those 'funny' things to occur, because they're most
likely intentional.  I don't necessarily want to intercept system calls
that subsequently occur (although I might).

So if this feature required SYSCALL_TRACE, you would no longer be able
to inject system calls a la CRIU.
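
To make the intent concrete, here is a rough user-space model of the
decision, not the kernel code itself.  Everything named model_* or
MODEL_* is made up for illustration, the flag value is arbitrary, and
the selector handling is reduced to a single boolean (the real function
also reads the selector from user memory and raises SIGSYS).  The point
is only that the suspend check comes first and never looks at
SYSCALL_TRACE:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PT_SUSPEND_SYSCALL_USER_DISPATCH (value is arbitrary). */
#define MODEL_PT_SUSPEND_SUD (1u << 0)

/* Simplified model of the per-task state the check looks at. */
struct model_task {
	unsigned int ptrace;    /* models current->ptrace */
	bool syscall_trace;     /* models SYSCALL_WORK_SYSCALL_TRACE; deliberately unused */
	uintptr_t sd_offset;    /* models sd->offset */
	uintptr_t sd_len;       /* models sd->len */
	bool selector_block;    /* assumption: selector byte currently says "block" */
};

/*
 * Returns true if the syscall would be diverted to the user-space
 * dispatcher, false if it proceeds down the normal syscall path.
 */
bool model_syscall_user_dispatch(const struct model_task *t, uintptr_t ip)
{
	/* Suspension wins unconditionally, whether or not tracing is active. */
	if (t->ptrace & MODEL_PT_SUSPEND_SUD)
		return false;

	/* Syscalls issued from inside the allowed region are never dispatched. */
	if (ip - t->sd_offset < t->sd_len)
		return false;

	return t->selector_block;
}
```

With the suspend bit set, the function returns false for any ip and any
value of syscall_trace, which is the "as if dispatch were not present"
behaviour an injected syscall needs.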

That's also my understanding of the SECCOMP_SUSPEND feature: it's
intended specifically to allow *otherwise disallowed* syscalls to be
injected into the process, bypassing SECCOMP (SECCOMP_SUSPEND requires
root for exactly this reason).