Re: [PATCH v2] seccomp: passthrough uretprobe systemcall without filtering

Jiri Olsa <olsajiri@xxxxxxxxx> · Thu, 30 Jan 2025 09:24:43 +0100



On Wed, Jan 29, 2025 at 09:27:49AM -0800, Eyal Birger wrote:
> Hi,
> 
> Thanks for the review!
> 
> On Tue, Jan 28, 2025 at 5:41 PM Kees Cook <kees@xxxxxxxxxx> wrote:
> >
> > On Tue, Jan 28, 2025 at 06:58:06AM -0800, Eyal Birger wrote:
> > > Note: uretprobe isn't supported in i386 and __NR_ia32_rt_tgsigqueueinfo
> > > uses the same number as __NR_uretprobe so the syscall isn't forced in the
> > > compat bitmap.
> >
> > So a 64-bit tracer cannot use uretprobe on a 32-bit process? Also is
> > uretprobe strictly an x86_64 feature?
> >
> 
> My understanding is that they'd be able to do so, but use the int3 trap
> instead of the uretprobe syscall.
> 
> > > [...]
> > > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > > index 385d48293a5f..23b594a68bc0 100644
> > > --- a/kernel/seccomp.c
> > > +++ b/kernel/seccomp.c
> > > @@ -734,13 +734,13 @@ seccomp_prepare_user_filter(const char __user *user_filter)
> > >
> > >  #ifdef SECCOMP_ARCH_NATIVE
> > >  /**
> > > - * seccomp_is_const_allow - check if filter is constant allow with given data
> > > + * seccomp_is_filter_const_allow - check if filter is constant allow with given data
> > >   * @fprog: The BPF programs
> > >   * @sd: The seccomp data to check against, only syscall number and arch
> > >   *      number are considered constant.
> > >   */
> > > -static bool seccomp_is_const_allow(struct sock_fprog_kern *fprog,
> > > -                                struct seccomp_data *sd)
> > > +static bool seccomp_is_filter_const_allow(struct sock_fprog_kern *fprog,
> > > +                                       struct seccomp_data *sd)
> > >  {
> > >       unsigned int reg_value = 0;
> > >       unsigned int pc;
> > > @@ -812,6 +812,21 @@ static bool seccomp_is_const_allow(struct sock_fprog_kern *fprog,
> > >       return false;
> > >  }
> > >
> > > +static bool seccomp_is_const_allow(struct sock_fprog_kern *fprog,
> > > +                                struct seccomp_data *sd)
> > > +{
> > > +#ifdef __NR_uretprobe
> > > +     if (sd->nr == __NR_uretprobe
> > > +#ifdef SECCOMP_ARCH_COMPAT
> > > +         && sd->arch != SECCOMP_ARCH_COMPAT
> > > +#endif
> >
> > I don't like this because it's not future-proof enough. __NR_uretprobe
> > may collide with other syscalls at some point.
> 
> I'm not sure I got this point.
> 
> > And if __NR_uretprobe_32
> > is ever implemented, the seccomp logic will be missing. I think this
> > will work now and in the future:
> >
> > #ifdef __NR_uretprobe
> > # ifdef SECCOMP_ARCH_COMPAT
> >         if (sd->arch == SECCOMP_ARCH_COMPAT) {
> > #  ifdef __NR_uretprobe_32
> >                 if (sd->nr == __NR_uretprobe_32)
> >                         return true;
> > #  endif
> >         } else
> > # endif
> >         if (sd->nr == __NR_uretprobe)
> >                 return true;
> > #endif
> 
> I don't know if implementing uretprobe syscall for compat binaries is
> planned or makes sense - I'd appreciate Jiri's and others opinion on that.
> That said, I don't mind adding this code for the sake of future proofing.

as Andrii wrote in the other email ATM it's just strictly x86_64,
but let's future proof it

AFAIK there was an attempt to do similar on arm but it did not show
any speed up

> 
> >
> > Instead of doing a function rename dance, I think you can just stick
> > the above into seccomp_is_const_allow() after the WARN().
> 
> My motivation for the renaming dance was that you mentioned we might add
> new syscalls to this as well, so I wanted to avoid cluttering the existing
> function which seems to be well defined.
> 
> >
> > Also please add a KUnit tests to cover this in
> > tools/testing/selftests/seccomp/seccomp_bpf.c
> 
> I think this would mean that this test suite would need to run as
> privileged. Is that Ok? or maybe it'd be better to have a new suite?
> 
> > With at least these cases combinations below. Check each of:
> >
> >         - not using uretprobe passes
> >         - using uretprobe passes (and validates that uretprobe did work)
> >
> > in each of the following conditions:
> >
> >         - default-allow filter
> >         - default-block filter
> >         - filter explicitly blocking __NR_uretprobe and nothing else
> >         - filter explicitly allowing __NR_uretprobe (and only other
> >           required syscalls)
> 
> Ok.

please let me know if I can help in any way with tests

> 
> >
> > Hm, is uretprobe expected to work on mips? Because if so, you'll need to
> > do something similar to the mode1 checking in the !SECCOMP_ARCH_NATIVE
> > version of seccomp_cache_check_allow().
> 
> I don't know if uretprobe syscall is expected to run on mips. Personally
> I'd avoid adding this dead code.
> 
> >
> > (You can see why I really dislike having policy baked into seccomp!)
> 
> I definitely understand :)
> 
> >
> > > +        )
> > > +             return true;
> > > +#endif
> > > +
> > > +     return seccomp_is_filter_const_allow(fprog, sd);
> > > +}
> > > +
> > >  static void seccomp_cache_prepare_bitmap(struct seccomp_filter *sfilter,
> > >                                        void *bitmap, const void *bitmap_prev,
> > >                                        size_t bitmap_size, int arch)
> > > @@ -1023,6 +1038,9 @@ static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
> > >   */
> > >  static const int mode1_syscalls[] = {
> > >       __NR_seccomp_read, __NR_seccomp_write, __NR_seccomp_exit, __NR_seccomp_sigreturn,
> > > +#ifdef __NR_uretprobe
> > > +     __NR_uretprobe,
> > > +#endif
> >
> > It'd be nice to update mode1_syscalls_32 with __NR_uretprobe_32 even
> > though it doesn't exist. (Is it _never_ planned to be implemented?) But
> > then, maybe the chances of a compat mode1 seccomp process running under
> > uretprobe is vanishingly small.

no plans for __NR_uretprobe_32 at this point

> 
> It seems to me very unlikely. BTW, when I tested the "strict" mode change
> my program was killed by seccomp. The reason wasn't the uretprobe syscall
> (which I added to the list), it was actually the exit_group syscall which
> libc uses instead of the exit syscall.
> 
> Thanks again,
> Eyal.

thanks,
jirka