Re: [PATCH bpf-next v1 4/9] bpf: Introduce sleepable tracepoints

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Thu, 3 Mar 2022 11:59:19 -0800

On Thu, Mar 3, 2022 at 11:37 AM Hao Luo <haoluo@xxxxxxxxxx> wrote:
>
> On Wed, Mar 2, 2022 at 11:41 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Fri, Feb 25, 2022 at 03:43:34PM -0800, Hao Luo wrote:
> > > diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
> > > index e7c2276be33e..c73c7ab3680e 100644
> > > --- a/include/linux/tracepoint-defs.h
> > > +++ b/include/linux/tracepoint-defs.h
> > > @@ -51,6 +51,7 @@ struct bpf_raw_event_map {
> > >       void                    *bpf_func;
> > >       u32                     num_args;
> > >       u32                     writable_size;
> > > +     u32                     sleepable;
> >
> > It increases the size for all tracepoints.
> > See BPF_RAW_TP in include/asm-generic/vmlinux.lds.h
> > Please switch writeable_size and sleepable to u16.
>
> No problem.
>
> > >
> > > -static const struct bpf_func_proto *
> > > -syscall_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > > +/* Syscall helpers that are also allowed in sleepable tracing prog. */
> > > +const struct bpf_func_proto *
> > > +tracing_prog_syscall_func_proto(enum bpf_func_id func_id,
> > > +                             const struct bpf_prog *prog)
> > >  {
> > >       switch (func_id) {
> > >       case BPF_FUNC_sys_bpf:
> > >               return &bpf_sys_bpf_proto;
> > > -     case BPF_FUNC_btf_find_by_name_kind:
> > > -             return &bpf_btf_find_by_name_kind_proto;
> > >       case BPF_FUNC_sys_close:
> > >               return &bpf_sys_close_proto;
> > > -     case BPF_FUNC_kallsyms_lookup_name:
> > > -             return &bpf_kallsyms_lookup_name_proto;
> > >       case BPF_FUNC_mkdir:
> > >               return &bpf_mkdir_proto;
> > >       case BPF_FUNC_rmdir:
> > >               return &bpf_rmdir_proto;
> > >       case BPF_FUNC_unlink:
> > >               return &bpf_unlink_proto;
> > > +     default:
> > > +             return NULL;
> > > +     }
> > > +}
> >
> > If I read this correctly the goal is to disallow find_by_name_kind
> > and lookup_name from sleepable tps. Why? What's the harm?
>
> A couple of thoughts, please correct me if they don't make sense. I
> may think too much.
>
> 1. The very first reason is, I don't know the use case of them in
> tracing. So I think I can leave them right now and add them later if
> the maintainers want them.
>
> 2. A related question is, do we actually want all syscall helpers to
> be in sleepable tracing? Some helpers may cause re-entering the
> tracepoints. For a hypothetical example, if we call mkdir while
> tracing some tracepoints in vfs_mkdir. Do we have protection for this?

If we go with noinline weak nop function approach then we will
get recursion protection for free. All trampoline powered progs have it.
Both sleepable and not.

> Another potential problem is about lookup_name in particular,
> sleepable_tracing could be triggered by any user, will lookup_name
> leak kernel addresses to users in some way? The filesystem helpers
> have some basic perm checks, I would think it's relatively safer.

The tracepoint may be triggerable by any user, but the sleepable
tp bpf prog will be loaded with cap_perfmon permissions, so it has
the rights to read anything.
So I don't see any concerns with enabling lookup_name to both
syscall bpf prog and tp progs.