Re: [PATCH bpf] bpf,perf: Fix perf_event_detach_bpf_prog error handling

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 9 Dec 2024 09:49:01 -0800

On Fri, Dec 6, 2024 at 4:22 PM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Fri, Dec 06, 2024 at 10:21:18AM -0800, Andrii Nakryiko wrote:
> > On Fri, Dec 6, 2024 at 9:09 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
> > >
> > > On Wed, Oct 23, 2024 at 09:01:02AM -0700, Andrii Nakryiko wrote:
> > > > On Wed, Oct 23, 2024 at 3:01 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> > > > >
> > > > > Peter reported that perf_event_detach_bpf_prog might skip to release
> > > > > the bpf program for -ENOENT error from bpf_prog_array_copy.
> > > > >
> > > > > This can't happen because bpf program is stored in perf event and is
> > > > > detached and released only when perf event is freed.
> > > > >
> > > > > Let's make it obvious and add WARN_ON_ONCE on the -ENOENT check and
> > > > > make sure the bpf program is released in any case.
> > > > >
> > > > > Cc: Sean Young <sean@xxxxxxxx>
> > > > > Fixes: 170a7e3ea070 ("bpf: bpf_prog_array_copy() should return -ENOENT if exclude_prog not found")
> > > > > Closes: https://lore.kernel.org/lkml/20241022111638.GC16066@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> > > > > Reported-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > > > Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>
> > > > > ---
> > > > >  kernel/trace/bpf_trace.c | 5 +++--
> > > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > > > index 95b6b3b16bac..2c064ba7b0bd 100644
> > > > > --- a/kernel/trace/bpf_trace.c
> > > > > +++ b/kernel/trace/bpf_trace.c
> > > > > @@ -2216,8 +2216,8 @@ void perf_event_detach_bpf_prog(struct perf_event *event)
> > > > >
> > > > >         old_array = bpf_event_rcu_dereference(event->tp_event->prog_array);
> > > > >         ret = bpf_prog_array_copy(old_array, event->prog, NULL, 0, &new_array);
> > > > > -       if (ret == -ENOENT)
> > > > > -               goto unlock;
> > > > > +       if (WARN_ON_ONCE(ret == -ENOENT))
> > > > > +               goto put;
> > > > >         if (ret < 0) {
> > > > >                 bpf_prog_array_delete_safe(old_array, event->prog);
> > > >
> > > > seeing
> > > >
> > > > if (ret < 0)
> > > >     bpf_prog_array_delete_safe(old_array, event->prog);
> > > >
> > > > I think neither ret == -ENOENT nor WARN_ON_ONCE is necessary, tbh. So
> > > > now I feel like just dropping WARN_ON_ONCE() is better.
> > >
> > > hi,
> > > there's syzbot report [1] where we could end up with following
> > >
> > >   - create perf event and set bpf program to it
> > >   - clone process -> create inherited event
> > >   - exit -> release both events
> > >   - first perf_event_detach_bpf_prog call will release tp_event->prog_array
> > >     and second perf_event_detach_bpf_prog will crash because
> > >     tp_event->prog_array is NULL
> > >
> > > we can fix that quickly with the change below, I guess we could add refcount
> > > to bpf_prog_array_item and allow one of the parent/inherited events to
> > > work while the other is gone.. but that might be too much, will check
> > >
> > > jirka
> > >
> > >
> > > [1] https://lore.kernel.org/bpf/Z1MR6dCIKajNS6nU@krava/T/#m91dbf0688221ec7a7fc95e896a7ef9ff93b0b8ad
> > > ---
> > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > index fe57dfbf2a86..d4b45543ebc2 100644
> > > --- a/kernel/trace/bpf_trace.c
> > > +++ b/kernel/trace/bpf_trace.c
> > > @@ -2251,6 +2251,8 @@ void perf_event_detach_bpf_prog(struct perf_event *event)
> > >                 goto unlock;
> > >
> > >         old_array = bpf_event_rcu_dereference(event->tp_event->prog_array);
> > > +       if (!old_array)
> > > +               goto put;
> >
> > How does this inherited event stuff work? You can have two separate
> > events sharing the same prog_array? What if we attach different
> > programs to each of those events, will both of them be called for
> > either of two events? That sounds broken, if that's true.
>
> so perf event with attr.inherit=1 attached on task will get inherited
> by child process.. the new child event shares the parent's bpf program
> and tp_event (hence prog_array) which is global for tracepoint
>
> AFAICS when child process exits the inherited event is destroyed and it
> removes related tp_event->prog_array, so the parent event won't trigger
> ever again, the test below shows that
>

Doesn't this sound broken? Either event inheritance has to copy
prog_array so that the two events are completely independent, or the
inherited event shouldn't remove the parent's program. Or something
else, but the way it is right now seems wrong, no?

I'm not sure what the most appropriate behavior is to match overall
perf_event inheritance semantics, but we should probably think this
through and fix it properly, instead of patching up the symptom with
that NULL check, no?
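
To make the concern concrete, here is a toy userspace model of the
sharing problem. It is not kernel code: every toy_* name is made up for
illustration, and an int stands in for the bpf program pointer. It
assumes that a parent and an inherited event both attach the same
program to one shared array, and contrasts an unconditional detach
(roughly what happens today, as I understand the description above)
with the per-item refcount idea mentioned earlier in the thread.

```c
#include <errno.h>

/* Toy model only; none of these types or functions exist in the kernel. */
#define TOY_MAX_ITEMS 4

struct toy_item {
	int prog_id;	/* stands in for a bpf program; 0 means empty slot */
	int refcnt;	/* hypothetical: events that attached prog_id */
};

struct toy_prog_array {
	struct toy_item items[TOY_MAX_ITEMS];
};

/* Attach prog_id: bump the refcount if present, else take a free slot. */
int toy_attach(struct toy_prog_array *arr, int prog_id)
{
	int i, free_slot = -1;

	if (prog_id == 0)
		return -EINVAL;
	for (i = 0; i < TOY_MAX_ITEMS; i++) {
		if (arr->items[i].prog_id == prog_id) {
			arr->items[i].refcnt++;
			return 0;
		}
		if (arr->items[i].prog_id == 0 && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -ENOSPC;
	arr->items[free_slot].prog_id = prog_id;
	arr->items[free_slot].refcnt = 1;
	return 0;
}

/* First detach removes the program no matter how many events share it. */
int toy_detach_unconditional(struct toy_prog_array *arr, int prog_id)
{
	int i;

	for (i = 0; i < TOY_MAX_ITEMS; i++) {
		if (arr->items[i].prog_id == prog_id) {
			arr->items[i].prog_id = 0;
			arr->items[i].refcnt = 0;
			return 0;
		}
	}
	return -ENOENT;
}

/* Program is removed only when the last sharing event detaches. */
int toy_detach_refcounted(struct toy_prog_array *arr, int prog_id)
{
	int i;

	for (i = 0; i < TOY_MAX_ITEMS; i++) {
		if (arr->items[i].prog_id == prog_id) {
			if (--arr->items[i].refcnt == 0)
				arr->items[i].prog_id = 0;
			return 0;
		}
	}
	return -ENOENT;
}

/* Number of programs that would run if the tracepoint fired now. */
int toy_count(const struct toy_prog_array *arr)
{
	int i, n = 0;

	for (i = 0; i < TOY_MAX_ITEMS; i++)
		if (arr->items[i].prog_id != 0)
			n++;
	return n;
}
```

With two toy_attach() calls for the same program, one
toy_detach_unconditional() leaves nothing to run for the parent and the
second detach sees -ENOENT, while toy_detach_refcounted() keeps the
program until the second detach. Whether a refcount or a per-event copy
is the right fix for the real code is exactly the open question.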

>   test_tp_attach:FAIL:executed unexpected executed: actual 1 != expected 2
>
> I'm not sure this is a problem in practice, because nobody complained
> about that ;-)

That's... not really a criterion for what is or isn't a problem ;)

>
> libbpf does not set attr.inherit=1 and creates system wide perf event,
> so no problem there

You can use all of this outside of libbpf and end up with wrong
behavior, so it's worth thinking about and fixing, IMO.
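
For reference, a minimal sketch of how a non-libbpf user could end up
in this situation, using only the raw perf_event_open() syscall and the
PERF_EVENT_IOC_SET_BPF ioctl. The helper names are hypothetical, tp_id
and prog_fd are assumed to come from elsewhere (tracefs and BPF_PROG_LOAD
respectively), and the pid/cpu arguments simply mirror the quoted test
diff below.

```c
#include <errno.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: fill attr for a tracepoint event inherited by children. */
void init_inherited_tp_attr(struct perf_event_attr *attr, __u64 tp_id)
{
	memset(attr, 0, sizeof(*attr));
	attr->type = PERF_TYPE_TRACEPOINT;
	attr->size = sizeof(*attr);
	attr->config = tp_id;	/* tracepoint id, e.g. read from tracefs */
	attr->inherit = 1;	/* child tasks get their own inherited event */
}

/*
 * Hypothetical helper: open the event on the calling task and attach an
 * already loaded BPF program to it. Returns the perf fd or -errno.
 */
int open_inherited_tp_event(__u64 tp_id, int prog_fd)
{
	struct perf_event_attr attr;
	int pfd, err;

	init_inherited_tp_attr(&attr, tp_id);

	pfd = syscall(__NR_perf_event_open, &attr, 0 /* pid */, 0 /* cpu */,
		      -1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);
	if (pfd < 0)
		return -errno;

	if (ioctl(pfd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0 ||
	    ioctl(pfd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
		err = -errno;
		close(pfd);
		return err;
	}
	return pfd;
}
```

Any task that forks after such a call gets an inherited event, so the
parent's program can stop firing once that child exits.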

>
> jirka
>
>
> ---
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 66173ddb5a2d..2e96241b5030 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -12430,8 +12430,9 @@ static int perf_event_open_tracepoint(const char *tp_category,
>         attr.type = PERF_TYPE_TRACEPOINT;
>         attr.size = attr_sz;
>         attr.config = tp_id;
> +       attr.inherit = 1;
>
> -       pfd = syscall(__NR_perf_event_open, &attr, -1 /* pid */, 0 /* cpu */,
> +       pfd = syscall(__NR_perf_event_open, &attr, 0 /* pid */, 0 /* cpu */,
>                       -1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);
>         if (pfd < 0) {
>                 err = -errno;
> diff --git a/tools/testing/selftests/bpf/prog_tests/tp_attach.c b/tools/testing/selftests/bpf/prog_tests/tp_attach.c
> new file mode 100644
> index 000000000000..01bbf1d1ab52
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/tp_attach.c
> @@ -0,0 +1,35 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <test_progs.h>
> +#include "tp_attach.skel.h"
> +
> +void test_tp_attach(void)
> +{
> +       struct tp_attach *skel;
> +       int pid;
> +
> +       skel = tp_attach__open_and_load();
> +       if (!ASSERT_OK_PTR(skel, "tp_attach__open_and_load"))
> +               return;
> +
> +       skel->bss->pid = getpid();
> +
> +       if (!ASSERT_OK(tp_attach__attach(skel), "tp_attach__attach"))
> +               goto out;
> +
> +       getpid();
> +
> +       pid = fork();
> +       if (!ASSERT_GE(pid, 0, "fork"))
> +               goto out;
> +       if (pid == 0)
> +               _exit(0);
> +       waitpid(pid, NULL, 0);
> +
> +       getpid();
> +
> +       ASSERT_EQ(skel->bss->executed, 2, "executed");
> +
> +out:
> +       tp_attach__destroy(skel);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/tp_attach.c b/tools/testing/selftests/bpf/progs/tp_attach.c
> new file mode 100644
> index 000000000000..d9450d2eac17
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/tp_attach.c
> @@ -0,0 +1,17 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <vmlinux.h>
> +#include <bpf/bpf_tracing.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +int pid;
> +int executed;
> +
> +SEC("tp/syscalls/sys_enter_getpid")
> +int test(void *ctx)
> +{
> +       if (pid == (bpf_get_current_pid_tgid() >> 32))
> +               executed++;
> +       return 0;
> +}