Re: [PATCH bpf v2] bpf: Fix prog_array UAF in __uprobe_perf_func()

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 9 Dec 2024 14:14:27 -0800

On Mon, Dec 9, 2024 at 10:22 AM Jann Horn <jannh@xxxxxxxxxx> wrote:
>
> On Sat, Dec 7, 2024 at 12:15 AM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> > On Fri, Dec 6, 2024 at 3:14 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > > On Fri, Dec 6, 2024 at 11:43 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > > > On Fri, Dec 6, 2024 at 11:30 PM Andrii Nakryiko
> > > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > > > On Fri, Dec 6, 2024 at 2:25 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Fri, Dec 6, 2024 at 11:15 PM Andrii Nakryiko
> > > > > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > > > > > On Fri, Dec 6, 2024 at 12:45 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > Currently, the pointer stored in call->prog_array is loaded in
> > > > > > > > __uprobe_perf_func(), with no RCU annotation and no RCU protection, so the
> > > > > > > > loaded pointer can immediately be dangling. Later,
> > > > > > > > bpf_prog_run_array_uprobe() starts a RCU-trace read-side critical section,
> > > > > > > > but this is too late. It then uses rcu_dereference_check(), but this use of
> > > > > > > > rcu_dereference_check() does not actually dereference anything.
> > > > > > > >
> > > > > > > > It looks like the intention was to pass a pointer to the member
> > > > > > > > call->prog_array into bpf_prog_run_array_uprobe() and actually dereference
> > > > > > > > the pointer in there. Fix the issue by actually doing that.
> > > > > > > >
> > > > > > > > Fixes: 8c7dcb84e3b7 ("bpf: implement sleepable uprobes by chaining gps")
> > > > > > > > Cc: stable@xxxxxxxxxxxxxxx
> > > > > > > > Signed-off-by: Jann Horn <jannh@xxxxxxxxxx>
> > > > > > > > ---
> > > > > > > > To reproduce, in include/linux/bpf.h, patch in a mdelay(10000) directly
> > > > > > > > before the might_fault() in bpf_prog_run_array_uprobe() and add an
> > > > > > > > include of linux/delay.h.
> > > > > > > >
> > > > > > > > Build this userspace program:
> > > > > > > >
> > > > > > > > ```
> > > > > > > > $ cat dummy.c
> > > > > > > > #include <stdio.h>
> > > > > > > > int main(void) {
> > > > > > > >   printf("hello world\n");
> > > > > > > > }
> > > > > > > > $ gcc -o dummy dummy.c
> > > > > > > > ```
> > > > > > > >
> > > > > > > > Then build this BPF program and load it (change the path to point to
> > > > > > > > the "dummy" binary you built):
> > > > > > > >
> > > > > > > > ```
> > > > > > > > $ cat bpf-uprobe-kern.c
> > > > > > > > #include <linux/bpf.h>
> > > > > > > > #include <bpf/bpf_helpers.h>
> > > > > > > > #include <bpf/bpf_tracing.h>
> > > > > > > > char _license[] SEC("license") = "GPL";
> > > > > > > >
> > > > > > > > SEC("uprobe//home/user/bpf-uprobe-uaf/dummy:main")
> > > > > > > > int BPF_UPROBE(main_uprobe) {
> > > > > > > >   bpf_printk("main uprobe triggered\n");
> > > > > > > >   return 0;
> > > > > > > > }
> > > > > > > > $ clang -O2 -g -target bpf -c -o bpf-uprobe-kern.o bpf-uprobe-kern.c
> > > > > > > > $ sudo bpftool prog loadall bpf-uprobe-kern.o uprobe-test autoattach
> > > > > > > > ```
> > > > > > > >
> > > > > > > > Then run ./dummy in one terminal, and after launching it, run
> > > > > > > > `sudo umount uprobe-test` in another terminal. Once the 10-second
> > > > > > > > mdelay() is over, a use-after-free should occur, which may or may
> > > > > > > > not crash your kernel at the `prog->sleepable` check in
> > > > > > > > bpf_prog_run_array_uprobe() depending on your luck.
> > > > > > > > ---
> > > > > > > > Changes in v2:
> > > > > > > > - remove diff chunk in patch notes that confuses git
> > > > > > > > - Link to v1: https://lore.kernel.org/r/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@xxxxxxxxxx
> > > > > > > > ---
> > > > > > > >  include/linux/bpf.h         | 4 ++--
> > > > > > > >  kernel/trace/trace_uprobe.c | 2 +-
> > > > > > > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > >
> > > > > > > Looking at how similar in spirit bpf_prog_run_array() is meant to be
> > > > > > > used, it seems like it is the caller's responsibility to
> > > > > > > RCU-dereference array and keep RCU critical section before calling
> > > > > > > into bpf_prog_run_array(). So I wonder if it's best to do this instead
> > > > > > > (Gmail will butcher the diff, but it's about the idea):
> > > > > >
> > > > > > Yeah, that's the other option I was considering. That would be more
> > > > > > consistent with the existing bpf_prog_run_array(), but has the
> > > > > > downside of unnecessarily pushing responsibility up to the caller...
> > > > > > I'm fine with either.
> > > > >
> > > > > there is really just one caller ("legacy" singular uprobe handler), so
> > > > > I think this should be fine. Unless someone objects I'd keep it
> > > > > consistent with other "prog_array_run" helpers
> > > >
> > > > Ack, I will make it consistent.
> > > >
> > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > > index eaee2a819f4c..4b8a9edd3727 100644
> > > > > > > --- a/include/linux/bpf.h
> > > > > > > +++ b/include/linux/bpf.h
> > > > > > > @@ -2193,26 +2193,25 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
> > > > > > >   * rcu-protected dynamically sized maps.
> > > > > > >   */
> > > > > > >  static __always_inline u32
> > > > > > > -bpf_prog_run_array_uprobe(const struct bpf_prog_array __rcu *array_rcu,
> > > > > > > +bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
> > > > > > >                           const void *ctx, bpf_prog_run_fn run_prog)
> > > > > > >  {
> > > > > > >         const struct bpf_prog_array_item *item;
> > > > > > >         const struct bpf_prog *prog;
> > > > > > > -       const struct bpf_prog_array *array;
> > > > > > >         struct bpf_run_ctx *old_run_ctx;
> > > > > > >         struct bpf_trace_run_ctx run_ctx;
> > > > > > >         u32 ret = 1;
> > > > > > >
> > > > > > >         might_fault();
> > > > > > > +       RCU_LOCKDEP_WARN(!rcu_read_lock_trace_held(), "no rcu lock held");
> > > > > > > +
> > > > > > > +       if (unlikely(!array))
> > > > > > > +               goto out;
> > > > > > >
> > > > > > > -       rcu_read_lock_trace();
> > > > > > >         migrate_disable();
> > > > > > >
> > > > > > >         run_ctx.is_uprobe = true;
> > > > > > >
> > > > > > > -       array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
> > > > > > > -       if (unlikely(!array))
> > > > > > > -               goto out;
> > > > > > >         old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > > > > > >         item = &array->items[0];
> > > > > > >         while ((prog = READ_ONCE(item->prog))) {
> > > > > > > @@ -2229,7 +2228,6 @@ bpf_prog_run_array_uprobe(const struct
> > > > > > > bpf_prog_array __rcu *array_rcu,
> > > > > > >         bpf_reset_run_ctx(old_run_ctx);
> > > > > > >  out:
> > > > > > >         migrate_enable();
> > > > > > > -       rcu_read_unlock_trace();
> > > > > > >         return ret;
> > > > > > >  }
> > > > > > >
> > > > > > > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > > > > > > index fed382b7881b..87a2b8fefa90 100644
> > > > > > > --- a/kernel/trace/trace_uprobe.c
> > > > > > > +++ b/kernel/trace/trace_uprobe.c
> > > > > > > @@ -1404,7 +1404,9 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
> > > > > > >         if (bpf_prog_array_valid(call)) {
> > > > > > >                 u32 ret;
> > > > > > >
> > > > > > > +               rcu_read_lock_trace();
> > > > > > >                 ret = bpf_prog_run_array_uprobe(call->prog_array,
> > > > > > > regs, bpf_prog_run);
> > > > > >
> > > > > > But then this should be something like this (possibly split across
> > > > > > multiple lines with a helper variable or such):
> > > > > >
> > > > > > ret = bpf_prog_run_array_uprobe(rcu_dereference_check(call->prog_array,
> > > > > > rcu_read_lock_trace_held()), regs, bpf_prog_run);
> > > > >
> > > > > Yeah, absolutely, forgot to move the RCU dereference part, good catch!
> > > > > But I wouldn't do the _check() variant here, literally the previous
> > > > > line does rcu_read_trace_lock(), so this check part seems like just
> > > > > unnecessary verboseness, I'd go with a simple rcu_dereference().
> > > >
> > > > rcu_dereference() is not legal there - that asserts that we are in a
> > > > normal RCU read-side critical section, which we are not.
> > > > rcu_dereference_raw() would be, but I think it is nice to document the
> > > > semantics to make it explicit under which lock we're operating.
> >
> > sure, I don't mind
> >
> > > >
> > > > I'll send a v3 in a bit after testing it.
> > >
> > > Actually, now I'm still hitting a page fault with my WIP v3 fix
> > > applied... I'll probably poke at this some more next week.
> >
> > OK, that's interesting, keep us posted!
>
> If I replace the "uprobe/" in my reproducer with "uprobe.s/", the
> reproducer stops crashing even on bpf/master without this fix -
> because it happens that handle_swbp() is already holding a
> rcu_read_lock_trace() lock way up the stack. So I think this fix
> should still be applied, but it probably doesn't need to go into
> stable unless there is another path to the buggy code that doesn't
> come from handle_swbp(). I guess I probably should resend my patch
> with an updated commit message pointing out this caveat?
>
> The problem I'm actually hitting seems to be a use-after-free of a
> "struct bpf_prog" because of mismatching RCU flavors. Uprobes always
> use bpf_prog_run_array_uprobe() under tasks-trace-RCU protection. But
> it is possible to attach a non-sleepable BPF program to a uprobe, and
> non-sleepable BPF programs are freed via normal RCU (see
> __bpf_prog_put_noref()). And that is what happens with the reproducer
> from my initial post
> (https://lore.kernel.org/all/20241206-bpf-fix-uprobe-uaf-v1-1-6869c8a17258@xxxxxxxxxx/)
> - I can see that __bpf_prog_put_noref runs with prog->sleepable==0.
>
> So I think that while I am delaying execution in
> bpf_prog_run_array_uprobe(), perf_event_detach_bpf_prog() NULLs out
> the event->tp_event->prog_array pointer and does
> bpf_prog_array_free_sleepable() followed by bpf_prog_put(), and then
> the BPF program can be freed since the reader doesn't hold an RCU read
> lock. This seems a bit annoying to fix - there could legitimately be
> several versions of the bpf_prog_array that are still used by
> tasks-trace-RCU readers, so I think we can't just NULL out the array
> entry and use RCU for the bpf_prog_array access on the reader side. I
> guess we could add another flag on BPF programs that answers "should
> this program be freed via tasks-trace-RCU" (separately from whether
> the program is sleepable)?

Yes, we shouldn't NULL anything out.

This is the same issue we've been solving recently for sleepable
tracepoints, see [0] and other patches in the same patch set. We
solved it for sleepable (aka "faultable") tracepoints, but uprobes
have the same problem where the attachment point is sleepable by
nature (and thus protected by RCU Tasks Trace), but BPF program itself
is non-sleepable (and thus we only wait for synchronize_rcu() before
freeing), which causes a disconnect.

We can easily fix this for BPF link-based uprobes, but legacy uprobes
can be directly attached to perf event, so that's a bit more
cumbersome. Let me think what should be the best way to handle this.

Meanwhile, I agree, please send your original fix (with changes we
discussed), it's good to have them, even if they don't fix your
original issue. I'll CC you on fixes once I have them.

  [0] https://lore.kernel.org/all/20241101181754.782341-1-andrii@xxxxxxxxxx/