Re: WARNING in bpf_bprintf_prepare

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Mon, 7 Nov 2022 12:49:01 -0800

On Mon, Nov 7, 2022 at 4:31 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Wed, Nov 02, 2022 at 03:28:47PM +0100, Jiri Olsa wrote:
> > On Thu, Oct 27, 2022 at 07:45:16PM +0800, Hao Sun wrote:
> > > Jiri Olsa <olsajiri@xxxxxxxxx> 于2022年10月27日周四 19:24写道：
> > > >
> > > > On Thu, Oct 27, 2022 at 10:27:28AM +0800, Hao Sun wrote:
> > > > > Hi,
> > > > >
> > > > > The following warning can be triggered with the C reproducer in the link.
> > > > > Syzbot also reported this several days ago, Jiri posted a patch that
> > > > > uses bpf prog `active` field to fix this by 05b24ff9b2cfab (bpf:
> > > > > Prevent bpf program recursion...) according to syzbot dashboard
> > > > > (https://syzkaller.appspot.com/bug?id=179313fb375161d50a98311a28b8e2fc5f7350f9).
> > > > > But this warning can still be triggered on 247f34f7b803
> > > > > (Linux-v6.1-rc2) that already merged the patch, so it seems that this
> > > > > still is an issue.
> > > > >
> > > > > HEAD commit: 247f34f7b803 Linux 6.1-rc2
> > > > > git tree: upstream
> > > > > console output: https://pastebin.com/raw/kNw8JCu5
> > > > > kernel config: https://pastebin.com/raw/sE5QK5HL
> > > > > C reproducer: https://pastebin.com/raw/X96ASi27
> > > >
> > > > hi,
> > > > right, that fix addressed that issue for single bpf program,
> > > > and it won't prevent if there are multiple programs hook on
> > > > contention_begin tracepoint and calling bpf_trace_printk,
> > > >
> > > > I'm not sure we can do something there.. will check
> > > >
> > > > do you run just the reproducer, or you load the server somehow?
> > > > I cannot hit the issue so far
> > > >
> > >
> > > Hi,
> > >
> > > Last email has format issues, resend it here.
> > >
> > > I built the kernel with the config in the link, which contains
> > > “CONFIG_CMDLINE="earlyprintk=serial net.ifnames=0
> > > sysctl.kernel.hung_task_all_cpu_backtrace=1 panic_on_warn=1 …”, and
> > > boot the kernel with normal qemu setup and then the warning can be
> > > triggered by executing the reproducer.
> > >
> > > Also, I’m willing to test the proposed patch if any.
> >
> > fyi I reproduced that.. will check if we can do anything about that
>
> I reproduced this with set of 8 programs all hooked to contention_begin
> tracepoint and here's what I think is happening:
>
> all programs (prog1 .. prog8) call just bpf_trace_printk helper and I'm
> running 'perf bench sched messaging' to load the machine
>
> at some point some contended lock triggers trace_contention_begin:
>
>   trace_contention_begin
>     __traceiter_contention_begin                                <-- iterates all functions attached to tracepoint
>       __bpf_trace_run(prog1)
>         prog1->active = 1
>         bpf_prog_run(prog1)
>           bpf_trace_printk
>             bpf_bprintf_prepare                                 <-- takes buffer 1 out of 3
>             raw_spin_lock_irqsave(trace_printk_lock)
>
>               # we have global single trace_printk_lock, so we will trigger
>               # its trace_contention_begin at some point
>
>               trace_contention_begin
>                 __traceiter_contention_begin
>                   __bpf_trace_run(prog1)
>                     prog1->active block                         <-- prog1 is already 'running', skipping the execution
>                   __bpf_trace_run(prog2)
>                     prog2->active = 1
>                     bpf_prog_run(prog2)
>                       bpf_trace_printk
>                         bpf_bprintf_prepare                     <-- takes buffer 2 out of 3
>                         raw_spin_lock_irqsave(trace_printk_lock)
>                           trace_contention_begin
>                             __traceiter_contention_begin
>                               __bpf_trace_run(prog1)
>                                 prog1->active block             <-- prog1 is already 'running', skipping the execution
>                               __bpf_trace_run(prog2)
>                                 prog2->active block             <-- prog2 is already 'running', skipping the execution
>                               __bpf_trace_run(prog3)
>                                  prog3->active = 1
>                                  bpf_prog_run(prog3)
>                                    bpf_trace_printk
>                                      bpf_bprintf_prepare        <-- takes buffer 3 out of 3
>                                      raw_spin_lock_irqsave(trace_printk_lock)
>                                        trace_contention_begin
>                                          __traceiter_contention_begin
>                                            __bpf_trace_run(prog1)
>                                              prog1->active block      <-- prog1 is already 'running', skipping the execution
>                                            __bpf_trace_run(prog2)
>                                              prog2->active block      <-- prog2 is already 'running', skipping the execution
>                                            __bpf_trace_run(prog3)
>                                              prog3->active block      <-- prog3 is already 'running', skipping the execution
>                                            __bpf_trace_run(prog4)
>                                              prog4->active = 1
>                                              bpf_prog_run(prog4)
>                                                bpf_trace_printk
>                                                  bpf_bprintf_prepare  <-- tries to take buffer 4 out of 3 -> WARNING
>
>
> the code path may vary based on the contention of the trace_printk_lock,
> so I saw different nesting within 8 programs, but all eventually ended up
> at 4 levels of nesting and hit the warning
>
> I think we could perhaps move the 'active' flag protection from program
> to the tracepoint level (in the patch below), to prevent nesting execution
> of the same tracepoint, so it'd look like:
>
>   trace_contention_begin
>     __traceiter_contention_begin
>       __bpf_trace_run(prog1) {
>         contention_begin.active = 1
>         bpf_prog_run(prog1)
>           bpf_trace_printk
>             bpf_bprintf_prepare
>             raw_spin_lock_irqsave(trace_printk_lock)
>               trace_contention_begin
>                 __traceiter_contention_begin
>                   __bpf_trace_run(prog1)
>                     blocked because contention_begin.active == 1
>                   __bpf_trace_run(prog2)
>                     blocked because contention_begin.active == 1
>                   __bpf_trace_run(prog3)
>                   ...
>                   __bpf_trace_run(prog8)
>                     blocked because contention_begin.active == 1
>
>             raw_spin_unlock_irqrestore
>             bpf_bprintf_cleanup
>
>         contention_begin.active = 0
>       }
>
>       __bpf_trace_run(prog2) {
>         contention_begin.active = 1
>         bpf_prog_run(prog2)
>           ...
>         contention_begin.active = 0
>       }
>
> do we need bpf program execution in nested tracepoints?
> we could actually allow 3 nesting levels for this case.. thoughts?
>
> thanks,
> jirka
>
>
> ---
> diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> index 6a13220d2d27..5a354ae096e5 100644
> --- a/include/trace/bpf_probe.h
> +++ b/include/trace/bpf_probe.h
> @@ -78,11 +78,15 @@
>  #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
>
>  #define __BPF_DECLARE_TRACE(call, proto, args)                         \
> +static DEFINE_PER_CPU(int, __bpf_trace_tp_active_##call);              \
>  static notrace void                                                    \
>  __bpf_trace_##call(void *__data, proto)                                        \
>  {                                                                      \
>         struct bpf_prog *prog = __data;                                 \
> -       CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));  \
> +                                                                       \
> +       if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) == 1))             \
> +               CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));  \
> +       this_cpu_dec(__bpf_trace_tp_active_##call);                                     \
>  }

This approach will hurt real use cases where
multiple and different raw_tp progs run on the same cpu.
Instead let's disallow attaching to trace_contention and
potentially any other hook with similar recursion properties.
Another option is to add a recursion check to trace_contention itself.