Re: WARNING in bpf_bprintf_prepare

Jiri Olsa <olsajiri@xxxxxxxxx> · Mon, 7 Nov 2022 13:31:06 +0100

On Wed, Nov 02, 2022 at 03:28:47PM +0100, Jiri Olsa wrote:
> On Thu, Oct 27, 2022 at 07:45:16PM +0800, Hao Sun wrote:
> > Jiri Olsa <olsajiri@xxxxxxxxx> 于2022年10月27日周四 19:24写道：
> > >
> > > On Thu, Oct 27, 2022 at 10:27:28AM +0800, Hao Sun wrote:
> > > > Hi,
> > > >
> > > > The following warning can be triggered with the C reproducer in the link.
> > > > Syzbot also reported this several days ago, Jiri posted a patch that
> > > > uses bpf prog `active` field to fix this by 05b24ff9b2cfab (bpf:
> > > > Prevent bpf program recursion...) according to syzbot dashboard
> > > > (https://syzkaller.appspot.com/bug?id=179313fb375161d50a98311a28b8e2fc5f7350f9).
> > > > But this warning can still be triggered on 247f34f7b803
> > > > (Linux-v6.1-rc2) that already merged the patch, so it seems that this
> > > > still is an issue.
> > > >
> > > > HEAD commit: 247f34f7b803 Linux 6.1-rc2
> > > > git tree: upstream
> > > > console output: https://pastebin.com/raw/kNw8JCu5
> > > > kernel config: https://pastebin.com/raw/sE5QK5HL
> > > > C reproducer: https://pastebin.com/raw/X96ASi27
> > >
> > > hi,
> > > right, that fix addressed that issue for single bpf program,
> > > and it won't prevent if there are multiple programs hook on
> > > contention_begin tracepoint and calling bpf_trace_printk,
> > >
> > > I'm not sure we can do something there.. will check
> > >
> > > do you run just the reproducer, or you load the server somehow?
> > > I cannot hit the issue so far
> > >
> > 
> > Hi,
> > 
> > Last email has format issues, resend it here.
> > 
> > I built the kernel with the config in the link, which contains
> > “CONFIG_CMDLINE="earlyprintk=serial net.ifnames=0
> > sysctl.kernel.hung_task_all_cpu_backtrace=1 panic_on_warn=1 …”, and
> > boot the kernel with normal qemu setup and then the warning can be
> > triggered by executing the reproducer.
> > 
> > Also, I’m willing to test the proposed patch if any.
> 
> fyi I reproduced that.. will check if we can do anything about that

I reproduced this with set of 8 programs all hooked to contention_begin
tracepoint and here's what I think is happening:

all programs (prog1 .. prog8) call just bpf_trace_printk helper and I'm
running 'perf bench sched messaging' to load the machine

at some point some contended lock triggers trace_contention_begin:

  trace_contention_begin
    __traceiter_contention_begin                                <-- iterates all functions attached to tracepoint
      __bpf_trace_run(prog1)
        prog1->active = 1
        bpf_prog_run(prog1)
          bpf_trace_printk
            bpf_bprintf_prepare                                 <-- takes buffer 1 out of 3
            raw_spin_lock_irqsave(trace_printk_lock)

              # we have global single trace_printk_lock, so we will trigger
              # its trace_contention_begin at some point

              trace_contention_begin
                __traceiter_contention_begin
                  __bpf_trace_run(prog1)
                    prog1->active block                         <-- prog1 is already 'running', skipping the execution
                  __bpf_trace_run(prog2)
                    prog2->active = 1
                    bpf_prog_run(prog2)
                      bpf_trace_printk
                        bpf_bprintf_prepare                     <-- takes buffer 2 out of 3
                        raw_spin_lock_irqsave(trace_printk_lock)
                          trace_contention_begin
                            __traceiter_contention_begin
                              __bpf_trace_run(prog1)
                                prog1->active block             <-- prog1 is already 'running', skipping the execution
                              __bpf_trace_run(prog2)
                                prog2->active block             <-- prog2 is already 'running', skipping the execution
                              __bpf_trace_run(prog3)
                                 prog3->active = 1
                                 bpf_prog_run(prog3)
                                   bpf_trace_printk
                                     bpf_bprintf_prepare        <-- takes buffer 3 out of 3
                                     raw_spin_lock_irqsave(trace_printk_lock)
                                       trace_contention_begin
                                         __traceiter_contention_begin
                                           __bpf_trace_run(prog1)
                                             prog1->active block      <-- prog1 is already 'running', skipping the execution
                                           __bpf_trace_run(prog2)
                                             prog2->active block      <-- prog2 is already 'running', skipping the execution
                                           __bpf_trace_run(prog3)
                                             prog3->active block      <-- prog3 is already 'running', skipping the execution
                                           __bpf_trace_run(prog4)
                                             prog4->active = 1
                                             bpf_prog_run(prog4)
                                               bpf_trace_printk
                                                 bpf_bprintf_prepare  <-- tries to take buffer 4 out of 3 -> WARNING


the code path may vary based on the contention of the trace_printk_lock,
so I saw different nesting within 8 programs, but all eventually ended up
at 4 levels of nesting and hit the warning

I think we could perhaps move the 'active' flag protection from program
to the tracepoint level (in the patch below), to prevent nesting execution
of the same tracepoint, so it'd look like:

  trace_contention_begin
    __traceiter_contention_begin
      __bpf_trace_run(prog1) {
        contention_begin.active = 1
        bpf_prog_run(prog1)
          bpf_trace_printk
            bpf_bprintf_prepare
            raw_spin_lock_irqsave(trace_printk_lock)
              trace_contention_begin
                __traceiter_contention_begin
                  __bpf_trace_run(prog1)
                    blocked because contention_begin.active == 1
                  __bpf_trace_run(prog2)
                    blocked because contention_begin.active == 1
                  __bpf_trace_run(prog3)
                  ...
                  __bpf_trace_run(prog8)
                    blocked because contention_begin.active == 1

            raw_spin_unlock_irqrestore
            bpf_bprintf_cleanup

        contention_begin.active = 0
      }

      __bpf_trace_run(prog2) {
        contention_begin.active = 1
        bpf_prog_run(prog2)
          ...
        contention_begin.active = 0
      }

do we need bpf program execution in nested tracepoints?
we could actually allow 3 nesting levels for this case.. thoughts?

thanks,
jirka


---

diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 6a13220d2d27..5a354ae096e5 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -78,11 +78,15 @@
 #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
 
 #define __BPF_DECLARE_TRACE(call, proto, args)				\
+static DEFINE_PER_CPU(int, __bpf_trace_tp_active_##call);		\
 static notrace void							\
 __bpf_trace_##call(void *__data, proto)					\
 {									\
 	struct bpf_prog *prog = __data;					\
-	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));	\
+									\
+	if (likely(this_cpu_inc_return(__bpf_trace_tp_active_##call) == 1))		\
+		CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));	\
+	this_cpu_dec(__bpf_trace_tp_active_##call);					\
 }
 
 #undef DECLARE_EVENT_CLASS