Re: [PATCH bpf-next] bpf: emit source code file name and line number in verifier log

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 12 Feb 2024 11:01:50 -0800

On Mon, Feb 12, 2024 at 10:59 AM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Sun, Feb 11, 2024 at 11:43 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Fri, Feb 9, 2024 at 4:33 PM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
> > >
> > > As BPF applications grow in size and complexity and are separated into
> > > multiple .bpf.c files that are statically linked together, it becomes
> > > harder and harder to match verifier's BPF assembly level output to
> > > original C code. While often annotated C source code is unique enough to
> > > be able to identify the file it belongs to, quite often this is actually
> > > problematic as parts of source code can be quite generic.
> > >
> > > Long story short, it is very useful to see source code file name and
> > > line number information along with the original C code. Verifier already
> > > knows this information, we just need to output it.
> > >
> > > This patch set is an initial proposal on how this can be done. No new
> > > flags are added and file:line information is appended at the end of
> > > C code:
> > >
> > >   ; <original C code> (<filename>.bpf.c:<line>)
> > >
> > > If file name has directory names in it, they are stripped away. This
> > > should be fine in practice as file names tend to be pretty unique with
> > > C code anyways, and keeping log size smaller is always good.
> > >
> > > In practice this might look something like below, where some code is
> > > coming from application files, while others are from libbpf's usdt.bpf.h
> > > header file:
> > >
> > >   ; if (STROBEMETA_READ( (strobemeta_probe.bpf.c:534)
> > >   5592: (79) r1 = *(u64 *)(r10 -56)     ; R1_w=mem_or_null(id=1589,sz=7680) R10=fp0 fp-56_w=mem_or_null(id=1589,sz=7680)
> > >   5593: (7b) *(u64 *)(r10 -56) = r1     ; R1_w=mem_or_null(id=1589,sz=7680) R10=fp0 fp-56_w=mem_or_null(id=1589,sz=7680)
> > >   5594: (79) r3 = *(u64 *)(r10 -8)      ; R3_w=scalar() R10=fp0 fp-8=mmmmmmmm
> > >
> > >   ...
> > >
> > >   170: (71) r1 = *(u8 *)(r8 +15)        ; frame1: R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) R8_w=map_value(map=__bpf_usdt_spec,ks=4,vs=208)
> > >   171: (67) r1 <<= 56                   ; frame1: R1_w=scalar(smax=0x7f00000000000000,umax=0xff00000000000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xff00000000000000))
> > >   172: (c7) r1 s>>= 56                  ; frame1: R1_w=scalar(smin=smin32=-128,smax=smax32=127)
> > >   ; val <<= arg_spec->arg_bitshift; (usdt.bpf.h:183)
> > >   173: (67) r1 <<= 32                   ; frame1: R1_w=scalar(smax=0x7f00000000,umax=0xffffffff00000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xffffffff00000000))
> > >   174: (77) r1 >>= 32                   ; frame1: R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
> > >   175: (79) r2 = *(u64 *)(r10 -8)       ; frame1: R2_w=scalar() R10=fp0 fp-8=mmmmmmmm
> > >   176: (6f) r2 <<= r1                   ; frame1: R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R2_w=scalar()
> > >   177: (7b) *(u64 *)(r10 -8) = r2       ; frame1: R2_w=scalar(id=61) R10=fp0 fp-8_w=scalar(id=61)
> > >   ; if (arg_spec->arg_signed) (usdt.bpf.h:184)
> > >   178: (bf) r3 = r2                     ; frame1: R2_w=scalar(id=61) R3_w=scalar(id=61)
> > >   179: (7f) r3 >>= r1                   ; frame1: R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R3_w=scalar()
> > >   ; if (arg_spec->arg_signed) (usdt.bpf.h:184)
> > >   180: (71) r4 = *(u8 *)(r8 +14)
> > >   181: safe
> > >
> > > I've played with few different formats and none stood out as
> > > particularly better than other. Suggestions and votes are appreciated:
> > >
> > >   a) ; if (arg_spec->arg_signed) (usdt.bpf.h:184)
> > >   b) ; if (arg_spec->arg_signed) [usdt.bpf.h:184]
> > >   c) ; [usdt.bpf.h:184] if (arg_spec->arg_signed)
> > >   d) ; (usdt.bpf.h:184) if (arg_spec->arg_signed)
> >
> > Great idea.
> > I would drop parenthesis. Don't see a reason to wrap a text.
> > Since we already use ';' as a comment I would continue:
>
> A bit worried that it will quite weird for any source line which ends
> in ';', like:
>
> ; x = 123; ; my_file.bpf.c:123
> r1 = 123;
>
>
> E.g., here's a real excerpt (I cut some register states for cleanliness):
>
> ; if (i >= map->cnt) ; strobemeta_probe.bpf.c:396
> 5376: (79) r1 = *(u64 *)(r10 -40)     ;
> R1_w=map_value(map=raw_map_heap,ks=4,vs=264) R10=fp0
> 5377: (79) r1 = *(u64 *)(r1 +8)       ; R1_w=scalar(...) R9_w=10
> ; if (i >= map->cnt) ; strobemeta_probe.bpf.c:396
> 5378: (dd) if r1 s<= r9 goto pc-5     ; R1_w=scalar(smin=umin=11,...)
> ; descr->key_lens[i] = 0; ; strobemeta_probe.bpf.c:398
> 5379: (b4) w1 = 0                     ; R1_w=0
> 5380: (6b) *(u16 *)(r8 -30) = r1      ; R1_w=0
> ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); ;
> strobemeta_probe.bpf.c:400
> 5381: (79) r3 = *(u64 *)(r7 -8)       ; R3_w=scalar()
> R7_w=map_value(map=raw_map_heap,ks=4,vs=264,off=192)
> 5382: (7b) *(u64 *)(r10 -24) = r6     ; ...
> ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); ;
> strobemeta_probe.bpf.c:400
> 5383: (bc) w6 = w6                    ; R6_w=scalar(...)
> ; barrier_var(payload_off); ; strobemeta_probe.bpf.c:280
> 5384: (bf) r2 = r6                    ; R2_w=scalar(...)
> 5385: (bf) r1 = r4
>
> VS
>
> ; if (i >= map->cnt) (strobemeta_probe.bpf.c:396)
> 5376: (79) r1 = *(u64 *)(r10 -40)     ;
> R1_w=map_value(map=raw_map_heap,ks=4,vs=264) R10=fp0
> 5377: (79) r1 = *(u64 *)(r1 +8)       ; R1_w=scalar()
> ; if (i >= map->cnt) (strobemeta_probe.bpf.c:396)
> 5378: (dd) if r1 s<= r9 goto pc-5     ; R1_w=scalar(...) R9_w=10
> ; descr->key_lens[i] = 0; (strobemeta_probe.bpf.c:398)
> 5379: (b4) w1 = 0                     ; R1_w=0
> 5380: (6b) *(u16 *)(r8 -30) = r1      ; R1_w=0
> ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key);
> (strobemeta_probe.bpf.c:400)
> 5381: (79) r3 = *(u64 *)(r7 -8)       ; R3_w=scalar()
> R7_w=map_value(map=raw_map_heap,ks=4,vs=264,off=192)
> 5382: (7b) *(u64 *)(r10 -24) = r6     ; ...
> ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key);
> (strobemeta_probe.bpf.c:400)
> 5383: (bc) w6 = w6                    ; R6_w=scalar(...)
> ; barrier_var(payload_off); (strobemeta_probe.bpf.c:280)
> 5384: (bf) r2 = r6                    ; R2_w=scalar(...)
> 5385: (bf) r1 = r4
>
>
> Can't say that either is super nice and clean. But when I tried e)
> proposal, I realized that semicolon separators are used also for
> register state (next to instruction dump) and they sort of overlap
> visually more and make it a bit harder to read log (subjective IMO, of
> course).
>
> But let me know if you still prefer e) and I'll send v2 with it.
>

Goodness, gmail made everything even worse. See [0] for visual comparison

  [0] https://gist.github.com/anakryiko/f5e9217f277b0f8cd156ceb6cb641268

> >
> > e) ; if (arg_spec->arg_signed) ; usdt.bpf.h:184
> >
> > Note that that fragile test needs to be adjusted again:
> > Error: #137/3 log_fixup/bad_core_relo_trunc_full
>
> yep, my bad, I forgot to run all test_progs tests this time, already
> fixed locally
>
> >
> > pw-bot: cr