Re: [PATCH bpf-next] bpf: emit source code file name and line number in verifier log

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 12 Feb 2024 10:59:46 -0800

On Sun, Feb 11, 2024 at 11:43 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Fri, Feb 9, 2024 at 4:33 PM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
> >
> > As BPF applications grow in size and complexity and are separated into
> > multiple .bpf.c files that are statically linked together, it becomes
> > harder and harder to match verifier's BPF assembly level output to
> > original C code. While often annotated C source code is unique enough to
> > be able to identify the file it belongs to, quite often this is actually
> > problematic as parts of source code can be quite generic.
> >
> > Long story short, it is very useful to see source code file name and
> > line number information along with the original C code. Verifier already
> > knows this information, we just need to output it.
> >
> > This patch set is an initial proposal on how this can be done. No new
> > flags are added and file:line information is appended at the end of
> > C code:
> >
> >   ; <original C code> (<filename>.bpf.c:<line>)
> >
> > If file name has directory names in it, they are stripped away. This
> > should be fine in practice as file names tend to be pretty unique with
> > C code anyways, and keeping log size smaller is always good.
> >
> > In practice this might look something like below, where some code is
> > coming from application files, while others are from libbpf's usdt.bpf.h
> > header file:
> >
> >   ; if (STROBEMETA_READ( (strobemeta_probe.bpf.c:534)
> >   5592: (79) r1 = *(u64 *)(r10 -56)     ; R1_w=mem_or_null(id=1589,sz=7680) R10=fp0 fp-56_w=mem_or_null(id=1589,sz=7680)
> >   5593: (7b) *(u64 *)(r10 -56) = r1     ; R1_w=mem_or_null(id=1589,sz=7680) R10=fp0 fp-56_w=mem_or_null(id=1589,sz=7680)
> >   5594: (79) r3 = *(u64 *)(r10 -8)      ; R3_w=scalar() R10=fp0 fp-8=mmmmmmmm
> >
> >   ...
> >
> >   170: (71) r1 = *(u8 *)(r8 +15)        ; frame1: R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) R8_w=map_value(map=__bpf_usdt_spec,ks=4,vs=208)
> >   171: (67) r1 <<= 56                   ; frame1: R1_w=scalar(smax=0x7f00000000000000,umax=0xff00000000000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xff00000000000000))
> >   172: (c7) r1 s>>= 56                  ; frame1: R1_w=scalar(smin=smin32=-128,smax=smax32=127)
> >   ; val <<= arg_spec->arg_bitshift; (usdt.bpf.h:183)
> >   173: (67) r1 <<= 32                   ; frame1: R1_w=scalar(smax=0x7f00000000,umax=0xffffffff00000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xffffffff00000000))
> >   174: (77) r1 >>= 32                   ; frame1: R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
> >   175: (79) r2 = *(u64 *)(r10 -8)       ; frame1: R2_w=scalar() R10=fp0 fp-8=mmmmmmmm
> >   176: (6f) r2 <<= r1                   ; frame1: R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R2_w=scalar()
> >   177: (7b) *(u64 *)(r10 -8) = r2       ; frame1: R2_w=scalar(id=61) R10=fp0 fp-8_w=scalar(id=61)
> >   ; if (arg_spec->arg_signed) (usdt.bpf.h:184)
> >   178: (bf) r3 = r2                     ; frame1: R2_w=scalar(id=61) R3_w=scalar(id=61)
> >   179: (7f) r3 >>= r1                   ; frame1: R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R3_w=scalar()
> >   ; if (arg_spec->arg_signed) (usdt.bpf.h:184)
> >   180: (71) r4 = *(u8 *)(r8 +14)
> >   181: safe
> >
> > I've played with few different formats and none stood out as
> > particularly better than other. Suggestions and votes are appreciated:
> >
> >   a) ; if (arg_spec->arg_signed) (usdt.bpf.h:184)
> >   b) ; if (arg_spec->arg_signed) [usdt.bpf.h:184]
> >   c) ; [usdt.bpf.h:184] if (arg_spec->arg_signed)
> >   d) ; (usdt.bpf.h:184) if (arg_spec->arg_signed)
>
> Great idea.
> I would drop parenthesis. Don't see a reason to wrap a text.
> Since we already use ';' as a comment I would continue:

A bit worried that it will quite weird for any source line which ends
in ';', like:

; x = 123; ; my_file.bpf.c:123
r1 = 123;

E.g., here's a real excerpt (I cut some register states for cleanliness):

; if (i >= map->cnt) ; strobemeta_probe.bpf.c:396
5376: (79) r1 = *(u64 *)(r10 -40)     ;
R1_w=map_value(map=raw_map_heap,ks=4,vs=264) R10=fp0
5377: (79) r1 = *(u64 *)(r1 +8)       ; R1_w=scalar(...) R9_w=10
; if (i >= map->cnt) ; strobemeta_probe.bpf.c:396
5378: (dd) if r1 s<= r9 goto pc-5     ; R1_w=scalar(smin=umin=11,...)
; descr->key_lens[i] = 0; ; strobemeta_probe.bpf.c:398
5379: (b4) w1 = 0                     ; R1_w=0
5380: (6b) *(u16 *)(r8 -30) = r1      ; R1_w=0
; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); ;
strobemeta_probe.bpf.c:400
5381: (79) r3 = *(u64 *)(r7 -8)       ; R3_w=scalar()
R7_w=map_value(map=raw_map_heap,ks=4,vs=264,off=192)
5382: (7b) *(u64 *)(r10 -24) = r6     ; ...
; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); ;
strobemeta_probe.bpf.c:400
5383: (bc) w6 = w6                    ; R6_w=scalar(...)
; barrier_var(payload_off); ; strobemeta_probe.bpf.c:280
5384: (bf) r2 = r6                    ; R2_w=scalar(...)
5385: (bf) r1 = r4

VS

; if (i >= map->cnt) (strobemeta_probe.bpf.c:396)
5376: (79) r1 = *(u64 *)(r10 -40)     ;
R1_w=map_value(map=raw_map_heap,ks=4,vs=264) R10=fp0
5377: (79) r1 = *(u64 *)(r1 +8)       ; R1_w=scalar()
; if (i >= map->cnt) (strobemeta_probe.bpf.c:396)
5378: (dd) if r1 s<= r9 goto pc-5     ; R1_w=scalar(...) R9_w=10
; descr->key_lens[i] = 0; (strobemeta_probe.bpf.c:398)
5379: (b4) w1 = 0                     ; R1_w=0
5380: (6b) *(u16 *)(r8 -30) = r1      ; R1_w=0
; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key);
(strobemeta_probe.bpf.c:400)
5381: (79) r3 = *(u64 *)(r7 -8)       ; R3_w=scalar()
R7_w=map_value(map=raw_map_heap,ks=4,vs=264,off=192)
5382: (7b) *(u64 *)(r10 -24) = r6     ; ...
; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key);
(strobemeta_probe.bpf.c:400)
5383: (bc) w6 = w6                    ; R6_w=scalar(...)
; barrier_var(payload_off); (strobemeta_probe.bpf.c:280)
5384: (bf) r2 = r6                    ; R2_w=scalar(...)
5385: (bf) r1 = r4

Can't say that either is super nice and clean. But when I tried e)
proposal, I realized that semicolon separators are used also for
register state (next to instruction dump) and they sort of overlap
visually more and make it a bit harder to read log (subjective IMO, of
course).

But let me know if you still prefer e) and I'll send v2 with it.

>
> e) ; if (arg_spec->arg_signed) ; usdt.bpf.h:184
>
> Note that that fragile test needs to be adjusted again:
> Error: #137/3 log_fixup/bad_core_relo_trunc_full

yep, my bad, I forgot to run all test_progs tests this time, already
fixed locally

>
> pw-bot: cr