On Mon, Feb 12, 2024 at 10:59 AM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote: > > On Sun, Feb 11, 2024 at 11:43 AM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Fri, Feb 9, 2024 at 4:33 PM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote: > > > > > > As BPF applications grow in size and complexity and are separated into > > > multiple .bpf.c files that are statically linked together, it becomes > > > harder and harder to match verifier's BPF assembly level output to > > > original C code. While often annotated C source code is unique enough to > > > be able to identify the file it belongs to, quite often this is actually > > > problematic as parts of source code can be quite generic. > > > > > > Long story short, it is very useful to see source code file name and > > > line number information along with the original C code. Verifier already > > > knows this information, we just need to output it. > > > > > > This patch set is an initial proposal on how this can be done. No new > > > flags are added and file:line information is appended at the end of > > > C code: > > > > > > ; <original C code> (<filename>.bpf.c:<line>) > > > > > > If file name has directory names in it, they are stripped away. This > > > should be fine in practice as file names tend to be pretty unique with > > > C code anyways, and keeping log size smaller is always good. > > > > > > In practice this might look something like below, where some code is > > > coming from application files, while others are from libbpf's usdt.bpf.h > > > header file: > > > > > > ; if (STROBEMETA_READ( (strobemeta_probe.bpf.c:534) > > > 5592: (79) r1 = *(u64 *)(r10 -56) ; R1_w=mem_or_null(id=1589,sz=7680) R10=fp0 fp-56_w=mem_or_null(id=1589,sz=7680) > > > 5593: (7b) *(u64 *)(r10 -56) = r1 ; R1_w=mem_or_null(id=1589,sz=7680) R10=fp0 fp-56_w=mem_or_null(id=1589,sz=7680) > > > 5594: (79) r3 = *(u64 *)(r10 -8) ; R3_w=scalar() R10=fp0 fp-8=mmmmmmmm > > > > > > ... > > > > > > 170: (71) r1 = *(u8 *)(r8 +15) ; frame1: R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=255,var_off=(0x0; 0xff)) R8_w=map_value(map=__bpf_usdt_spec,ks=4,vs=208) > > > 171: (67) r1 <<= 56 ; frame1: R1_w=scalar(smax=0x7f00000000000000,umax=0xff00000000000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xff00000000000000)) > > > 172: (c7) r1 s>>= 56 ; frame1: R1_w=scalar(smin=smin32=-128,smax=smax32=127) > > > ; val <<= arg_spec->arg_bitshift; (usdt.bpf.h:183) > > > 173: (67) r1 <<= 32 ; frame1: R1_w=scalar(smax=0x7f00000000,umax=0xffffffff00000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xffffffff00000000)) > > > 174: (77) r1 >>= 32 ; frame1: R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) > > > 175: (79) r2 = *(u64 *)(r10 -8) ; frame1: R2_w=scalar() R10=fp0 fp-8=mmmmmmmm > > > 176: (6f) r2 <<= r1 ; frame1: R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R2_w=scalar() > > > 177: (7b) *(u64 *)(r10 -8) = r2 ; frame1: R2_w=scalar(id=61) R10=fp0 fp-8_w=scalar(id=61) > > > ; if (arg_spec->arg_signed) (usdt.bpf.h:184) > > > 178: (bf) r3 = r2 ; frame1: R2_w=scalar(id=61) R3_w=scalar(id=61) > > > 179: (7f) r3 >>= r1 ; frame1: R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R3_w=scalar() > > > ; if (arg_spec->arg_signed) (usdt.bpf.h:184) > > > 180: (71) r4 = *(u8 *)(r8 +14) > > > 181: safe > > > > > > I've played with few different formats and none stood out as > > > particularly better than other. Suggestions and votes are appreciated: > > > > > > a) ; if (arg_spec->arg_signed) (usdt.bpf.h:184) > > > b) ; if (arg_spec->arg_signed) [usdt.bpf.h:184] > > > c) ; [usdt.bpf.h:184] if (arg_spec->arg_signed) > > > d) ; (usdt.bpf.h:184) if (arg_spec->arg_signed) > > > > Great idea. > > I would drop parenthesis. Don't see a reason to wrap a text. > > Since we already use ';' as a comment I would continue: > > A bit worried that it will quite weird for any source line which ends > in ';', like: > > ; x = 123; ; my_file.bpf.c:123 > r1 = 123; > > > E.g., here's a real excerpt (I cut some register states for cleanliness): > > ; if (i >= map->cnt) ; strobemeta_probe.bpf.c:396 > 5376: (79) r1 = *(u64 *)(r10 -40) ; > R1_w=map_value(map=raw_map_heap,ks=4,vs=264) R10=fp0 > 5377: (79) r1 = *(u64 *)(r1 +8) ; R1_w=scalar(...) R9_w=10 > ; if (i >= map->cnt) ; strobemeta_probe.bpf.c:396 > 5378: (dd) if r1 s<= r9 goto pc-5 ; R1_w=scalar(smin=umin=11,...) > ; descr->key_lens[i] = 0; ; strobemeta_probe.bpf.c:398 > 5379: (b4) w1 = 0 ; R1_w=0 > 5380: (6b) *(u16 *)(r8 -30) = r1 ; R1_w=0 > ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); ; > strobemeta_probe.bpf.c:400 > 5381: (79) r3 = *(u64 *)(r7 -8) ; R3_w=scalar() > R7_w=map_value(map=raw_map_heap,ks=4,vs=264,off=192) > 5382: (7b) *(u64 *)(r10 -24) = r6 ; ... > ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); ; > strobemeta_probe.bpf.c:400 > 5383: (bc) w6 = w6 ; R6_w=scalar(...) > ; barrier_var(payload_off); ; strobemeta_probe.bpf.c:280 > 5384: (bf) r2 = r6 ; R2_w=scalar(...) > 5385: (bf) r1 = r4 > > VS > > ; if (i >= map->cnt) (strobemeta_probe.bpf.c:396) > 5376: (79) r1 = *(u64 *)(r10 -40) ; > R1_w=map_value(map=raw_map_heap,ks=4,vs=264) R10=fp0 > 5377: (79) r1 = *(u64 *)(r1 +8) ; R1_w=scalar() > ; if (i >= map->cnt) (strobemeta_probe.bpf.c:396) > 5378: (dd) if r1 s<= r9 goto pc-5 ; R1_w=scalar(...) R9_w=10 > ; descr->key_lens[i] = 0; (strobemeta_probe.bpf.c:398) > 5379: (b4) w1 = 0 ; R1_w=0 > 5380: (6b) *(u16 *)(r8 -30) = r1 ; R1_w=0 > ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); > (strobemeta_probe.bpf.c:400) > 5381: (79) r3 = *(u64 *)(r7 -8) ; R3_w=scalar() > R7_w=map_value(map=raw_map_heap,ks=4,vs=264,off=192) > 5382: (7b) *(u64 *)(r10 -24) = r6 ; ... > ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); > (strobemeta_probe.bpf.c:400) > 5383: (bc) w6 = w6 ; R6_w=scalar(...) > ; barrier_var(payload_off); (strobemeta_probe.bpf.c:280) > 5384: (bf) r2 = r6 ; R2_w=scalar(...) > 5385: (bf) r1 = r4 > > > Can't say that either is super nice and clean. But when I tried e) > proposal, I realized that semicolon separators are used also for > register state (next to instruction dump) and they sort of overlap > visually more and make it a bit harder to read log (subjective IMO, of > course). > > But let me know if you still prefer e) and I'll send v2 with it. > Goodness, gmail made everything even worse. See [0] for visual comparison [0] https://gist.github.com/anakryiko/f5e9217f277b0f8cd156ceb6cb641268 > > > > e) ; if (arg_spec->arg_signed) ; usdt.bpf.h:184 > > > > Note that that fragile test needs to be adjusted again: > > Error: #137/3 log_fixup/bad_core_relo_trunc_full > > yep, my bad, I forgot to run all test_progs tests this time, already > fixed locally > > > > > pw-bot: cr