On Tue, Oct 17, 2023 at 08:16:55AM -0400, Mohamed Mahmoud wrote:
> Any idea why the same verification errors are not seen when the
> program is attached with bpftool ?

Not sure why, but I've captured the verifier log during the successful
tc load here (using a slightly modified tc) on v6.5.4:

https://gist.github.com/shunghsiyu/b3bd6e4f4e1510e98a80491d50f3908b

1890: (dc) r2 = be64 r2 ; frame2: R2_w=scalar()
; id->src_port = bpf_ntohs(udp->source);
1891: (77) r2 >>= 56 ; frame2: R2_w=scalar(umax=255,var_off=(0x0; 0xff))
1892: (73) *(u8 *)(r1 +48) = r2
mark_precise: frame2: last_idx 1892 first_idx 1883 subseq_idx -1
mark_precise: frame2: regs=r2 stack= before 1891: (77) r2 >>= 56
mark_precise: frame2: regs=r2 stack= before 1890: (dc) r2 = be64 r2
mark_precise: frame2: regs=r0,r2 stack= before 1889: (73) *(u8 *)(r1 +47) = r3
mark_precise: frame2: regs=r0,r2 stack= before 1888: (dc) r3 = be16 r3
mark_precise: frame2: regs=r0,r2 stack= before 1887: (bf) r3 = r2
mark_precise: frame2: regs=r0,r2 stack= before 1886: (69) r2 = *(u16 *)(r7 +0)
mark_precise: frame2: regs=r0 stack= before 1885: (2d) if r3 > r2 goto pc+23
mark_precise: frame2: regs=r0 stack= before 1884: (07) r3 += 8
mark_precise: frame2: regs=r0 stack= before 1883: (bf) r3 = r7
mark_precise: frame2: parent state regs= stack=: frame2: R1_r=fp-160 R2_r=pkt_end(off=0,imm=0) R3=17 R4=fp-96 R6=fp-96 R7_r=pkt(off=54,r=54,imm=0) R10=fp0
mark_precise: frame1: parent state regs= stack=: frame1: R6=ctx(off=0,imm=0) R7=1 R8=pkt_end(off=0,imm=0) R10=fp0 fp-56= fp-64=00000000 fp-72=00000000 fp-80=00000000 fp-88=mmmmmmmm fp-96=fp fp-104=??????00 fp-112=0000m000 fp-120= fp-128=mmmmmmmm fp-136=mmmmmmmm fp-144= fp-152=mmmmmmmm fp-160=mmmmm0mm
mark_precise: frame0: parent state regs= stack=: R10=fp0
; id->dst_port = bpf_ntohs(udp->dest);
1893: (69) r2 = *(u16 *)(r7 +2) ; frame2: R2_w=scalar(umax=65535,var_off=(0x0; 0xffff)) R7=pkt(off=54,r=62,imm=0)

Looks like r0 is also being incorrectly added to the precise regs set here;
but I'm not sure why backtracking didn't go all the way back to "call
pc+1617" (which triggers the warning).

> On Tue, Oct 17, 2023 at 7:08 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> >
> > Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
> >
> > > On Mon, Oct 16, 2023 at 12:37 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> > >>
> > >> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
> > >>
> > >> > On Thu, Oct 12, 2023 at 1:25 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> > >> >>
> > >> >> Hi Andrii
> > >> >>
> > >> >> Mohamed ran into what appears to be a verifier bug related to your
> > >> >> commit:
> > >> >>
> > >> >> fde2a3882bd0 ("bpf: support precision propagation in the presence of subprogs")
> > >> >>
> > >> >> So I figured you'd be the person to ask about this :)
> > >> >>
> > >> >> The issue appears on a vanilla 6.5 kernel (on both 6.5.6 on Fedora 38,
> > >> >> and 6.5.5 on my Arch machine):
> > >> >>
> > >> >> INFO[0000] Verifier error: load program: bad address:
> > >> >> 1861: frame2: R1_w=fp-160 R2_w=pkt_end(off=0,imm=0) R3=scalar(umin=17,umax=255,var_off=(0x0; 0xff)) R4_w=fp-96 R6_w=fp-96 R7_w=pkt(off=34,r=34,imm=0) R10=fp0
> > >> >> ; switch (protocol) {
> > >> >> 1861: (15) if r3 == 0x11 goto pc+22 1884: frame2: R1_w=fp-160 R2_w=pkt_end(off=0,imm=0) R3=17 R4_w=fp-96 R6_w=fp-96 R7_w=pkt(off=34,r=34,imm=0) R10=fp0
> > >> >> ; if ((void *)udp + sizeof(*udp) <= data_end) {
> > >> >> 1884: (bf) r3 = r7 ; frame2: R3_w=pkt(off=34,r=34,imm=0) R7_w=pkt(off=34,r=34,imm=0)
> > >> >> 1885: (07) r3 += 8 ; frame2: R3_w=pkt(off=42,r=34,imm=0)
> > >> >> ; if ((void *)udp + sizeof(*udp) <= data_end) {
> > >> >> 1886: (2d) if r3 > r2 goto pc+23 ; frame2: R2_w=pkt_end(off=0,imm=0) R3_w=pkt(off=42,r=42,imm=0)
> > >> >> ; id->src_port = bpf_ntohs(udp->source);
> > >> >> 1887: (69) r2 = *(u16 *)(r7 +0) ; frame2: R2_w=scalar(umax=65535,var_off=(0x0; 0xffff)) R7_w=pkt(off=34,r=42,imm=0)
> > >> >> 1888: (bf) r3 = r2 ; frame2: R2_w=scalar(id=103,umax=65535,var_off=(0x0; 0xffff)) R3_w=scalar(id=103,umax=65535,var_off=(0x0; 0xffff))
> > >> >> 1889: (dc) r3 = be16 r3 ; frame2: R3_w=scalar()
> > >> >> ; id->src_port = bpf_ntohs(udp->source);
> > >> >> 1890: (73) *(u8 *)(r1 +47) = r3 ; frame2: R1_w=fp-160 R3_w=scalar()
> > >> >> ; id->src_port = bpf_ntohs(udp->source);
> > >> >> 1891: (dc) r2 = be64 r2 ; frame2: R2_w=scalar()
> > >> >> ; id->src_port = bpf_ntohs(udp->source);
> > >> >> 1892: (77) r2 >>= 56 ; frame2: R2_w=scalar(umax=255,var_off=(0x0; 0xff))
> > >> >> 1893: (73) *(u8 *)(r1 +48) = r2
> > >> >> BUG regs 1
> > >> >> processed 5121 insns (limit 1000000) max_states_per_insn 4 total_states 92 peak_states 90 mark_read 20
> > >> >> (truncated) component=ebpf.FlowFetcher
> > >> >>
> > >> >> Dmesg says:
> > >> >>
> > >> >> [252431.093126] verifier backtracking bug
> > >> >> [252431.093129] WARNING: CPU: 3 PID: 302245 at kernel/bpf/verifier.c:3533 __mark_chain_precision+0xe83/0x1090
> > >> >>
> > >> >>
> > >> >> The splat appears when trying to run the netobserv-ebpf-agent. Steps to
> > >> >> reproduce:
> > >> >>
> > >> >> git clone https://github.com/netobserv/netobserv-ebpf-agent
> > >> >> cd netobserv-ebpf-agent && make compile
> > >> >> sudo FLOWS_TARGET_HOST=127.0.0.1 FLOWS_TARGET_PORT=9999 ./bin/netobserv-ebpf-agent
> > >> >>
> > >> >> (It needs a 'make generate' before the compile to recompile the BPF
> > >> >> program itself, but that requires the Cilium bpf2go program to be
> > >> >> installed and there's a binary version checked into the tree so that is
> > >> >> not strictly necessary to reproduce the splat).
> > >> >>
> > >> >> That project uses the Cilium Go eBPF loader.
> > >> >> Interestingly, loading the same program using tc (with libbpf 1.2.2)
> > >> >> works just fine:
> > >> >>
> > >> >> ip link add type veth
> > >> >> tc qdisc add dev veth0 clsact
> > >> >> tc filter add dev veth0 egress bpf direct-action obj pkg/ebpf/bpf_bpfel.o sec tc_egress
> > >> >>
> > >> >> So maybe there is some massaging of the object file that libbpf is doing
> > >> >> but the Go library isn't, that prevents this bug from triggering? I'm
> > >> >> only guessing here, I don't really know exactly what the Go library is
> > >> >> doing under the hood.
> > >> >>
> > >> >> Anyway, I guess this is a kernel bug in any case since that WARN() is
> > >> >> there; could you please take a look?
> > >> >>
> > >> >
> > >> > Yes, I tried. Unfortunately I can't build netobserv-ebpf-agent on my
> > >> > dev machine and can't run it. I tried to load bpf_bpfel.o through
> > >> > veristat, but unfortunately it is not libbpf-compatible.
> > >> >
> > >> > Is there some way to get a full verifier log for the failure above?
> > >> > with log_level 2, if possible? If you can share it through Github Gist
> > >> > or something like that, I'd really appreciate it. Thanks!
> > >>
> > >> Sure, here you go:
> > >> https://gist.github.com/tohojo/31173d2bb07262a21393f76d9a45132d
> > >
> > > Thanks, this is very useful. And it's pretty clear what happens from
> > > last few lines:
> > >
> > > mark_precise: frame2: regs=r2 stack= before 1890: (dc) r2 = be64 r2
> > > mark_precise: frame2: regs=r0,r2 stack= before 1889: (73) *(u8 *)(r1 +47) = r3
> > >
> > > See how we add r0 to the regs set, while there is no r0 involved in
> > > `r2 = be64 r2`? I think it's just a missing case of handling BPF_END
> > > (and perhaps BPF_NEG as well) instructions in backtrack_insn(). Should
> > > be a trivial fix, though ideally we should also add some test for this
> > > as well.
> >
> > Sounds good, thank you for looking into it! Let me know if you need me
> > to test a patch :)
> >
> > -Toke
> >
>
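
For anyone following along, here is a rough sketch of why a byte-swap
instruction could drag r0 into the precise set. It decodes opcode 0xdc
(the "(dc)" in the log) using the constants from the kernel UAPI headers.
The two helper functions are hypothetical simplifications written for
this illustration, not the kernel's backtrack_insn(); they ignore MOV and
every non-ALU class. The assumption is that a backtracker which keys only
on the BPF_X source bit will read the unused src_reg field (0) of a
to-big-endian swap as a real source register.

```python
# Opcode field values as defined in include/uapi/linux/bpf_common.h
# and include/uapi/linux/bpf.h.
BPF_ALU = 0x04
BPF_ALU64 = 0x07
BPF_K = 0x00
BPF_X = 0x08    # same bit value as BPF_TO_BE for BPF_END instructions
BPF_NEG = 0x80
BPF_END = 0xD0


def bpf_class(code):
    return code & 0x07


def bpf_op(code):
    return code & 0xF0


def bpf_src(code):
    return code & 0x08


def inputs_naive(code, dst, src):
    """Hypothetical: treat every ALU insn with BPF_X set as 'dst op= src'."""
    if bpf_class(code) in (BPF_ALU, BPF_ALU64) and bpf_src(code) == BPF_X:
        return {dst, src}
    return {dst}


def inputs_unary_aware(code, dst, src):
    """Hypothetical: BPF_END and BPF_NEG only read and write dst."""
    if bpf_class(code) in (BPF_ALU, BPF_ALU64) and bpf_op(code) in (BPF_END, BPF_NEG):
        return {dst}
    return inputs_naive(code, dst, src)


if __name__ == "__main__":
    code, dst, src = 0xDC, 2, 0  # "r2 = be64 r2"; src_reg field is unused (0)
    print("class=%#x op=%#x src=%#x" % (bpf_class(code), bpf_op(code), bpf_src(code)))
    print("naive:      ", sorted("r%d" % r for r in inputs_naive(code, dst, src)))
    print("unary-aware:", sorted("r%d" % r for r in inputs_unary_aware(code, dst, src)))
```

Under these assumptions the naive variant reports r0 and r2 for
"r2 = be64 r2", which lines up with the "regs=r0,r2" seen in the
mark_precise output, while the unary-aware variant reports only r2.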