On Fri, Oct 13, 2023 at 4:25 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Hi Andrii
>
> Mohamed ran into what appears to be a verifier bug related to your
> commit:
>
> fde2a3882bd0 ("bpf: support precision propagation in the presence of subprogs")
>
> So I figured you'd be the person to ask about this :)
>
> The issue appears on a vanilla 6.5 kernel (on both 6.5.6 on Fedora 38,
> and 6.5.5 on my Arch machine):
>
> INFO[0000] Verifier error: load program: bad address:
> 1861: frame2: R1_w=fp-160 R2_w=pkt_end(off=0,imm=0) R3=scalar(umin=17,umax=255,var_off=(0x0; 0xff)) R4_w=fp-96 R6_w=fp-96 R7_w=pkt(off=34,r=34,imm=0) R10=fp0
> ; switch (protocol) {
> 1861: (15) if r3 == 0x11 goto pc+22 1884: frame2: R1_w=fp-160 R2_w=pkt_end(off=0,imm=0) R3=17 R4_w=fp-96 R6_w=fp-96 R7_w=pkt(off=34,r=34,imm=0) R10=fp0
> ; if ((void *)udp + sizeof(*udp) <= data_end) {
> 1884: (bf) r3 = r7 ; frame2: R3_w=pkt(off=34,r=34,imm=0) R7_w=pkt(off=34,r=34,imm=0)
> 1885: (07) r3 += 8 ; frame2: R3_w=pkt(off=42,r=34,imm=0)
> ; if ((void *)udp + sizeof(*udp) <= data_end) {
> 1886: (2d) if r3 > r2 goto pc+23 ; frame2: R2_w=pkt_end(off=0,imm=0) R3_w=pkt(off=42,r=42,imm=0)
> ; id->src_port = bpf_ntohs(udp->source);
> 1887: (69) r2 = *(u16 *)(r7 +0) ; frame2: R2_w=scalar(umax=65535,var_off=(0x0; 0xffff)) R7_w=pkt(off=34,r=42,imm=0)
> 1888: (bf) r3 = r2 ; frame2: R2_w=scalar(id=103,umax=65535,var_off=(0x0; 0xffff)) R3_w=scalar(id=103,umax=65535,var_off=(0x0; 0xffff))
> 1889: (dc) r3 = be16 r3 ; frame2: R3_w=scalar()
> ; id->src_port = bpf_ntohs(udp->source);
> 1890: (73) *(u8 *)(r1 +47) = r3 ; frame2: R1_w=fp-160 R3_w=scalar()
> ; id->src_port = bpf_ntohs(udp->source);
> 1891: (dc) r2 = be64 r2 ; frame2: R2_w=scalar()
> ; id->src_port = bpf_ntohs(udp->source);
> 1892: (77) r2 >>= 56 ; frame2: R2_w=scalar(umax=255,var_off=(0x0; 0xff))
> 1893: (73) *(u8 *)(r1 +48) = r2
> BUG regs 1
> processed 5121 insns (limit 1000000) max_states_per_insn 4 total_states 92 peak_states 90 mark_read 20
> (truncated) component=ebpf.FlowFetcher
>
> Dmesg says:
>
> [252431.093126] verifier backtracking bug
> [252431.093129] WARNING: CPU: 3 PID: 302245 at kernel/bpf/verifier.c:3533 __mark_chain_precision+0xe83/0x1090
>
>
> The splat appears when trying to run the netobserv-ebpf-agent. Steps to
> reproduce:
>
> git clone https://github.com/netobserv/netobserv-ebpf-agent
> cd netobserv-ebpf-agent && make compile
> sudo FLOWS_TARGET_HOST=127.0.0.1 FLOWS_TARGET_PORT=9999 ./bin/netobserv-ebpf-agent
>
> (It needs a 'make generate' before the compile to recompile the BPF
> program itself, but that requires the Cilium bpf2go program to be
> installed and there's a binary version checked into the tree so that is
> not strictly necessary to reproduce the splat).
>
> That project uses the Cilium Go eBPF loader. Interestingly, loading the
> same program using tc (with libbpf 1.2.2) works just fine:
>
> ip link add type veth
> tc qdisc add dev veth0 clsact
> tc filter add dev veth0 egress bpf direct-action obj pkg/ebpf/bpf_bpfel.o sec tc_egress
>
> So maybe there is some massaging of the object file that libbpf is doing
> but the Go library isn't, that prevents this bug from triggering? I'm
> only guessing here, I don't really know exactly what the Go library is
> doing under the hood.
>

Interesting, have you tried https://github.com/cilium/ebpf/pull/1159 ?

> Anyway, I guess this is a kernel bug in any case since that WARN() is
> there; could you please take a look?
>
> Thanks!
>
> -Toke
>