Hitting verifier backtracking bug on 6.5.5 kernel

Hi Andrii,

Mohamed ran into what appears to be a verifier bug related to your
commit:

fde2a3882bd0 ("bpf: support precision propagation in the presence of subprogs")

So I figured you'd be the person to ask about this :)

The issue appears on vanilla 6.5 kernels (both 6.5.6 on Fedora 38 and
6.5.5 on my Arch machine):

INFO[0000] Verifier error: load program: bad address:
	1861: frame2: R1_w=fp-160 R2_w=pkt_end(off=0,imm=0) R3=scalar(umin=17,umax=255,var_off=(0x0; 0xff)) R4_w=fp-96 R6_w=fp-96 R7_w=pkt(off=34,r=34,imm=0) R10=fp0
	; switch (protocol) {
	1861: (15) if r3 == 0x11 goto pc+22 1884: frame2: R1_w=fp-160 R2_w=pkt_end(off=0,imm=0) R3=17 R4_w=fp-96 R6_w=fp-96 R7_w=pkt(off=34,r=34,imm=0) R10=fp0
	; if ((void *)udp + sizeof(*udp) <= data_end) {
	1884: (bf) r3 = r7                    ; frame2: R3_w=pkt(off=34,r=34,imm=0) R7_w=pkt(off=34,r=34,imm=0)
	1885: (07) r3 += 8                    ; frame2: R3_w=pkt(off=42,r=34,imm=0)
	; if ((void *)udp + sizeof(*udp) <= data_end) {
	1886: (2d) if r3 > r2 goto pc+23      ; frame2: R2_w=pkt_end(off=0,imm=0) R3_w=pkt(off=42,r=42,imm=0)
	; id->src_port = bpf_ntohs(udp->source);
	1887: (69) r2 = *(u16 *)(r7 +0)       ; frame2: R2_w=scalar(umax=65535,var_off=(0x0; 0xffff)) R7_w=pkt(off=34,r=42,imm=0)
	1888: (bf) r3 = r2                    ; frame2: R2_w=scalar(id=103,umax=65535,var_off=(0x0; 0xffff)) R3_w=scalar(id=103,umax=65535,var_off=(0x0; 0xffff))
	1889: (dc) r3 = be16 r3               ; frame2: R3_w=scalar()
	; id->src_port = bpf_ntohs(udp->source);
	1890: (73) *(u8 *)(r1 +47) = r3       ; frame2: R1_w=fp-160 R3_w=scalar()
	; id->src_port = bpf_ntohs(udp->source);
	1891: (dc) r2 = be64 r2               ; frame2: R2_w=scalar()
	; id->src_port = bpf_ntohs(udp->source);
	1892: (77) r2 >>= 56                  ; frame2: R2_w=scalar(umax=255,var_off=(0x0; 0xff))
	1893: (73) *(u8 *)(r1 +48) = r2
	BUG regs 1
	processed 5121 insns (limit 1000000) max_states_per_insn 4 total_states 92 peak_states 90 mark_read 20
	(truncated)  component=ebpf.FlowFetcher

Dmesg says:

[252431.093126] verifier backtracking bug
[252431.093129] WARNING: CPU: 3 PID: 302245 at kernel/bpf/verifier.c:3533 __mark_chain_precision+0xe83/0x1090


The splat appears when trying to run the netobserv-ebpf-agent. Steps to
reproduce:

git clone https://github.com/netobserv/netobserv-ebpf-agent
cd netobserv-ebpf-agent && make compile
sudo FLOWS_TARGET_HOST=127.0.0.1 FLOWS_TARGET_PORT=9999 ./bin/netobserv-ebpf-agent

(Recompiling the BPF program itself needs a 'make generate' before the
compile, but that requires the Cilium bpf2go program to be installed.
There's a binary version checked into the tree, so that step is not
strictly necessary to reproduce the splat.)

That project uses the Cilium Go eBPF loader. Interestingly, loading the
same program using tc (with libbpf 1.2.2) works just fine:

ip link add type veth
tc qdisc add dev veth0 clsact
tc filter add dev veth0 egress bpf direct-action obj pkg/ebpf/bpf_bpfel.o sec tc_egress

So maybe libbpf is doing some massaging of the object file that the Go
library isn't, and that prevents this bug from triggering? I'm only
guessing here; I don't really know exactly what the Go library is doing
under the hood.

In any case, I guess this is a kernel bug since that WARN() is
triggering; could you please take a look?

Thanks!

-Toke




