Re: Hitting verifier backtracking bug on 6.5.5 kernel

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:

> On Thu, Oct 12, 2023 at 1:25 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>>
>> Hi Andrii
>>
>> Mohamed ran into what appears to be a verifier bug related to your
>> commit:
>>
>> fde2a3882bd0 ("bpf: support precision propagation in the presence of subprogs")
>>
>> So I figured you'd be the person to ask about this :)
>>
>> The issue appears on vanilla 6.5 kernels (both 6.5.6 on Fedora 38 and
>> 6.5.5 on my Arch machine):
>>
>> INFO[0000] Verifier error: load program: bad address:
>>         1861: frame2: R1_w=fp-160 R2_w=pkt_end(off=0,imm=0) R3=scalar(umin=17,umax=255,var_off=(0x0; 0xff)) R4_w=fp-96 R6_w=fp-96 R7_w=pkt(off=34,r=34,imm=0) R10=fp0
>>         ; switch (protocol) {
>>         1861: (15) if r3 == 0x11 goto pc+22 1884: frame2: R1_w=fp-160 R2_w=pkt_end(off=0,imm=0) R3=17 R4_w=fp-96 R6_w=fp-96 R7_w=pkt(off=34,r=34,imm=0) R10=fp0
>>         ; if ((void *)udp + sizeof(*udp) <= data_end) {
>>         1884: (bf) r3 = r7                    ; frame2: R3_w=pkt(off=34,r=34,imm=0) R7_w=pkt(off=34,r=34,imm=0)
>>         1885: (07) r3 += 8                    ; frame2: R3_w=pkt(off=42,r=34,imm=0)
>>         ; if ((void *)udp + sizeof(*udp) <= data_end) {
>>         1886: (2d) if r3 > r2 goto pc+23      ; frame2: R2_w=pkt_end(off=0,imm=0) R3_w=pkt(off=42,r=42,imm=0)
>>         ; id->src_port = bpf_ntohs(udp->source);
>>         1887: (69) r2 = *(u16 *)(r7 +0)       ; frame2: R2_w=scalar(umax=65535,var_off=(0x0; 0xffff)) R7_w=pkt(off=34,r=42,imm=0)
>>         1888: (bf) r3 = r2                    ; frame2: R2_w=scalar(id=103,umax=65535,var_off=(0x0; 0xffff)) R3_w=scalar(id=103,umax=65535,var_off=(0x0; 0xffff))
>>         1889: (dc) r3 = be16 r3               ; frame2: R3_w=scalar()
>>         ; id->src_port = bpf_ntohs(udp->source);
>>         1890: (73) *(u8 *)(r1 +47) = r3       ; frame2: R1_w=fp-160 R3_w=scalar()
>>         ; id->src_port = bpf_ntohs(udp->source);
>>         1891: (dc) r2 = be64 r2               ; frame2: R2_w=scalar()
>>         ; id->src_port = bpf_ntohs(udp->source);
>>         1892: (77) r2 >>= 56                  ; frame2: R2_w=scalar(umax=255,var_off=(0x0; 0xff))
>>         1893: (73) *(u8 *)(r1 +48) = r2
>>         BUG regs 1
>>         processed 5121 insns (limit 1000000) max_states_per_insn 4 total_states 92 peak_states 90 mark_read 20
>>         (truncated)  component=ebpf.FlowFetcher
>>
>> Dmesg says:
>>
>> [252431.093126] verifier backtracking bug
>> [252431.093129] WARNING: CPU: 3 PID: 302245 at kernel/bpf/verifier.c:3533 __mark_chain_precision+0xe83/0x1090
>>
>>
>> The splat appears when trying to run the netobserv-ebpf-agent. Steps to
>> reproduce:
>>
>> git clone https://github.com/netobserv/netobserv-ebpf-agent
>> cd netobserv-ebpf-agent && make compile
>> sudo FLOWS_TARGET_HOST=127.0.0.1 FLOWS_TARGET_PORT=9999 ./bin/netobserv-ebpf-agent
>>
>> (It needs a 'make generate' before the compile to recompile the BPF
>> program itself, but that requires the Cilium bpf2go program to be
>> installed and there's a binary version checked into the tree so that is
>> not strictly necessary to reproduce the splat).
>>
>> That project uses the Cilium Go eBPF loader. Interestingly, loading the
>> same program using tc (with libbpf 1.2.2) works just fine:
>>
>> ip link add type veth
>> tc qdisc add dev veth0 clsact
>> tc filter add dev veth0 egress bpf direct-action obj pkg/ebpf/bpf_bpfel.o sec tc_egress
>>
>> So maybe libbpf does some massaging of the object file that the Go
>> library doesn't, and that prevents this bug from triggering? I'm only
>> guessing here; I don't really know exactly what the Go library is
>> doing under the hood.
>>
>> Anyway, I guess this is a kernel bug in any case since that WARN() is
>> there; could you please take a look?
>>
>
> Yes, I tried. Unfortunately I can't build netobserv-ebpf-agent on my
> dev machine and can't run it. I tried to load bpf_bpfel.o through
> veristat, but unfortunately it is not libbpf-compatible.
>
> Is there some way to get a full verifier log for the failure above,
> with log_level 2 if possible? If you can share it through a GitHub Gist
> or something like that, I'd really appreciate it. Thanks!

Sure, here you go:
https://gist.github.com/tohojo/31173d2bb07262a21393f76d9a45132d
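
In case it helps when reading the excerpt quoted above: instructions
1887-1893 look like a byte-wise store of bpf_ntohs(udp->source),
presumably because id->src_port is not naturally aligned (that part is a
guess). Below is a rough Python model of just that arithmetic, assuming a
little-endian host (the object is bpf_bpfel.o). The helper names bswap()
and store_src_port() are made up for illustration, and this says nothing
about the verifier's precision tracking itself:

```python
def bswap(value, bits):
    """Model of the BPF 'be16'/'be64' instruction on a little-endian host:
    keep the low `bits` bits, reverse their byte order, zero the rest."""
    nbytes = bits // 8
    low = value & ((1 << bits) - 1)
    return int.from_bytes(low.to_bytes(nbytes, "little"), "big")


def store_src_port(raw):
    """Replay insns 1887-1893; `raw` is the u16 as loaded from the packet.
    Returns the two bytes stored at r1+47 and r1+48."""
    r2 = raw & 0xFFFF        # 1887: r2 = *(u16 *)(r7 +0)
    r3 = r2                  # 1888: r3 = r2
    r3 = bswap(r3, 16)       # 1889: r3 = be16 r3
    byte47 = r3 & 0xFF       # 1890: *(u8 *)(r1 +47) = r3
    r2 = bswap(r2, 64)       # 1891: r2 = be64 r2
    r2 >>= 56                # 1892: r2 >>= 56
    byte48 = r2 & 0xFF       # 1893: *(u8 *)(r1 +48) = r2
    return byte47, byte48


# Port 53 is 0x00 0x35 on the wire, so a little-endian u16 load sees 0x3500.
print(store_src_port(0x3500))
```

So the two stores together write the host-order port, low byte first, and
the store at 1893 (where "BUG regs 1" shows up) takes its value from r2
after the be64 and the shift.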

-Toke
