Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx> writes:

> On Tue, Oct 17, 2023 at 07:24:40PM +0200, Toke Høiland-Jørgensen wrote:
>> Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx> writes:
>> > On Tue, Oct 17, 2023 at 01:08:25PM +0200, Toke Høiland-Jørgensen wrote:
>> >> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
>> >> > On Mon, Oct 16, 2023 at 12:37 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>> >> >> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
>> >> >> > On Thu, Oct 12, 2023 at 1:25 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>> >> >> >>
>> >> >> >> Hi Andrii
>> >> >> >>
>> >> >> >> Mohamed ran into what appears to be a verifier bug related to your
>> >> >> >> commit:
>> >> >> >>
>> >> >> >> fde2a3882bd0 ("bpf: support precision propagation in the presence of subprogs")
>> >> >> >>
>> >> >> >> So I figured you'd be the person to ask about this :)
>> >> >> >>
>> >> >> >> The issue appears on a vanilla 6.5 kernel (on both 6.5.6 on Fedora 38,
>> >> >> >> and 6.5.5 on my Arch machine):
>> >> >> >>
>> >> >> >> INFO[0000] Verifier error: load program: bad address:
>> >> >> >> 1861: frame2: R1_w=fp-160 R2_w=pkt_end(off=0,imm=0) R3=scalar(umin=17,umax=255,var_off=(0x0; 0xff)) R4_w=fp-96 R6_w=fp-96 R7_w=pkt(off=34,r=34,imm=0) R10=fp0
>> >> >> >> ; switch (protocol) {
>> >> >> >> 1861: (15) if r3 == 0x11 goto pc+22 1884: frame2: R1_w=fp-160 R2_w=pkt_end(off=0,imm=0) R3=17 R4_w=fp-96 R6_w=fp-96 R7_w=pkt(off=34,r=34,imm=0) R10=fp0
>> >> >> >> ; if ((void *)udp + sizeof(*udp) <= data_end) {
>> >> >> >> 1884: (bf) r3 = r7 ; frame2: R3_w=pkt(off=34,r=34,imm=0) R7_w=pkt(off=34,r=34,imm=0)
>> >> >> >> 1885: (07) r3 += 8 ; frame2: R3_w=pkt(off=42,r=34,imm=0)
>> >> >> >> ; if ((void *)udp + sizeof(*udp) <= data_end) {
>> >> >> >> 1886: (2d) if r3 > r2 goto pc+23 ; frame2: R2_w=pkt_end(off=0,imm=0) R3_w=pkt(off=42,r=42,imm=0)
>> >> >> >> ; id->src_port = bpf_ntohs(udp->source);
>> >> >> >> 1887: (69) r2 = *(u16 *)(r7 +0) ; frame2: R2_w=scalar(umax=65535,var_off=(0x0; 0xffff)) R7_w=pkt(off=34,r=42,imm=0)
>> >> >> >> 1888: (bf) r3 = r2 ; frame2: R2_w=scalar(id=103,umax=65535,var_off=(0x0; 0xffff)) R3_w=scalar(id=103,umax=65535,var_off=(0x0; 0xffff))
>> >> >> >> 1889: (dc) r3 = be16 r3 ; frame2: R3_w=scalar()
>> >> >> >> ; id->src_port = bpf_ntohs(udp->source);
>> >> >> >> 1890: (73) *(u8 *)(r1 +47) = r3 ; frame2: R1_w=fp-160 R3_w=scalar()
>> >> >> >> ; id->src_port = bpf_ntohs(udp->source);
>> >> >> >> 1891: (dc) r2 = be64 r2 ; frame2: R2_w=scalar()
>> >> >> >> ; id->src_port = bpf_ntohs(udp->source);
>> >> >> >> 1892: (77) r2 >>= 56 ; frame2: R2_w=scalar(umax=255,var_off=(0x0; 0xff))
>> >> >> >> 1893: (73) *(u8 *)(r1 +48) = r2
>> >> >> >> BUG regs 1
>> >> >> >> processed 5121 insns (limit 1000000) max_states_per_insn 4 total_states 92 peak_states 90 mark_read 20
>> >> >> >> (truncated) component=ebpf.FlowFetcher
>> >> >> >>
>> >> >> >> Dmesg says:
>> >> >> >>
>> >> >> >> [252431.093126] verifier backtracking bug
>> >> >> >> [252431.093129] WARNING: CPU: 3 PID: 302245 at kernel/bpf/verifier.c:3533 __mark_chain_precision+0xe83/0x1090
>> >> >> >>
>> >> >> >> ...
>> >> >> >
>> >> >> > Is there some way to get a full verifier log for the failure above?
>> >> >> > with log_level 2, if possible? If you can share it through Github Gist
>> >> >> > or something like that, I'd really appreciate it. Thanks!
>> >> >>
>> >> >> Sure, here you go:
>> >> >> https://gist.github.com/tohojo/31173d2bb07262a21393f76d9a45132d
>> >> >
>> >> > Thanks, this is very useful. And it's pretty clear what happens from
>> >> > last few lines:
>> >> >
>> >> > mark_precise: frame2: regs=r2 stack= before 1890: (dc) r2 = be64 r2
>> >> > mark_precise: frame2: regs=r0,r2 stack= before 1889: (73) *(u8 *)(r1 +47) = r3
>> >> >
>> >> > See how we add r0 to the regs set, while there is no r0 involved in
>> >> > `r2 = be64 r2`? I think it's just a missing case of handling BPF_END
>> >> > (and perhaps BPF_NEG as well) instructions in backtrack_insn(). Should
>> >> > be a trivial fix, though ideally we should also add some test for this
>> >> > as well.
>
> Turns out the only case where r0 is wrongly added to the regs set is with
> BPF_ALU | BPF_TO_BE | BPF_END like the one seen here (I only realized this
> while working on selftests). All other cases are already handled correctly
> because they happen to fall into the BPF_SRC(insn->code) == BPF_K == 0 case.
>
>     } else {
>         if (BPF_SRC(insn->code) == BPF_X) {
>             bt_set_reg(bt, sreg);
>         }
>         /* BPF_NEG, BPF_ALU | BPF_TO_LE | BPF_END, and
>          * BPF_ALU64 | BPF_END go here in backtrack_insn()
>          */
>     }
>
> That said, having an "if (opcode == BPF_END || opcode == BPF_NEG)" check
> still makes more sense, so I'm sticking with that.
>
> RFC can be found at
> https://lore.kernel.org/bpf/20231030132145.20867-1-shung-hsi.yu@xxxxxxxx/

Great, thanks for taking care of this! :)

-Toke
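
For anyone reading this later: as far as I can tell, only the big-endian
form trips the check because BPF_TO_BE and BPF_X share the value 0x08. So
BPF_SRC() reports a register source for an instruction that has none, and
the src_reg field (0 for this instruction) is read as r0. Below is a
minimal, self-contained sketch of that. The opcode values are copied from
the UAPI headers; struct toy_bt and the two toy_backtrack_*() helpers are
hypothetical stand-ins made up for illustration, not the verifier's real
backtrack_insn().

```c
#include <assert.h>
#include <stdint.h>

/* Opcode encoding, values as in the UAPI headers (bpf_common.h, bpf.h). */
#define BPF_ALU       0x04
#define BPF_ALU64     0x07
#define BPF_OP(code)  ((code) & 0xf0)
#define BPF_NEG       0x80
#define BPF_END       0xd0
#define BPF_SRC(code) ((code) & 0x08)
#define BPF_K         0x00
#define BPF_X         0x08
#define BPF_TO_LE     0x00
#define BPF_TO_BE     0x08

/* The root of the confusion: both flags occupy the same bit. */
static_assert(BPF_TO_BE == BPF_X, "BPF_TO_BE aliases BPF_X");

/* Hypothetical toy state: bit N set means rN still needs to be precise. */
struct toy_bt {
	uint32_t reg_mask;
};

/* Assumed pre-fix behaviour: only BPF_SRC() decides whether sreg is used. */
static void toy_backtrack_old(struct toy_bt *bt, uint8_t code,
			      uint8_t dreg, uint8_t sreg)
{
	if (!(bt->reg_mask & (1u << dreg)))
		return;			/* dreg is not being tracked */
	if (BPF_SRC(code) == BPF_X)
		bt->reg_mask |= 1u << sreg;
}

/* Same toy, with the explicit opcode check discussed in this thread. */
static void toy_backtrack_fixed(struct toy_bt *bt, uint8_t code,
				uint8_t dreg, uint8_t sreg)
{
	if (!(bt->reg_mask & (1u << dreg)))
		return;			/* dreg is not being tracked */
	if (BPF_OP(code) == BPF_END || BPF_OP(code) == BPF_NEG)
		return;			/* dreg = op(dreg), no source register */
	if (BPF_SRC(code) == BPF_X)
		bt->reg_mask |= 1u << sreg;
}
```

Starting from a mask containing only r2 and feeding in
BPF_ALU | BPF_END | BPF_TO_BE (0xdc, the same opcode byte shown in the log)
with dreg=2 and sreg=0, the old-style helper ends up with r0 and r2 set,
which matches the regs=r0,r2 line quoted above. The helper with the opcode
check leaves only r2. The BPF_TO_LE and BPF_NEG encodings have bit 0x08
clear, so both helpers agree on those.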