Hi,

Sorry for coming to this late. Replies are in-line/interleaved, so some
of my comments might be hidden by your email client.

On Mon, Sep 23, 2024 at 07:26:25PM GMT, Eduard Zingerman wrote:
> On Mon, 2024-09-23 at 19:35 +0100, Alasdair McWilliam wrote:
> > Hello,
> >
> > First post so please be gentle :-)
> >
> > I've got an eBPF workload running on kernel 6.1 LTS and we're running great.
> >
> > Use case actually is using eBPF in combination with XDP and AF_XDP for
> > volumetric DDoS mitigation.
> >
> > Makeup of the eBPF program is mostly packet parsing, LPM and map
> > lookups, and 2x calls to the bpf_loop() helper. Currently no iterators,
> > dynptrs, etc, but lots of switch-case blocks.
> >
> > I've started to test newer kernel versions in preparation to upgrade our
> > stack from 6.1 LTS to 6.6 LTS to gain access to newer functionality and
> > just for future proofing. However, when loading the BPF object code on a
> > 6.6 kernel, the BPF verifier refuses to load the program that 6.1
> > accepts and runs well.
> >
> > This caught me by surprise, because I have witnessed our stack boot
> > successfully on a 6.7 kernel. So, I've run veristat [0] on the exact
> > same eBPF object file, compiled by clang17, but each time running on a
> > different kernel version. Results fluctuate wildly!
> >
> > Results on 6.1.106: success: 53687 insns and 5114 states [1]
> > Results on 6.6.52: failure: 1000001 insns and 39501 states [2]
> > Results on 6.7.9: success: 131418 insns and 8839 states [3]
>
> Hi Alasdair,
>
> It might be the case that your issues with bpf_loop() are triggered by
> the following commit:
> - "bpf: verify callbacks as if they are called unknown number of times":
>   - ab5cfac139ab for 6.7.y
>   - b43550d7d58e for 6.6.y
>   - not backported to 6.1.y
>
> This commit is a correctness fix, w/o it bodies of the loop callbacks
> were not checked exhaustively. But side effect of this fix is
> significant verification time regression for some programs.
>
> Comparing BPF related commits in both branches (starting from merge
> base, using script from the attachment) gives somewhat sporadic
> results:
>
> Commits stats:
>   only in stable/linux-6.6.y : 50
>   only in stable/linux-6.7.y : 96
>   common                     : 74
>
> Only in stable/linux-6.6.y:
>   ...
>
> Only in stable/linux-6.7.y:
>   ...
> Of these only "bpf: Improve JEQ/JNE branch taken logic" from 6.7
> looks like an optimization, however it did not show any changes in
> veristat data for selftests.

I've also tried to look at this using a different script based on an
in-house tool and came to roughly the same conclusion on the 6.7 side.
Nothing specific stands out to me in 6.7 that would explain the
difference.

OTOH 6.7.9 is _missing_ a fix that was backported to 6.6.52 --
e9a8e5a587ca "bpf: check bpf_func_state->callback_depth when pruning
states". It was backported to 6.7.10, but 6.7.9 doesn't have it yet.
Since it prevents (improper) pruning, it could explain what we're seeing
here.

@Alasdair could you give 6.7.12 a quick try (I suppose that would be
easier since you already tested 6.7.9) and see how it goes there?

Additionally, here's a v6.1.y branch containing the "bpf: verify
callbacks as if they are called unknown number of times" fix Eduard
mentioned,

  https://github.com/shunghsiyu/linux/tree/stable/linux-6.1.y-callback-fixes-w-subprog-precision-v1

that I plan to submit (though long overdue). If @Alasdair could also
test it out, that would be highly appreciated. Let me know if there's
anything that would make things easier.

Thanks,
Shung-Hsi

> => it's hard to say what's missing from 6.6 for your use-case.
>
> Maybe let's discuss options for your program optimization
> with regards to verifier performance?
>
> Thanks,
> Eduard
>
> P.S. hope I did not mess up the script.
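
For anyone wanting to redo the branch comparison quoted above, here is a
rough sketch of the idea. It is not the attached script, which is not
reproduced in this message. It assumes backported commits keep their
upstream subject line, and it filters on kernel/bpf as a guess at what
"BPF related" means. All function names are made up for illustration.

```python
import subprocess


def merge_base(repo, branch_a, branch_b):
    """Return the merge base of two branches (wraps `git merge-base`)."""
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", branch_a, branch_b],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def commit_subjects(repo, base, branch, paths=("kernel/bpf",)):
    """List subject lines of non-merge commits in base..branch.

    The default path filter is an assumption, not taken from the thread.
    """
    result = subprocess.run(
        ["git", "-C", repo, "log", "--no-merges", "--format=%s",
         f"{base}..{branch}", "--", *paths],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def compare_subjects(subjects_a, subjects_b):
    """Split two subject lists into only-in-A, only-in-B and common.

    Matching is by exact subject string, so a backport whose subject was
    edited will show up as two unrelated commits.
    """
    set_a, set_b = set(subjects_a), set(subjects_b)
    return {
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
        "common": sorted(set_a & set_b),
    }
```

In a tree with the stable remote fetched, one would call merge_base()
on stable/linux-6.6.y and stable/linux-6.7.y, feed the result to
commit_subjects() once per branch, and pass both lists to
compare_subjects(). The counts will depend on the path filter and on
when the branches were fetched, so they need not match the numbers
quoted above.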