Re: Verifier - wild instruction count fluctuations between versions?

Eduard Zingerman <eddyz87@xxxxxxxxx> · Mon, 23 Sep 2024 19:26:25 -0700

On Mon, 2024-09-23 at 19:35 +0100, Alasdair McWilliam wrote:
> Hello,
> 
> First post so please be gentle :-)
> 
> I've got an eBPF workload running on kernel 6.1 LTS and we're running great.
> 
> Use case actually is using eBPF in combination with XDP and AF_XDP for
> volumetric DDoS mitigation.
> 
> Makeup of the eBPF program is mostly packet parsing, LPM and map
> lookups, and 2x calls to the bpf_loop() helper. Currently no iterators,
> dynptrs, etc, but lots of switch-case blocks.
> 
> I've started to test newer kernel versions in preparation to upgrade our
> stack from 6.1 LTS to 6.6 LTS to gain access to newer functionality and
> just for future proofing. However, when loading the BPF object code on a
> 6.6 kernel, the BPF verifier refuses to load the program that 6.1
> accepts and runs well.
> 
> This caught me by surprise, because I have witnessed our stack boot
> successfully on a 6.7 kernel. So, I've run veristat [0] on the exact
> same eBPF object file, compiled by clang17, but each time running on a
> different kernel version. Results fluctuate wildly!
> 
> Results on 6.1.106: success: 53687 insns and 5114 states [1]
> Results on 6.6.52:  failure: 1000001 insns and 39501 states [2]
> Results on 6.7.9:   success: 131418 insns and 8839 states [3]

Hi Alasdair,

It might be the case that your issues with bpf_loop() are triggered by
the following commit:
- "bpf: verify callbacks as if they are called unknown number of times":
  - ab5cfac139ab for 6.7.y
  - b43550d7d58e for 6.6.y
  - not backported to 6.1.y

This commit is a correctness fix: without it, the bodies of loop
callbacks were not checked exhaustively. A side effect of the fix is a
significant verification time regression for some programs.

Comparing BPF-related commits in both branches (starting from the
merge base, using the script from the attachment) gives somewhat
inconclusive results:

  Commits stats:
    only in stable/linux-6.6.y    : 50
    only in stable/linux-6.7.y    : 96
    common                        : 74

  Only in stable/linux-6.6.y:
    ...

  Only in stable/linux-6.7.y:
    ...

Of these, only "bpf: Improve JEQ/JNE branch taken logic" from 6.7
looks like an optimization; however, it did not show any changes in
veristat data for the selftests.

=> it's hard to say what's missing from 6.6 for your use-case.
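
For reference, the kind of branch comparison described above can be
sketched roughly as follows. This is not the attached script, only a
minimal illustration. It assumes that commits are matched by subject
line (stable backports get new hashes) and that limiting the log to
kernel/bpf is a good enough filter for "BPF related"; both are
assumptions, and the function names are made up for this example.

```shell
# Sketch only: NOT the script attached to this mail.

# Print sorted, unique subject lines of non-merge commits in $1..$2
# that touch kernel/bpf (the path filter is an assumption).
bpf_subjects() {
    git log --no-merges --format='%s' "$1..$2" -- kernel/bpf |
        LC_ALL=C sort -u
}

# Given two files of sorted subject lines, print how many subjects
# appear only in the first, only in the second, and in both.
compare_subjects() {
    printf 'only in A: %s\n' \
        "$(LC_ALL=C comm -23 "$1" "$2" | wc -l | tr -d ' ')"
    printf 'only in B: %s\n' \
        "$(LC_ALL=C comm -13 "$1" "$2" | wc -l | tr -d ' ')"
    printf 'common: %s\n' \
        "$(LC_ALL=C comm -12 "$1" "$2" | wc -l | tr -d ' ')"
}
```

In a tree that has both stable branches, one would first compute
base=$(git merge-base stable/linux-6.6.y stable/linux-6.7.y), write
the output of bpf_subjects for each branch to a file, and pass the
two files to compare_subjects. Running comm -23 or comm -13 on the
same files prints the subjects themselves instead of the counts.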

Maybe we could discuss options for optimizing your program with
regard to verifier performance?

Thanks,
Eduard

P.S. hope I did not mess up the script.

Attachment: compare-bpf-commits-in-branches.sh (application/shellscript)