Re: Recent eBPF verifier rejects program that was accepted by 6.8 eBPF verifier

Maxim Mikityanskiy <maxim@xxxxxxxxxxxxx> · Wed, 20 Nov 2024 22:33:11 +0200

Hi Francis,

On Fri, 15 Nov 2024 at 22:59, Francis Deslauriers
<francis.deslauriers@xxxxxxxxxxxxxxx> wrote:
> I added a stripped down libbpf reproducer based on the libbpf-bootstrap repo to
> showcase this issue (see below). Note that if I change the RULE_LIST_SIZE from
> 380 to 30, the verifier accepts the program.
>
> I bisected the issue down to this commit:
>         commit 8ecfc371d829bfed75e0ef2cab45b2290b982f64
>         Author: Maxim Mikityanskiy <maxim@xxxxxxxxxxxxx>
>         Date:   Mon Jan 8 22:52:02 2024 +0200
>
>         bpf: Assign ID to scalars on spill

This commit was part of a series with a few new features that are
indeed expected to raise the complexity with some programs. To
compensate for it, a patch by Eduard was submitted within the same
series. Could you please test your program on this kernel commit?

6efbde200bf3 bpf: Handle scalar spill vs all MISC in stacksafe()

I.e. whether it passes or fails, and If it fails, what's the biggest
RULE_LIST_SIZE that passes. Let's see if Eduard's patch helps
partially or doesn't address this issue at all (or helps fully, but
there is another regression after his commit).

> It's my understanding that a eBPF program that is accepted by one
> version of the verifier should be accepted on all subsequent versions.

That's basically the goal. Obviously, some new features will increase
the verification complexity, but we are trying to compensate for it or
to make it insignificant.

For example, when I introduced my changes (with Eduard's patch), I
tested the complexity before and after on the set of BPF programs that
included kernel selftests and Cilium:

https://patchwork.kernel.org/project/netdevbpf/cover/20240108205209.838365-1-maxtram95@xxxxxxxxx/
https://patchwork.kernel.org/project/netdevbpf/cover/20240127175237.526726-1-maxtram95@xxxxxxxxx/

As you can see, Eduard's patch really helped with most regressions,
and the whole picture looked good enough. Apparently (if this series
is indeed the culprit), your program uses some pattern that wasn't
covered by the set of programs that I checked.

> I'm investigating how this commit affects how instructions are counted
>  and why it leads to such a drastic change for this particular program.

I'd guess, most likely, something inside the loop body became more
complex to verify, and it's repeated 380 times, further increasing the
total complexity.

I wonder where 380 comes from. Is it just the maximum number of
iterations that the old verifier could handle? Or does it have a
deeper meaning?

Is bpf_loop an option for you?

> I wanted to share my findings early in case someone has any hints for me.
>
> To reproduce, use the following file as a drop in replacement of
> libbpf-boostrap's examples/c/tc.bpf.c:

I reproduced the issue on 6.11.6, the highest RULE_LIST_SIZE that
works for me is 35, but I currently lack time to take a deeper look.
Do you have any new findings in the meanwhile?