Re: [PATCH bpf-next] selftests/bpf: Workaround iters/iter_arr_with_actual_elem_count failure when -mcpu=cpuv4

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 9 Jul 2024 11:12:05 -0700

On Tue, Jul 9, 2024 at 10:22 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, Jul 9, 2024 at 9:45 AM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Mon, Jul 8, 2024 at 7:04 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Mon, Jul 8, 2024 at 3:12 PM Andrii Nakryiko
> > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > >
> > > > On Mon, Jul 8, 2024 at 3:11 PM Andrii Nakryiko
> > > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Jul 8, 2024 at 2:31 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Mon, 2024-07-08 at 13:18 -0700, Alexei Starovoitov wrote:
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > > > the 32bit_sign_ext will indicate the register r1 is from 32bit sign extension, so once w1 range is refined, the upper 32bit can be recalculated.
> > > > > > > >
> > > > > > > > Can we avoid 32bit_sign_exit in the above? Let us say we have
> > > > > > > >    r1 = ...;  R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff), R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
> > > > > > > >    if w1 < w6 goto pc+4
> > > > > > > > where r1 achieves is trange through other means than 32bit sign extension e.g.
> > > > > > > >    call bpf_get_prandom_u32;
> > > > > > > >    r1 = r0;
> > > > > > > >    r1 <<= 32;
> > > > > > > >    call bpf_get_prandom_u32;
> > > > > > > >    r1 |= r0;  /* r1 is 64bit random number */
> > > > > > > >    r2 = 0xffffffff80000000 ll;
> > > > > > > >    if r1 s< r2 goto end;
> > > > > > > >    if r1 s> 0x7fffFFFF goto end; /* after this r1 range (smin=0xffffffff80000000,smax=0x7fffffff) */
> > > > > > > >    if w1 < w6 goto end;
> > > > > > > >    ...  <=== w1 range [0,31]
> > > > > > > >         <=== but if we have upper bit as 0xffffffff........, then the range will be
> > > > > > > >         <=== [0xffffffff0000001f, 0xffffffff00000000] and this range is not possible compared to original r1 range.
> > > > > > >
> > > > > > > Just rephrasing for myself...
> > > > > > > Because smin=0xffffffff80000000 if upper 32-bit == 0xffffFFFF
> > > > > > > then lower 32-bit has to be negative.
> > > > > > > and because we're doing unsigned compare w1 < w6
> > > > > > > and w6 is less than 80000000
> > > > > > > we can conclude that upper bits are zero.
> > > > > > > right?
> > > > > >
> > > > > > Sorry, could you please explain this a bit more.
> > > > >
> > > > > Yep, also curious.
> > > > >
> > > > > But meanwhile, I'm intending to update bpf_for() to something like
> > > > > below to avoid this code generation pattern:
> > > > >
> > > >
> > > > Well, thank you, Gmail, for messed up formatting. See [0] for properly
> > > > formatted diff.
> > > >
> > > >   [0] https://gist.github.com/anakryiko/08a4374259469803af4ea2185296b0cb
> > >
> > > Not that simple. It needs sizeof(start)==8 extra hack like bpf_cmp().
> >
> > I'm forgetting the details, but I feel like sizeof() == 4 was
> > important for bpf_cmp() to compare wX registers instead of always
> > comparing Rx. But in this case I think we are fine with always working
> > with full 64-bit Rx registers. Or is there some correctness issue
> > involved?
>
> it's a correctness issue.
> sizeof()==8 has to go via "r" otherwise it's a silent truncation
> by llvm.
>
> > > And the same with 'end'. So it will get just as ugly.
> > > Let's make the verifier smarter instead.
> >
> > Oh, absolutely, let's. But that doesn't solve the problem of someone
> > using bpf_for() with the latest Clang on an older kernel that doesn't
> > yet have this smartness, does it? Which is why I want to mitigate that
> > on the bpf_for() side in addition to improvements on the verifier
> > side.
>
> There is no urgency here.
> Here it's a combination of the very latest llvm trunk and -mcpu=v4.
> -mcpu=v4 is rare. Users can continue with -mcpu=v3 or released llvm.

Ok, fair enough, I can hold off on this for now (though eventually
people will switch and will want to run on old kernels, so the problem
doesn't really go away).

But really it's quite annoying that we don't have a proper way to just
say "give me whatever register makes sense so I can embed it in the
assembly and do something simple with it".

Yonghong, Eduard, is this something that can be figured out to let us
do straightforward pieces of embedded assembly like bpf_cmp() or this
bpf_for() hack?