Re: [PATCH bpf-next] selftests/bpf: Workaround iters/iter_arr_with_actual_elem_count failure when -mcpu=cpuv4

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 8 Jul 2024 15:11:00 -0700

On Mon, Jul 8, 2024 at 2:31 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Mon, 2024-07-08 at 13:18 -0700, Alexei Starovoitov wrote:
>
> [...]
>
> > > the 32bit_sign_ext will indicate the register r1 is from 32bit sign extension, so once w1 range is refined, the upper 32bit can be recalculated.
> > >
> > > Can we avoid 32bit_sign_ext in the above? Let us say we have
> > >    r1 = ...;  R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff), R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
> > >    if w1 < w6 goto pc+4
> > > where r1 achieves its range through other means than 32bit sign extension, e.g.
> > >    call bpf_get_prandom_u32;
> > >    r1 = r0;
> > >    r1 <<= 32;
> > >    call bpf_get_prandom_u32;
> > >    r1 |= r0;  /* r1 is 64bit random number */
> > >    r2 = 0xffffffff80000000 ll;
> > >    if r1 s< r2 goto end;
> > >    if r1 s> 0x7fffFFFF goto end; /* after this r1 range (smin=0xffffffff80000000,smax=0x7fffffff) */
> > >    if w1 < w6 goto end;
> > >    ...  <=== w1 range [0,31]
> > >         <=== but if we have upper bit as 0xffffffff........, then the range will be
> > >         <=== [0xffffffff0000001f, 0xffffffff00000000] and this range is not possible compared to original r1 range.
> >
> > Just rephrasing for myself...
> > Because smin=0xffffffff80000000 if upper 32-bit == 0xffffFFFF
> > then lower 32-bit has to be negative.
> > and because we're doing unsigned compare w1 < w6
> > and w6 is less than 80000000
> > we can conclude that upper bits are zero.
> > right?
>
> Sorry, could you please explain this a bit more.

Yep, also curious.
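
For what it's worth, the arithmetic of the claim itself can be sanity-checked in user space. Below is a minimal sketch, not verifier code: the helper names (in_sext32_range, count_candidates) are made up for illustration, and it says nothing about how the verifier would actually track this. It enumerates every 64-bit value whose lower half passes the unsigned check w1 < w6 and whose upper half is either 0 or 0xffffffff (the only two upper halves possible inside smin=0xffffffff80000000, smax=0x7fffffff), and counts how many of them stay inside that range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: is r1 inside smin=0xffffffff80000000, smax=0x7fffffff? */
static int in_sext32_range(int64_t r1)
{
	return r1 >= INT32_MIN && r1 <= INT32_MAX;
}

/*
 * Hypothetical helper: count the values r1 that are inside the range above
 * and whose lower 32 bits satisfy the unsigned check w1 < w6.
 * *nonzero_upper receives how many of them have non-zero upper 32 bits.
 */
static int count_candidates(uint32_t w6, int *nonzero_upper)
{
	/* The only upper halves a value in [INT32_MIN, INT32_MAX] can have. */
	static const uint32_t uppers[] = { 0x00000000u, 0xffffffffu };
	int count = 0;

	*nonzero_upper = 0;
	for (size_t u = 0; u < sizeof(uppers) / sizeof(uppers[0]); u++) {
		for (uint32_t w1 = 0; w1 < w6; w1++) {
			/* Relies on two's complement conversion, as gcc and clang provide. */
			int64_t r1 = (int64_t)(((uint64_t)uppers[u] << 32) | w1);

			if (!in_sext32_range(r1))
				continue;
			count++;
			if (uppers[u] != 0)
				(*nonzero_upper)++;
		}
	}
	return count;
}
```

With w6 = 32 (the umax32 of R6 in the log above), count_candidates() should report 32 surviving values and none with non-zero upper bits, which matches the "upper bits are zero" conclusion, at least at the level of plain arithmetic.
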

But meanwhile, I'm intending to update bpf_for() to something like
below to avoid this code generation pattern:

diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 305c62817dd3..86dc854a713b 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -394,7 +394,18 @@ extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __weak __ksym;
                /* iteration step */                                                    \
                int *___t = bpf_iter_num_next(&___it);                                  \
                /* termination and bounds check */                                      \
-               (___t && ((i) = *___t, (i) >= (start) && (i) < (end)));                 \
+               (___t && ({                                                             \
+                       __label__ l_false;                                              \
+                       _Bool ok = 0;                                                   \
+                       (i) = *___t;                                                    \
+                       asm volatile goto ("                                            \
+                               if %[_i] s< %[_start] goto %l[l_false];                 \
+                               if %[_i] s>= %[_end] goto %l[l_false];                  \
+                       " :: [_i]"r"(i), [_start]"ri"(start), [_end]"ri"(end) :: l_false);      \
+                       ok = 1;                                                         \
+               l_false:                                                                \
+                       ok;                                                             \
+               }));                                                                    \
        });                                                                             \
 )
 #endif /* bpf_for */

This produces this code for cpuv4:

    1294:       85 10 00 00 ff ff ff ff call -0x1
    1295:       15 00 10 00 00 00 00 00 if r0 == 0x0 goto +0x10 <LBB34_4>
    1296:       61 01 00 00 00 00 00 00 r1 = *(u32 *)(r0 + 0x0)
    1297:       c5 01 0e 00 00 00 00 00 if r1 s< 0x0 goto +0xe <LBB34_4>
    1298:       7d 71 0d 00 00 00 00 00 if r1 s>= r7 goto +0xd <LBB34_4>
    1299:       bf 11 20 00 00 00 00 00 r1 = (s32)r1

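The control-flow shape of the new bounds check can be tried out in user space. Below is a rough analogue, assuming a GNU C compiler (statement expressions and __label__ are extensions); plain C comparisons stand in for the BPF asm goto, so it shows only how the block evaluates to ok, not the code generation. IN_RANGE and in_range are made-up names for this sketch.

```c
/*
 * User-space analogue of the statement expression in the patch.
 * __label__ keeps l_false local to each expansion, so the macro can be
 * used more than once in the same function without duplicate labels.
 */
#define IN_RANGE(i, start, end) ({		\
	__label__ l_false;			\
	_Bool ok = 0;				\
	if ((i) < (start))			\
		goto l_false;			\
	if ((i) >= (end))			\
		goto l_false;			\
	ok = 1;					\
l_false:					\
	ok;					\
})

/* Hypothetical wrapper so the check can be called like a function. */
static int in_range(int i, int start, int end)
{
	return IN_RANGE(i, start, end);
}
```

in_range(5, 0, 10) yields 1, while in_range(-1, 0, 10) and in_range(10, 0, 10) yield 0, i.e. the same half-open [start, end) check as the original expression.
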
> The w1 < w6 comparison only infers information about sub-registers.
> So the range for the full register r1 would still have 0xffffFFFF
> for upper bits => r1 += r2 would fail.
> What do I miss?
>
> The non-cpuv4 version of the program does non-sign-extended load:
>
> 14: (61) r1 = *(u32 *)(r0 +0)   ; R0=rdonly_mem(id=3,ref_obj_id=2,sz=4)
>                                   R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
> 15: (ae) if w1 < w6 goto pc+4   ; R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
>                                   R6=scalar(id=1,smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
>
> Tbh, it looks like LLVM deleted some info that could not be recovered
> in this instance.
>
> >
> > >         <=== so the only possible way for upper 32bit range is 0.
> > > end:
> > >
> > > Therefore, looks like we do not need 32bit_sign_ext. Just from
> > > R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff)
> > > with refined range in true path of 'if w1 < w6 goto ...',
> > > we can further refine w1 range properly.
> >
> > yep. looks like it.
> > We can hard code this special logic for this specific smin/smax pair,
> > but the gut feel is that we can generalize it further.
> >
>