On Mon, Jul 8, 2024 at 2:31 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Mon, 2024-07-08 at 13:18 -0700, Alexei Starovoitov wrote:
>
> [...]
>
> > > > the 32bit_sign_ext will indicate the register r1 is from 32bit sign
> > > > extension, so once w1 range is refined, the upper 32bit can be
> > > > recalculated.
> > >
> > > Can we avoid 32bit_sign_ext in the above? Let us say we have
> > >   r1 = ...; R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff), R6_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
> > >   if w1 < w6 goto pc+4
> > > where r1 achieves its range through other means than 32bit sign extension, e.g.
> > >   call bpf_get_prandom_u32;
> > >   r1 = r0;
> > >   r1 <<= 32;
> > >   call bpf_get_prandom_u32;
> > >   r1 |= r0;               /* r1 is 64bit random number */
> > >   r2 = 0xffffffff80000000 ll;
> > >   if r1 s< r2 goto end;
> > >   if r1 s> 0x7fffFFFF goto end; /* after this r1 range (smin=0xffffffff80000000,smax=0x7fffffff) */
> > >   if w1 < w6 goto end;
> > >   ...
> > >   <=== w1 range [0,31]
> > >   <=== but if we have upper bits as 0xffffffff........, then the range will be
> > >   <=== [0xffffffff00000000, 0xffffffff0000001f] and this range is not possible compared to original r1 range.
> >
> > Just rephrasing for myself...
> > Because smin=0xffffffff80000000, if upper 32-bit == 0xffffFFFF
> > then lower 32-bit has to be negative.
> > And because we're doing unsigned compare w1 < w6
> > and w6 is less than 0x80000000,
> > we can conclude that upper bits are zero.
> > right?
>
> Sorry, could you please explain this a bit more.

Yep, also curious.
But meanwhile, I'm intending to update bpf_for() to something like below
to avoid this code generation pattern:

diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 305c62817dd3..86dc854a713b 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -394,7 +394,18 @@ extern void bpf_iter_num_destroy(struct bpf_iter_num *it) __weak __ksym;
 		/* iteration step */						\
 		int *___t = bpf_iter_num_next(&___it);				\
 		/* termination and bounds check */				\
-		(___t && ((i) = *___t, (i) >= (start) && (i) < (end)));		\
+		(___t && ({							\
+			__label__ l_false;					\
+			_Bool ok = 0;						\
+			(i) = *___t;						\
+			asm volatile goto ("					\
+				if %[_i] s< %[_start] goto %l[l_false];		\
+				if %[_i] s>= %[_end] goto %l[l_false];		\
+			" :: [_i]"r"(i), [_start]"ri"(start), [_end]"ri"(end) :: l_false); \
+			ok = 1;							\
+		l_false:							\
+			ok;							\
+		}));								\
 	});									\
 )
 #endif /* bpf_for */

This produces this code for cpuv4:

  1294: 85 10 00 00 ff ff ff ff	call -0x1
  1295: 15 00 10 00 00 00 00 00	if r0 == 0x0 goto +0x10 <LBB34_4>
  1296: 61 01 00 00 00 00 00 00	r1 = *(u32 *)(r0 + 0x0)
  1297: c5 01 0e 00 00 00 00 00	if r1 s< 0x0 goto +0xe <LBB34_4>
  1298: 7d 71 0d 00 00 00 00 00	if r1 s>= r7 goto +0xd <LBB34_4>
  1299: bf 11 20 00 00 00 00 00	r1 = (s32)r1

> The w1 < w6 comparison only infers information about sub-registers.
> So the range for the full register r1 would still have 0xffffFFFF
> for upper bits => r1 += r2 would fail.
> What do I miss?
>
> The non-cpuv4 version of the program does non-sign-extended load:
>
>   14: (61) r1 = *(u32 *)(r0 +0)	; R0=rdonly_mem(id=3,ref_obj_id=2,sz=4)
>       R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
>   15: (ae) if w1 < w6 goto pc+4	; R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
>       R6=scalar(id=1,smin=smin32=0,smax=umax=smax32=umax32=32,var_off=(0x0; 0x3f))
>
> Tbh, it looks like LLVM deleted some info that could not be recovered
> in this instance.

> > > <=== so the only possible way for upper 32bit range is 0.
> > > end:
> > >
> > > Therefore, looks like we do not need 32bit_sign_ext. Just from
> > >   R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff)
> > > with refined range in true path of 'if w1 < w6 goto ...',
> > > we can further refine w1 range properly.
> >
> > yep. looks like it.
> > We can hard code this special logic for this specific smin/smax pair,
> > but the gut feel is that we can generalize it further.