On Tue, Jul 16, 2024 at 03:05:11PM GMT, Xu Kuohai wrote: > On 7/15/2024 11:29 PM, Shung-Hsi Yu wrote: > > Cc Harishankar Vishwanathan, Prof. Srinivas Narayana and Prof. Santosh > > Nagarakatte, and Matan Shachnai, whom have recently work on > > scalar*_min_max_and(); also dropping LSM/FS related mails from Cc since > > it's a bit long and I'm not sure whether the mailing list will reject > > due to too many email in Cc. > > > > On Thu, Jul 11, 2024 at 07:38:24PM GMT, Xu Kuohai wrote: > > > With lsm return value check, the no-alu32 version test_libbpf_get_fd_by_id_opts > > > is rejected by the verifier, and the log says: > > > > > > 0: R1=ctx() R10=fp0 > > > ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27 > > > 0: (b7) r0 = 0 ; R0_w=0 > > > 1: (79) r2 = *(u64 *)(r1 +0) > > > func 'bpf_lsm_bpf_map' arg0 has btf_id 916 type STRUCT 'bpf_map' > > > 2: R1=ctx() R2_w=trusted_ptr_bpf_map() > > > ; if (map != (struct bpf_map *)&data_input) @ test_libbpf_get_fd_by_id_opts.c:29 > > > 2: (18) r3 = 0xffff9742c0951a00 ; R3_w=map_ptr(map=data_input,ks=4,vs=4) > > > 4: (5d) if r2 != r3 goto pc+4 ; R2_w=trusted_ptr_bpf_map() R3_w=map_ptr(map=data_input,ks=4,vs=4) > > > ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27 > > > 5: (79) r0 = *(u64 *)(r1 +8) ; R0_w=scalar() R1=ctx() > > > ; if (fmode & FMODE_WRITE) @ test_libbpf_get_fd_by_id_opts.c:32 > > > 6: (67) r0 <<= 62 ; R0_w=scalar(smax=0x4000000000000000,umax=0xc000000000000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xc000000000000000)) > > > 7: (c7) r0 s>>= 63 ; R0_w=scalar(smin=smin32=-1,smax=smax32=0) > > > ; @ test_libbpf_get_fd_by_id_opts.c:0 > > > 8: (57) r0 &= -13 ; R0_w=scalar(smax=0x7ffffffffffffff3,umax=0xfffffffffffffff3,smax32=0x7ffffff3,umax32=0xfffffff3,var_off=(0x0; 0xfffffffffffffff3)) > > > ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27 > > > 9: (95) exit > > > > > > And here is the C code of the prog. > > > > > > SEC("lsm/bpf_map") > > > int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) > > > { > > > if (map != (struct bpf_map *)&data_input) > > > return 0; > > > > > > if (fmode & FMODE_WRITE) > > > return -EACCES; > > > > > > return 0; > > > } > > > > > > It is clear that the prog can only return either 0 or -EACCESS, and both > > > values are legal. > > > > > > So why is it rejected by the verifier? > > > > > > The verifier log shows that the second if and return value setting > > > statements in the prog is optimized to bitwise operations "r0 s>>= 63" > > > and "r0 &= -13". The verifier correctly deduces that the value of > > > r0 is in the range [-1, 0] after verifing instruction "r0 s>>= 63". > > > But when the verifier proceeds to verify instruction "r0 &= -13", it > > > fails to deduce the correct value range of r0. > > > > > > 7: (c7) r0 s>>= 63 ; R0_w=scalar(smin=smin32=-1,smax=smax32=0) > > > 8: (57) r0 &= -13 ; R0_w=scalar(smax=0x7ffffffffffffff3,umax=0xfffffffffffffff3,smax32=0x7ffffff3,umax32=0xfffffff3,var_off=(0x0; 0xfffffffffffffff3)) > > > > > > So why the verifier fails to deduce the result of 'r0 &= -13'? > > > > > > The verifier uses tnum to track values, and the two ranges "[-1, 0]" and > > > "[0, -1ULL]" are encoded to the same tnum. When verifing instruction > > > "r0 &= -13", the verifier erroneously deduces the result from > > > "[0, -1ULL] AND -13", which is out of the expected return range > > > [-4095, 0]. > > > > > > As explained by Eduard in [0], the clang transformation that generates this > > > pattern is located in DAGCombiner::SimplifySelectCC() method (see [1]). > > ... > > > As suggested by Eduard and Andrii, this patch makes a special case > > > for source or destination register of '&=' operation being in > > > range [-1, 0]. > > ... > > > > Been wonder whether it possible for a more general approach ever since I > > saw the discussion back in April. I think I've finally got something. > > > > The problem we face here is that the tightest bound for the [-1, 0] case > > was tracked with signed ranges, yet the BPF verifier looses knowledge of > > them all too quickly in scalar*_min_max_and(); knowledge of previous > > signed ranges were not used at all to derive the outcome of signed > > ranges after BPF_AND. > > > > static void scalar_min_max_and(...) { > > ... > > if ((s64)dst_reg->umin_value <= (s64)dst_reg->umax_value) { > > dst_reg->smin_value = dst_reg->umin_value; > > dst_reg->smax_value = dst_reg->umax_value; > > } else { > > dst_reg->smin_value = S64_MIN; > > dst_reg->smax_value = S64_MAX; > > } > > ... > > } > > > > This is indeed the root cause. > > > So looks like its time to be nobody[1] and try to teach BPF verifier how > > track signed ranges when ANDing two (possibly) negative numbers. Luckily > > bitwise AND is comparatively easier to do than other bitwise operations: > > non-negative range & non-negative range is always non-negative, > > non-negative range & negative range is still always non-negative, and > > negative range & negative range is always negative. > > > > Right, only bitwise ANDing two negatives yields to a negative result. > > > smax_value is straight forwards, we can just do > > > > max(dst_reg->smax_value, src_reg->smax_value) > > > > which works across all sign combinations. Technically for non-negative & > > non-negative we can use min() instead of max(), but the non-negative & > > non-negative case should be handled pretty well by the unsigned ranges > > already; it seems simpler to let such knowledge flows from unsigned > > ranges to signed ranges during reg_bounds_sync(). Plus we are not wrong > > for non-negative & non-negative by using max(), just imprecise, so no > > correctness/soundness issue here. > > > > I think this is correct, since in two's complement, more '1' bits means > more large, regardless of sign, and bitwise AND never generates more '1' > bits. > > > smin_value is the tricker one, but doable with > > > > masked_negative(min(dst_reg->smin_value, src_reg->smin_value)) > > > > where masked_negative(v) basically just clear all bits after the most > > significant unset bit, effectively rounding a negative value down to a > > negative power-of-2 value, and returning 0 for non-negative values. E.g. > > for some 8-bit, negative value > > > > masked_negative(0b11101001) == 0b11100000 > > > > Ah, it's really tricky. Seems it's the longest high '1' bits sequence > in both operands. This '1' bits should remain unchanged by the bitwise > AND operation. So this sequence must be in the result, making it the > minimum possible value. > > > This can be done with a tweaked version of "Round up to the next highest > > power of 2"[2], > > > > /* Invert the bits so the first unset bit can be propagated with |= */ > > v = ~v; > > /* Now propagate the first (previously unset, now set) bit to the > > * trailing positions */ > > v |= v >> 1; > > v |= v >> 2; > > v |= v >> 4; > > ... > > v |= v >> 32; /* Assuming 64-bit */ > > /* Propagation done, now invert again */ > > v = ~v; > > > > Again, we technically can do better if we take sign bit into account, > > but deriving smin_value this way should still be correct/sound across > > different sign combinations, and overall should help us derived [-16, 0] > > from "[-1, 0] AND -13", thus preventing BPF verifier from rejecting the > > program. > > > > --- > > > > Alternatively we can employ a range-splitting trick (think I saw this in > > [3]) that allow us to take advantage of existing tnum_and() by splitting > > the signed ranges into two if the range crosses the sign boundary (i.e. > > contains both non-negative and negative values), one range will be > > [smin, U64_MAX], the other will be [0, smax]. This way we get around > > tnum's weakness of representing [-1, 0] as [0, U64_MAX]. > > > > if (src_reg->smin_value < 0 && src_reg->smax_value >= 0) { > > src_lower = tnum_range(src_reg->smin_value, U64_MAX); > > src_higher = tnum_range(0, src_reg->smax_value); > > } else { > > src_lower = tnum_range(src_reg->smin_value, src_reg->smax_value); > > src_higher = tnum_range(src_reg->smin_value, src_reg->smax_value); > > } > > > > if (dst_reg->smin_value < 0 && dst_reg->smax_value >= 0) { > > dst_lower = tnum_range(dst_reg->smin_value, U64_MAX); > > dst_higher = tnum_range(0, dst_reg->smax_value); > > } else { > > dst_lower = tnum_range(dst_reg->smin_value, dst_reg->smax_value); > > dst_higher = tnum_range(dst_reg->smin_value, dst_reg->smax_value); > > } > > > > lower = tnum_and(src_lower, dst_lower); > > higher = tnum_and(src_higher, dst_higher); > > dst->smin_value = lower.value; > > dst->smax_value = higher.value | higher.mask; > > This looks even more tricky... Indeed, and I think the above is still wrong because it did not proper set smin_value to S64_MIN when needed. > > Personally I like the first method better as it is simpler yet still > > does the job well enough. I'll work on that in the next few days and see > > if it actually works. > > This really sounds great. Thank you for the excellent work! Sent RFC in sibling thread. I think it would be better if the patch was included as part of your series. But let's see what the other think of it first.