On Tue, Oct 31, 2023 at 8:37 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > On Fri, 2023-10-27 at 11:13 -0700, Andrii Nakryiko wrote: > > Add smin/smax derivation from appropriate umin/umax values. Previously the > > logic was surprisingly asymmetric, trying to derive umin/umax from smin/smax > > (if possible), but not trying to do the same in the other direction. A simple > > addition to __reg64_deduce_bounds() fixes this. > > > > Added also generic comment about u64/s64 ranges and their relationship. > > Hopefully that helps readers to understand all the bounds deductions > > a bit better. > > > > Acked-by: Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx> > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx> > > Acked-by: Eduard Zingerman <eddyz87@xxxxxxxxx> > > Nice comment, thank you. I noticed two typos, see below. > > > --- > > kernel/bpf/verifier.c | 70 +++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 70 insertions(+) > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > index 857d76694517..bf4193706744 100644 > > --- a/kernel/bpf/verifier.c > > +++ b/kernel/bpf/verifier.c > > @@ -2358,6 +2358,76 @@ static void __reg32_deduce_bounds(struct bpf_reg_state *reg) > > > > static void __reg64_deduce_bounds(struct bpf_reg_state *reg) > > { > > + /* If u64 range forms a valid s64 range (due to matching sign bit), > > + * try to learn from that. Let's do a bit of ASCII art to see when > > + * this is happening. Let's take u64 range first: > > + * > > + * 0 0x7fffffffffffffff 0x8000000000000000 U64_MAX > > + * |-------------------------------|--------------------------------| > > + * > > + * Valid u64 range is formed when umin and umax are anywhere in this > > + * range [0, U64_MAX] and umin <= umax. u64 is simple and > > + * straightforward. Let's where s64 range maps to this simple [0, > > + * U64_MAX] range, annotated below the line for comparison: > > Nit: this sentence sounds a bit weird, probably some word is missing > between "let's" and "where". > I don't know what's going on here, I wasn't drunk when I wrote this and I don't remember it being so incoherent :) Will re-read and try to make it clearer. > > + * > > + * 0 0x7fffffffffffffff 0x8000000000000000 U64_MAX > > + * |-------------------------------|--------------------------------| > > + * 0 S64_MAX S64_MIN -1 > > + * > > + * So s64 values basically start in the middle and then are contiguous > > + * to the right of it, wrapping around from -1 to 0, and then > > + * finishing as S64_MAX (0x7fffffffffffffff) right before S64_MIN. > > + * We can try drawing more visually continuity of u64 vs s64 values as > > + * mapped to just actual hex valued range of values. > > + * > > + * u64 start u64 end > > + * _______________________________________________________________ > > + * / \ > > + * 0 0x7fffffffffffffff 0x8000000000000000 U64_MAX > > + * |-------------------------------|--------------------------------| > > + * 0 S64_MAX S64_MIN -1 > > + * / \ > > + * >------------------------------ -------------------------------> > > + * s64 continues... s64 end s64 start s64 "midpoint" > > + * > > + * What this means is that in general, we can't always derive > > + * something new about u64 from any random s64 range, and vice versa. > > + * But we can do that in two particular cases. One is when entire > > + * u64/s64 range is *entirely* contained within left half of the above > > + * diagram or when it is *entirely* contained in the right half. I.e.: > > + * > > + * |-------------------------------|--------------------------------| > > + * ^ ^ ^ ^ > > + * A B C D > > + * > > + * [A, B] and [C, D] are contained entirely in their respective halves > > + * and form valid contiguous ranges as both u64 and s64 values. [A, B] > > + * will be non-negative both as u64 and s64 (and in fact it will be > > + * identical ranges no matter the signedness). [C, D] treated as s64 > > + * will be a range of negative values, while in u64 it will be > > + * non-negative range of values larger than 0x8000000000000000. > > + * > > + * Now, any other range here can't be represented in both u64 and s64 > > + * simultaneously. E.g., [A, C], [A, D], [B, C], [B, D] are valid > > + * contiguous u64 ranges, but they are discontinuous in s64. [B, C] > > + * in s64 would be properly presented as [S64_MIN, C] and [B, S64_MAX], > > + * for example. Similarly, valid s64 range [D, A] (going from negative > > + * to positive values), would be two separate [D, U64_MAX] and [0, A] > > + * ranges as u64. Currently reg_state can't represent two segments per > > + * numeric domain, so in such situations we can only derive maximal > > + * possible range ([0, U64_MAX] for u64, and [S64_MIN, S64_MAX) for s64). > ^ > Nit: missing bracket > it's actually a typo, ) -> ], which is now fixed as well, thanks > > + * > > + * So we use these facts to derive umin/umax from smin/smax and vice > > + * versa only if they stay within the same "half". This is equivalent > > + * to checking sign bit: lower half will have sign bit as zero, upper > > + * half have sign bit 1. Below in code we simplify this by just > > + * casting umin/umax as smin/smax and checking if they form valid > > + * range, and vice versa. Those are equivalent checks. > > + */ > > + if ((s64)reg->umin_value <= (s64)reg->umax_value) { > > + reg->smin_value = max_t(s64, reg->smin_value, reg->umin_value); > > + reg->smax_value = min_t(s64, reg->smax_value, reg->umax_value); > > + } > > /* Learn sign from signed bounds. > > * If we cannot cross the sign boundary, then signed and unsigned bounds > > * are the same, so combine. This works even in the negative case, e.g. > > >