On Fri, 2023-10-27 at 11:13 -0700, Andrii Nakryiko wrote: > Add smin/smax derivation from appropriate umin/umax values. Previously the > logic was surprisingly asymmetric, trying to derive umin/umax from smin/smax > (if possible), but not trying to do the same in the other direction. A simple > addition to __reg64_deduce_bounds() fixes this. > > Added also generic comment about u64/s64 ranges and their relationship. > Hopefully that helps readers to understand all the bounds deductions > a bit better. > > Acked-by: Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx> > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx> Acked-by: Eduard Zingerman <eddyz87@xxxxxxxxx> Nice comment, thank you. I noticed two typos, see below. > --- > kernel/bpf/verifier.c | 70 +++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 70 insertions(+) > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 857d76694517..bf4193706744 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -2358,6 +2358,76 @@ static void __reg32_deduce_bounds(struct bpf_reg_state *reg) > > static void __reg64_deduce_bounds(struct bpf_reg_state *reg) > { > + /* If u64 range forms a valid s64 range (due to matching sign bit), > + * try to learn from that. Let's do a bit of ASCII art to see when > + * this is happening. Let's take u64 range first: > + * > + * 0 0x7fffffffffffffff 0x8000000000000000 U64_MAX > + * |-------------------------------|--------------------------------| > + * > + * Valid u64 range is formed when umin and umax are anywhere in this > + * range [0, U64_MAX] and umin <= umax. u64 is simple and > + * straightforward. Let's where s64 range maps to this simple [0, > + * U64_MAX] range, annotated below the line for comparison: Nit: this sentence sounds a bit weird, probably some word is missing between "let's" and "where". > + * > + * 0 0x7fffffffffffffff 0x8000000000000000 U64_MAX > + * |-------------------------------|--------------------------------| > + * 0 S64_MAX S64_MIN -1 > + * > + * So s64 values basically start in the middle and then are contiguous > + * to the right of it, wrapping around from -1 to 0, and then > + * finishing as S64_MAX (0x7fffffffffffffff) right before S64_MIN. > + * We can try drawing more visually continuity of u64 vs s64 values as > + * mapped to just actual hex valued range of values. > + * > + * u64 start u64 end > + * _______________________________________________________________ > + * / \ > + * 0 0x7fffffffffffffff 0x8000000000000000 U64_MAX > + * |-------------------------------|--------------------------------| > + * 0 S64_MAX S64_MIN -1 > + * / \ > + * >------------------------------ -------------------------------> > + * s64 continues... s64 end s64 start s64 "midpoint" > + * > + * What this means is that in general, we can't always derive > + * something new about u64 from any random s64 range, and vice versa. > + * But we can do that in two particular cases. One is when entire > + * u64/s64 range is *entirely* contained within left half of the above > + * diagram or when it is *entirely* contained in the right half. I.e.: > + * > + * |-------------------------------|--------------------------------| > + * ^ ^ ^ ^ > + * A B C D > + * > + * [A, B] and [C, D] are contained entirely in their respective halves > + * and form valid contiguous ranges as both u64 and s64 values. [A, B] > + * will be non-negative both as u64 and s64 (and in fact it will be > + * identical ranges no matter the signedness). [C, D] treated as s64 > + * will be a range of negative values, while in u64 it will be > + * non-negative range of values larger than 0x8000000000000000. > + * > + * Now, any other range here can't be represented in both u64 and s64 > + * simultaneously. E.g., [A, C], [A, D], [B, C], [B, D] are valid > + * contiguous u64 ranges, but they are discontinuous in s64. [B, C] > + * in s64 would be properly presented as [S64_MIN, C] and [B, S64_MAX], > + * for example. Similarly, valid s64 range [D, A] (going from negative > + * to positive values), would be two separate [D, U64_MAX] and [0, A] > + * ranges as u64. Currently reg_state can't represent two segments per > + * numeric domain, so in such situations we can only derive maximal > + * possible range ([0, U64_MAX] for u64, and [S64_MIN, S64_MAX) for s64). ^ Nit: missing bracket > + * > + * So we use these facts to derive umin/umax from smin/smax and vice > + * versa only if they stay within the same "half". This is equivalent > + * to checking sign bit: lower half will have sign bit as zero, upper > + * half have sign bit 1. Below in code we simplify this by just > + * casting umin/umax as smin/smax and checking if they form valid > + * range, and vice versa. Those are equivalent checks. > + */ > + if ((s64)reg->umin_value <= (s64)reg->umax_value) { > + reg->smin_value = max_t(s64, reg->smin_value, reg->umin_value); > + reg->smax_value = min_t(s64, reg->smax_value, reg->umax_value); > + } > /* Learn sign from signed bounds. > * If we cannot cross the sign boundary, then signed and unsigned bounds > * are the same, so combine. This works even in the negative case, e.g.