Re: [RFC bpf-next] bpf, verifier: improve signed ranges inference for BPF_AND

Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx> · Fri, 19 Jul 2024 16:32:44 +0800

On Wed, Jul 17, 2024 at 02:10:35PM GMT, Eduard Zingerman wrote:
> On Tue, 2024-07-16 at 22:52 +0800, Shung-Hsi Yu wrote:
> 
> [...]
> 
> > To allow verification of such instruction pattern, update
> > scalar*_min_max_and() to infer signed ranges directly from signed ranges
> > of the operands. With BPF_AND, the resulting value always gains more
> > unset '0' bit, thus it only move towards 0x0000000000000000. The
> > difficulty lies with how to deal with signs. While non-negative
> > (positive and zero) value simply grows smaller, a negative number can
> > grows smaller, but may also underflow and become a larger value.
> > 
> > To better address this situation we split the signed ranges into
> > negative range and non-negative range cases, ignoring the mixed sign
> > cases for now; and only consider how to calculate smax_value.
> > 
> > Since negative range & negative range preserve the sign bit, so we know
> > the result is still a negative value, thus it only move towards S64_MIN,
> > but never underflow, thus a save bet is to use a value in ranges that is
> > closet to 0, thus "max(dst_reg->smax_value, src->smax_value)". For
> > negative range & positive range the sign bit is always cleared, thus we
> > know the resulting is a non-negative, and only moves towards 0, so a
> > safe bet is to use smax_value of the non-negative range. Last but not
> > least, non-negative range & non-negative range is still a non-negative
> > value, and only moves towards 0; however same as the unsigned range
> > case, the maximum is actually capped by the lesser of the two, and thus
> > min(dst_reg->smax_value, src_reg->smax_value);
> > 
> > Listing out the above reasoning as a table (dst_reg abbreviated as dst,
> > src_reg abbreviated as src, smax_value abbrivated as smax) we get:
> > 
> >                         |                         src_reg
> >        smax = ?         +---------------------------+---------------------------
> >                         |        negative           |       non-negative
> > ---------+--------------+---------------------------+---------------------------
> >          | negative     | max(dst->smax, src->smax) |         src->smax
> > dst_reg  +--------------+---------------------------+---------------------------
> >          | non-negative |         dst->smax         | min(dst->smax, src->smax)
> > 
> > However this is quite complicated, luckily it can be simplified given
> > the following observations
> > 
> >     max(dst_reg->smax_value, src_reg->smax_value) >= src_reg->smax_value
> >     max(dst_reg->smax_value, src_reg->smax_value) >= dst_reg->smax_value
> >     max(dst_reg->smax_value, src_reg->smax_value) >= min(dst_reg->smax_value, src_reg->smax_value)
> > 
> > So we could substitute the cells in the table above all with max(...),
> > and arrive at:
> > 
> >                         |                         src_reg
> >       smax' = ?         +---------------------------+---------------------------
> >                         |        negative           |       non-negative
> > ---------+--------------+---------------------------+---------------------------
> >          | negative     | max(dst->smax, src->smax) | max(dst->smax, src->smax)
> > dst_reg  +--------------+---------------------------+---------------------------
> >          | non-negative | max(dst->smax, src->smax) | max(dst->smax, src->smax)
> > 
> > Meaning that simply using
> > 
> >   max(dst_reg->smax_value, src_reg->smax_value)
> > 
> > to calculate the resulting smax_value would work across all sign combinations.
> > 
> > 
> > For smin_value, we know that both non-negative range & non-negative
> > range and negative range & non-negative range both result in a
> > non-negative value, so an easy guess is to use the minimum non-negative
> > value, thus 0.
> > 
> >                         |                         src_reg
> >        smin = ?         +----------------------------+---------------------------
> >                         |          negative          |       non-negative
> > ---------+--------------+----------------------------+---------------------------
> >          | negative     |             ?              |             0
> > dst_reg  +--------------+----------------------------+---------------------------
> >          | non-negative |             0              |             0
> > 
> > This leave the negative range & negative range case to be considered. We
> > know that negative range & negative range always yield a negative value,
> > so a preliminary guess would be S64_MIN. However, that guess is too
> > imprecise to help with the r0 <<= 62, r0 s>>= 63, r0 &= -13 pattern
> > we're trying to deal with here.
> > 
> > This can be further improve with the observation that for negative range
> > & negative range, the smallest possible value must be one that has
> > longest _common_ most-significant set '1' bits sequence, thus we can use
> > min(dst_reg->smin_value, src->smin_value) as the starting point, as the
> > smaller value will be the one with the shorter most-significant set '1'
> > bits sequence. But that alone is not enough, as we do not know whether
> > rest of the bits would be set, so the safest guess would be one that
> > clear alls bits after the most-significant set '1' bits sequence,
> > something akin to bit_floor(), but for rounding to a negative power-of-2
> > instead.
> > 
> >     negative_bit_floor(0xffff000000000003) == 0xffff000000000000
> >     negative_bit_floor(0xf0ff0000ffff0000) == 0xf000000000000000
> >     negative_bit_floor(0xfffffb0000000000) == 0xfffff80000000000
> > 
> > With negative range & negative range solve, we now have:
> > 
> >                         |                         src_reg
> >        smin = ?         +----------------------------+---------------------------
> >                         |        negative            |       non-negative
> > ---------+--------------+----------------------------+---------------------------
> >          |   negative   |negative_bit_floor(         |             0
> >          |              |  min(dst->smin, src->smin))|
> > dst_reg  +--------------+----------------------------+---------------------------
> >          | non-negative |           0                |             0
> > 
> > This can be further simplied since min(dst->smin, src->smin) < 0 when both
> > dst_reg and src_reg have a negative range. Which means using
> > 
> >     negative_bit_floor(min(dst_reg->smin_value, src_reg->smin_value)
> > 
> > to calculate the resulting smin_value would work across all sign combinations.
> > 
> > Together these allows us to infer the signed range of the result of BPF_AND
> > operation using the signed range from its operands.
> 
> Hi Shung-Hsi,
> 
> This seems quite elegant.
> As an additional check, I did a simple brute-force for all possible
> ranges of 6-bit integers and bounds are computed safely.

Thanks for looking into this, as well as the complement.

Did took me quite awhile to try come up with a simple solution that
works just well enough without further complication, felt quite proud :)

> [...]