Re: [PATCH v5 bpf-next 07/23] bpf: improve deduction of 64-bit bounds from 32-bit bounds

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 31 Oct 2023 13:33:59 -0700

On Tue, Oct 31, 2023 at 1:26 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Fri, Oct 27, 2023 at 11:17 AM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
> >
> > Add a few interesting cases in which we can tighten 64-bit bounds based
> > on newly learnt information about 32-bit bounds. E.g., when full u64/s64
> > registers are used in BPF program, and then eventually compared as
> > u32/s32. The latter comparison doesn't change the value of full
> > register, but it does impose new restrictions on possible lower 32 bits
> > of such full registers. And we can use that to derive additional full
> > register bounds information.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > ---
> >  kernel/bpf/verifier.c | 47 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 47 insertions(+)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 38d21d0e46bd..768247e3d667 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -2535,10 +2535,57 @@ static void __reg64_deduce_bounds(struct bpf_reg_state *reg)
> >         }
> >  }
> >
> > +static void __reg_deduce_mixed_bounds(struct bpf_reg_state *reg)
> > +{
> > +       /* Try to tighten 64-bit bounds from 32-bit knowledge, using 32-bit
> > +        * values on both sides of 64-bit range in hope to have tighter range.
> > +        * E.g., if r1 is [0x1'00000000, 0x3'80000000], and we learn from
> > +        * 32-bit signed > 0 operation that s32 bounds are now [1; 0x7fffffff].
> > +        * With this, we can substitute 1 as low 32-bits of _low_ 64-bit bound
> > +        * (0x100000000 -> 0x100000001) and 0x7fffffff as low 32-bits of
> > +        * _high_ 64-bit bound (0x380000000 -> 0x37fffffff) and arrive at
> > +        * better overall bounds for r1 as [0x1'00000001; 0x3'7fffffff].
> > +        * We just need to make sure that derived bounds we are intersecting
> > +        * with are well-formed ranges in respective s64 or u64 domain, just
> > +        * like we do with similar kinds of 32-to-64 or 64-to-32 adjustments.
> > +        */
> > +       __u64 new_umin, new_umax;
> > +       __s64 new_smin, new_smax;
> > +
> > +       /* u32 -> u64 tightening, it's always well-formed */
> > +       new_umin = (reg->umin_value & ~0xffffffffULL) | reg->u32_min_value;
> > +       new_umax = (reg->umax_value & ~0xffffffffULL) | reg->u32_max_value;
> > +       reg->umin_value = max_t(u64, reg->umin_value, new_umin);
> > +       reg->umax_value = min_t(u64, reg->umax_value, new_umax);
> > +
> > +       /* s32 -> u64 tightening, s32 should be a valid u32 range (same sign) */
> > +       if ((u32)reg->s32_min_value <= (u32)reg->s32_max_value) {
> > +               new_umin = (reg->umin_value & ~0xffffffffULL) | (u32)reg->s32_min_value;
> > +               new_umax = (reg->umax_value & ~0xffffffffULL) | (u32)reg->s32_max_value;
> > +               reg->umin_value = max_t(u64, reg->umin_value, new_umin);
> > +               reg->umax_value = min_t(u64, reg->umax_value, new_umax);
> > +       }
> > +
> > +       /* u32 -> s64 tightening, u32 range embedded into s64 preserves range validity */
> > +       new_smin = (reg->smin_value & ~0xffffffffULL) | reg->u32_min_value;
> > +       new_smax = (reg->smax_value & ~0xffffffffULL) | reg->u32_max_value;
> > +       reg->smin_value = max_t(s64, reg->smin_value, new_smin);
> > +       reg->smax_value = min_t(s64, reg->smax_value, new_smax);
> > +
> > +       /* s32 -> s64 tightening, check that s32 range behaves as u32 range */
> > +       if ((u32)reg->s32_min_value <= (u32)reg->s32_max_value) {
>
> There is no typo in this check, right?

I don't think there is a typo; the check is meant to be the same as in
the s32 -> u64 case above.
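
For anyone reading along, here is a minimal userspace sketch of what that
check expresses. The helper name is hypothetical and does not exist in
verifier.c; it only isolates the comparison from the patch, assuming the
caller already guarantees s32_min <= s32_max in the signed sense.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of kernel/bpf/verifier.c.
 *
 * Given a valid signed range (s32_min <= s32_max), the range is also a
 * contiguous u32 range only when both ends have the same sign:
 *   - both non-negative: the u32 casts keep the same order;
 *   - both negative:     the u32 casts keep the same order;
 *   - min < 0 <= max:    (u32)min >= 0x80000000 > (u32)max, so it fails.
 */
static bool s32_range_is_u32_range(int32_t s32_min, int32_t s32_max)
{
	return (uint32_t)s32_min <= (uint32_t)s32_max;
}
```

So [1; 0x7fffffff] and [-5; -1] pass, while [-1; 1] does not, because it
wraps around when viewed as u32.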

> To make sure somebody doesn't ask this question again can we
> combine the same 'if'-s into one?
> In order:
> u32->u64
> u32->s64
> if ((u32)reg->s32_min_value <= (u32)reg->s32_max_value) {
>   s32->u64
>   s32->s64
> }
> ?
> imo will be easier to follow and the same end result?

yep, absolutely, will regroup
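
A rough userspace sketch of the regrouped order, for illustration only.
The struct and helper names below are simplified stand-ins, not the
kernel's struct bpf_reg_state, max_t() or min_t(), and the body of the
s32 -> s64 step is not visible in the quoted hunk, so here it is assumed
to mirror the s32 -> u64 step.

```c
#include <stdint.h>

/* Simplified stand-in for the bounds fields of struct bpf_reg_state. */
struct reg_bounds {
	uint64_t umin, umax;
	int64_t smin, smax;
	uint32_t u32_min, u32_max;
	int32_t s32_min, s32_max;
};

#define LOW32_MASK 0xffffffffULL

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static int64_t max_s64(int64_t a, int64_t b) { return a > b ? a : b; }
static int64_t min_s64(int64_t a, int64_t b) { return a < b ? a : b; }

/* Replace the low 32 bits of a 64-bit bound with a 32-bit bound. */
static uint64_t splice_low32(uint64_t bound64, uint32_t bound32)
{
	return (bound64 & ~LOW32_MASK) | bound32;
}

static void deduce_mixed_bounds(struct reg_bounds *r)
{
	/* u32 -> u64 tightening */
	uint64_t new_umin = splice_low32(r->umin, r->u32_min);
	uint64_t new_umax = splice_low32(r->umax, r->u32_max);
	r->umin = max_u64(r->umin, new_umin);
	r->umax = min_u64(r->umax, new_umax);

	/* u32 -> s64 tightening */
	int64_t new_smin = (int64_t)splice_low32((uint64_t)r->smin, r->u32_min);
	int64_t new_smax = (int64_t)splice_low32((uint64_t)r->smax, r->u32_max);
	r->smin = max_s64(r->smin, new_smin);
	r->smax = min_s64(r->smax, new_smax);

	/* Both s32-based steps need the s32 range to be a valid u32 range. */
	if ((uint32_t)r->s32_min <= (uint32_t)r->s32_max) {
		/* s32 -> u64 tightening */
		new_umin = splice_low32(r->umin, (uint32_t)r->s32_min);
		new_umax = splice_low32(r->umax, (uint32_t)r->s32_max);
		r->umin = max_u64(r->umin, new_umin);
		r->umax = min_u64(r->umax, new_umax);

		/* s32 -> s64 tightening (assumed to mirror the step above) */
		new_smin = (int64_t)splice_low32((uint64_t)r->smin, (uint32_t)r->s32_min);
		new_smax = (int64_t)splice_low32((uint64_t)r->smax, (uint32_t)r->s32_max);
		r->smin = max_s64(r->smin, new_smin);
		r->smax = min_s64(r->smax, new_smax);
	}
}
```

The u64 steps only touch umin/umax and the s64 steps only touch
smin/smax, so in this sketch the new order gives the same result as the
original one. Feeding it the example from the patch comment (64-bit
bounds [0x1'00000000; 0x3'80000000], unknown u32 bounds, s32 bounds
[1; 0x7fffffff]) tightens both the u64 and s64 bounds to
[0x1'00000001; 0x3'7fffffff].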