Re: [bpf-next PATCH 1/4] bpf: verifier track null pointer branch_taken with JNE and JEQ

John Fastabend <john.fastabend@xxxxxxxxx> · Mon, 18 May 2020 22:34:16 -0700

Andrii Nakryiko wrote:
> On Mon, May 18, 2020 at 1:05 PM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
> >
> > Current verifier when considering which branch may be taken on a
> > conditional test with pointer returns -1 meaning either branch may
> > be taken. But, we track if pointers can be NULL by using dedicated
> > types for valid pointers (pointers that can not be NULL). For example,
> > we have PTR_TO_SOCK and PTR_TO_SOCK_OR_NULL to indicate a pointer
> > that is valid or not, PTR_TO_SOCK being the validated pointer type.
> >
> > We can then use this extra information when we encounter null tests
> > against pointers. Consider,
> >
> >   if (sk_ptr == NULL) ... else ...
> >
> > if the sk_ptr has type PTR_TO_SOCK we know the null check will fail
> > and the null branch can not be taken.
> >
> > In this patch we extend is_branch_taken to consider this extra
> > information and to return only the branch that will be taken. This
> > resolves a verifier issue reported with this C code,
> >
> >  sk = bpf_sk_lookup_tcp(skb, tuple, tuple_len, BPF_F_CURRENT_NETNS, 0);
> >  bpf_printk("sk=%d\n", sk ? 1 : 0);
> >  if (sk)
> >    bpf_sk_release(sk);
> >  return sk ? TC_ACT_OK : TC_ACT_UNSPEC;
> >
> > The generated asm then looks like this,
> >
> >  43: (85) call bpf_sk_lookup_tcp#84
> >  44: (bf) r6 = r0                    <- do the lookup and put result in r6
> >  ...                                 <- do some more work
> >  51: (55) if r6 != 0x0 goto pc+1     <- test sk ptr for printk use
> >  ...
> >  56: (85) call bpf_trace_printk#6
> >  ...
> >  61: (15) if r6 == 0x0 goto pc+1     <- do the if (sk) test from C code
> >  62: (b7) r0 = 0                     <- skip release because both branches
> >                                         are taken in verifier
> >  63: (95) exit
> >  Unreleased reference id=3 alloc_insn=43
> >
> 
> bpf_sk_release call in above assembler would be really nice for
> completeness. As written, this code never calls and never will call
> bpf_sk_release().
> 
> > In the verifier path the flow is,
> >
> >  51 -> 53 ... 61 -> 62
> >
> > Because at 51->53 jmp verifier promoted reg6 from type PTR_TO_SOCK_OR_NULL
> > to PTR_TO_SOCK but then at 62 we still test both paths ignoring that we
> 
> Seems like your description got a bit out of sync with the code above.
> There is no line 53, check is actually on line 61, not 62, etc. Can
> you please update it in your v2 as well?

Will do a rewrite.

> 
> > already promoted the type. So we incorrectly conclude an unreleased
> > reference. To fix this we add logic in is_branch_taken to test the
> > OR_NULL portion of the type and if its not possible for a pointer to
> > be NULL we can prune the branch taken where 'r6 == 0x0'.
> >
> > After the above additional logic is added the C code above passes
> > as expected.
> >
> > This makes the assumption that all pointer types PTR_TO_* that can be null
> > have an equivalent type PTR_TO_*_OR_NULL logic.
> >
> > Suggested-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> > Reported-by: Andrey Ignatov <rdna@xxxxxx>
> > Signed-off-by: John Fastabend <john.fastabend@xxxxxxxxx>
> > ---
> >  0 files changed
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 180933f..8f576e2 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -393,6 +393,14 @@ static bool type_is_sk_pointer(enum bpf_reg_type type)
> >                 type == PTR_TO_XDP_SOCK;
> >  }
> >
> > +static bool reg_type_not_null(enum bpf_reg_type type)
> > +{
> > +       return type == PTR_TO_SOCKET ||
> > +               type == PTR_TO_TCP_SOCK ||
> > +               type == PTR_TO_MAP_VALUE ||
> > +               type == PTR_TO_SOCK_COMMON;
> 
> PTR_TO_BTF_ID should probably be here as well (we do have
> PTR_TO_BTF_ID_OR_NULL now).

OK.

> 
> > +}
> > +
> >  static bool reg_type_may_be_null(enum bpf_reg_type type)
> >  {
> >         return type == PTR_TO_MAP_VALUE_OR_NULL ||
> > @@ -1970,8 +1978,9 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int regno,
> >         if (regno >= 0) {
> >                 reg = &func->regs[regno];
> >                 if (reg->type != SCALAR_VALUE) {
> > -                       WARN_ONCE(1, "backtracing misuse");
> > -                       return -EFAULT;
> > +                       if (unlikely(!reg_type_not_null(reg->type)))
> > +                               WARN_ONCE(1, "backtracing misuse");
> > +                       return 0;
> 
> I think it's safer to instead add check in check_cond_jmp_op, in case
> branch is known, to only mark precision if register is not a non-null
> pointer. __mark_chain_precision is used in many places, so it's better
> to guard against this particular situation and leave warning for
> general case, IMO.

Sure.

> 
> >                 }
> >                 if (!reg->precise)
> >                         new_marks = true;
> > @@ -6306,8 +6315,26 @@ static int is_branch64_taken(struct bpf_reg_state *reg, u64 val, u8 opcode)
> >  static int is_branch_taken(struct bpf_reg_state *reg, u64 val, u8 opcode,
> >                            bool is_jmp32)
> >  {
> > -       if (__is_pointer_value(false, reg))
> > -               return -1;
> > +       if (__is_pointer_value(false, reg)) {
> > +               if (!reg_type_not_null(reg->type))
> > +                       return -1;
> > +
> > +               /* If pointer is valid tests against zero will fail so we can
> > +                * use this to direct branch taken.
> > +                */
> > +               switch (opcode) {
> > +               case BPF_JEQ:
> > +                       if (val == 0)
> > +                               return 0;
> > +                       return 1;
> 
> if val != 0, then we can't really tell whether point is equal to our
> scalar or not, right? What if we leaked pointer into a global
> variable, now we are checking against that stored value? It can go
> both ways. So unless I'm missing something, it should be -1 here.

Correct it should be -1 thanks. Probably worth adding a test for this case
as well.

> 
> > +               case BPF_JNE:
> > +                       if (val == 0)
> > +                               return 1;
> > +                       return 0;
> 
> same here, unless value we compare against is zero, we can't really
> tell for sure, so -1?
> 

Correct thanks.

> 
> > +               default:
> > +                       return -1;
> > +               }
> > +       }
> >
> >         if (is_jmp32)
> >                 return is_branch32_taken(reg, val, opcode);
> >