Re: WARNING in __mark_chain_precision

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Tue, 3 Jan 2023 10:27:52 -0800

On Mon, Jan 2, 2023 at 1:42 AM Hao Sun <sunhao.th@xxxxxxxxx> wrote:
>
>
>
> Yonghong Song <yhs@xxxxxxxx> 于2023年1月2日周一 03:20写道：
> >
> >
> >
> > On 12/30/22 1:44 AM, Hao Sun wrote:
> >>
> >>
> >> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> 于2022年12月30日周五 06:16写道：
> >>>
> >>> On Tue, Dec 27, 2022 at 9:24 PM Yonghong Song <yhs@xxxxxxxx> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 12/20/22 4:30 PM, Andrii Nakryiko wrote:
> >>>>> On Mon, Dec 19, 2022 at 11:13 AM <sdf@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> On 12/19, Hao Sun wrote:
> >>>>>>> Hi,
> >>>>>>
> >>>>>>> The following backtracking bug can be triggered on the latest bpf-next and
> >>>>>>> Linux 6.1 with the C prog provided. I don't have enough knowledge about
> >>>>>>> this part in the verifier, don't know how to fix this.
> >>>>>>
> >>>>>> Maybe something related to commit be2ef8161572 ("bpf: allow precision
> >>>>>> tracking
> >>>>>> for programs with subprogs") and/or the related ones?
> >>>>>>
> >>>>>>
> >>>>>>> This can be reproduced on:
> >>>>>>
> >>>>>>> HEAD commit: 0e43662e61f2 tools/resolve_btfids: Use pkg-config to locate
> >>>>>>> libelf
> >>>>>>> git tree: bpf-next
> >>>>>>> console log: https://pastebin.com/raw/45hZ7iqm
> >>>>>>> kernel config: https://pastebin.com/raw/0pu1CHRm
> >>>>>>> C reproducer: https://pastebin.com/raw/tqsiezvT
> >>>>>>
> >>>>>>> func#0 @0
> >>>>>>> 0: R1=ctx(off=0,imm=0) R10=fp0
> >>>>>>> 0: (18) r2 = 0x8000000000000          ; R2_w=2251799813685248
> >>>>>>> 2: (18) r6 = 0xffff888027358000       ;
> >>>>>>> R6_w=map_ptr(off=0,ks=3032,vs=3664,imm=0)
> >>>>>>> 4: (18) r7 = 0xffff88802735a000       ;
> >>>>>>> R7_w=map_ptr(off=0,ks=156,vs=2624,imm=0)
> >>>>>>> 6: (18) r8 = 0xffff88802735e000       ;
> >>>>>>> R8_w=map_ptr(off=0,ks=2396,vs=76,imm=0)
> >>>>>>> 8: (18) r9 = 0x8e9700000000           ; R9_w=156779191205888
> >>>>>>> 10: (36) if w9 >= 0xffffffe3 goto pc+1
> >>>>>>> last_idx 10 first_idx 0
> >>>>>>> regs=200 stack=0 before 8: (18) r9 = 0x8e9700000000
> >>>>>>> 11: R9_w=156779191205888
> >>>>>>> 11: (85) call #0
> >>>>>>> 12: (cc) w2 s>>= w7
> >>>>>
> >>>>> w2 should have been set to NOT_INIT (because r1-r5 are clobbered by
> >>>>> calls) and rejected here as !read_ok (see check_reg_arg()) before
> >>>>> attempting to mark precision for r2. Can you please try to debug and
> >>>>> understand why that didn't happen here?
> >>>>
> >>>> The verifier is doing the right thing here and the 'call #0' does
> >>>> implicitly cleared r1-r5.
> >>>>
> >>>> So for 'w2 s>>= w7', since w2 is used, the verifier tries to find
> >>>> its definition by backtracing. It encountered 'call #0', which clears
> >>>
> >>> and that's what I'm saying is incorrect. Normally we'd get !read_ok
> >>> error because s>>= is both READ and WRITE on w2, which is
> >>> uninitialized after call instruction according to BPF ABI. And that's
> >>> what actually seems to happen correctly in my (simpler) tests locally.
> >>> But something is special about this specific repro that somehow either
> >>> bypasses this logic, or attempts to mark precision before we get to
> >>> that test. That's what we should investigate. I haven't tried to run
> >>> this specific repro locally yet, so can't tell for sure.
> >>>
> >>
> >> So, the reason why w2 is not marked as uninit is that the kfunc call in
> >> the BPF program is invalid, "call #0", imm is zero, right?
> >
> > Yes, "call #0" is invalid. As the code below
> >
> >> /* skip for now, but return error when we find this in
> > fixup_kfunc_call */
> >>  if (!insn->imm)
> >>  return 0;
> >
> > The error report will be delayed later in fixup_kfunc_call().
> >
> > static int fixup_kfunc_call(struct bpf_verifier_env *env, struct
> > bpf_insn *insn,
> >                             struct bpf_insn *insn_buf, int insn_idx,
> > int *cnt)
> > {
> >         const struct bpf_kfunc_desc *desc;
> >
> >         if (!insn->imm) {
> >                 verbose(env, "invalid kernel function call not
> > eliminated in verifier pass\n");
> >                 return -EINVAL;
> >         }
> >
> >
> >> In check_kfunc_call(), it skips this error temporarily:
> >>
> >> /* skip for now, but return error when we find this in fixup_kfunc_call */
> >>  if (!insn->imm)
> >>  return 0;
> >>
> >> So the kfunc call is the previous instruction before "w2 s>>= w7", this
> >> leads to the warning in backtrack_insn():
> >>
> >> /* regular helper call sets R0 */
> >> *reg_mask &= ~1;
> >> if (*reg_mask & 0x3f) {
> >>      /* if backtracing was looking for registers R1-R5
> >>      * they should have been found already.
> >>      */
> >>      verbose(env, "BUG regs %x\n", *reg_mask);
> >>      WARN_ONCE(1, "verifier backtracking bug”);
> >>      return -EFAULT;
> >> }
> >
> > The main triggering the backtrack_insn() is due to
> >
> >                         } else {
> >                                 /* scalar += pointer
> >                                  * This is legal, but we have to
> > reverse our
> >                                  * src/dest handling in computing the range
> >                                  */
> >                                 err = mark_chain_precision(env,
> > insn->dst_reg);
> >                                 if (err)
> >                                         return err;
> >                                 return adjust_ptr_min_max_vals(env, insn,
> >                                                                src_reg,
> > dst_reg);
> >                         }
> >
> >
> > unc#0 @0
> > 0: R1=ctx(off=0,imm=0) R10=fp0
> > 0: (18) r2 = 0x8000000000000          ; R2_w=2251799813685248
> > 2: (18) r6 = 0xffff888100d29000       ;
> > R6_w=map_ptr(off=0,ks=3032,vs=3664,imm=0)
> > 4: (18) r7 = 0xffff888100d2a000       ;
> > R7_w=map_ptr(off=0,ks=156,vs=2624,imm=0)
> > 6: (18) r8 = 0xffff888100d2ac00       ;
> > R8_w=map_ptr(off=0,ks=2396,vs=76,imm=0)
> > 8: (18) r9 = 0x8e9700000000           ; R9_w=156779191205888
> > 10: (36) if w9 >= 0xffffffe3 goto pc+1
> > last_idx 10 first_idx 0
> > regs=200 stack=0 before 8: (18) r9 = 0x8e9700000000
> > 11: R9_w=156779191205888
> > 11: (85) call #0
> > 12: (cc) w2 s>>= w7
> > last_idx 12 first_idx 12
> > parent didn't have regs=4 stack=0 marks: R1=ctx(off=0,imm=0)
> > R2_rw=P2251799813685248 R6_w=map_ptr(off=0,ks=3032,vs=3664,imm=0)
> > R7_rw=map_ptr(off=0,ks=156,vs=2624,imm=0) R8_w=map_ptr(off=0,ks=2396,v0
> > last_idx 11 first_idx 0
> > regs=4 stack=0 before 11: (85) call #0
> > BUG regs 4
> >
> > For insn 12, 'w2 s>>= w7', w2 is a scalar and w7 is a map_ptr. Hence,
> > based on the above verifier code, mark_chain_precision() is triggered.
> >
> > Not sure what is the purpose of this test. But to make it succeed,
> > first "call #0" need to change to a valid kfunc call, and second, you
> > might want to change 'w2 s>>= w7' to e.g., 'w9 s>>= w7' to avoid
> > precision tracking.
> >
>
> The purpose is not to make the test "succeed", the verifier temporarily
> skips the invalid kfunc insn "call #0", but this insn triggered a warning
> in backtrack_insn(), while it is supposed to reject the program either
> due to insn#12 32bit ptr alu or insn#11 invalid kfunc.
>
> Maybe something like the bellow, after applying the patch, the reproducer
> is rejected:
>
> func#0 @0
> 0: R1=ctx(off=0,imm=0) R10=fp0
> 0: (18) r2 = 0x8000000000000          ; R2_w=2251799813685248
> 2: (18) r6 = 0xffff88817d563000       ; R6_w=map_ptr(off=0,ks=3032,vs=3664,imm=0)
> 4: (18) r7 = 0xffff888171ee9000       ; R7_w=map_ptr(off=0,ks=156,vs=2624,imm=0)
> 6: (18) r8 = 0xffff888171ee8000       ; R8_w=map_ptr(off=0,ks=2396,vs=76,imm=0)
> 8: (18) r9 = 0x8e9700000000           ; R9_w=156779191205888
> 10: (36) if w9 >= 0xffffffe3 goto pc+1
> last_idx 10 first_idx 0
> regs=200 stack=0 before 8: (18) r9 = 0x8e9700000000
> 11: R9_w=156779191205888
> 11: (85) call #0
> 12: (cc) w2 s>>= w7
> last_idx 12 first_idx 12
> parent didn't have regs=4 stack=0 marks: R1=ctx(off=0,imm=0) R2_rw=P2251799813685248 R6_w=map_ptr(off=0,ks=3032,vs=3664,imm=0) R7_rw=map_ptr(off=0,ks=156,vs=2624,imm=0) R8_w=map_ptr(off=0,ks=2396,vs=76,imm=0) R9_w=156779191205888 R10=fp0
> last_idx 11 first_idx 0
> regs=4 stack=0 before 11: (85) call #0
> regs=4 stack=0 before 10: (36) if w9 >= 0xffffffe3 goto pc+1
> regs=4 stack=0 before 8: (18) r9 = 0x8e9700000000
> regs=4 stack=0 before 6: (18) r8 = 0xffff888171ee8000
> regs=4 stack=0 before 4: (18) r7 = 0xffff888171ee9000
> regs=4 stack=0 before 2: (18) r6 = 0xffff88817d563000
> regs=4 stack=0 before 0: (18) r2 = 0x8000000000000
> R2 32-bit pointer arithmetic prohibited
> processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 4a25375ebb0d..abc7e96d826f 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -2743,6 +2743,9 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx,
>                         *reg_mask |= sreg;
>         } else if (class == BPF_JMP || class == BPF_JMP32) {
>                 if (opcode == BPF_CALL) {
> +                       /* skip for now, should return error when we find this in fixup_kfunc_call */
> +                       if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL && insn->imm == 0)
> +                               return 0;

Makes sense to me. Please submit as an official patch
with s/return 0/return -ENOTSUPP/
Also 'skip for now' isn't quite correct here.
In check_kfunc_call() it's correct, since invalid kfunc with imm==0
could be eliminated during dead code elimination,
but since we're walking this insn here in backtrack_insn
the dead code elimination is not going to kick in.
So it's surely invalid kfunc call if we see it in backtrack_insn.
The comment should probably be something like:
/* kfunc with imm==0 is invalid and fixup_kfunc_call will catch
this error later. Make backtracking conservative with ENOTSUPP. */