Re: [Bug Report] bpf: incorrectly pruning runtime execution path

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 14 Dec 2023 16:06:07 -0800

On Thu, Dec 14, 2023 at 8:26 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Thu, 2023-12-14 at 17:10 +0200, Eduard Zingerman wrote:
> > [...]
> > > The reason why retval checks fails is that the way you disable dead
> > > code removal pass is not complete. Disable opt_remove_dead_code()
> > > just prevent the instruction #30 from being removed, but also note
> > > opt_hard_wire_dead_code_branches(), which convert conditional jump
> > > into unconditional one, so #30 is still skipped.
> > >
> > > > Note that I tried this test with two functions:
> > > > - bpf_get_current_cgroup_id, with this function I get retval 2, not 4 :)
> > > > - bpf_get_prandom_u32, with this function I get a random retval each time.
> > > >
> > > > What is the expectation when 'bpf_get_current_cgroup_id' is used?
> > > > That it is some known (to us) number, but verifier treats it as unknown scalar?
> > > >
> > >
> > > Either one would work, but to make #30 always taken, r0 should be
> > > non-zero.
> >
> > Oh, thank you, I made opt_hard_wire_dead_code_branches() a noop,
> > replaced r0 = 0x4 by r0 /= 0 and see "divide error: 0000 [#1] PREEMPT SMP NOPTI"
> > error in the kernel log on every second or third run of the test
> > (when using prandom).
> >
> > Working to minimize the test case will share results a bit later.
>
> Here is the minimized version of the test:
> https://gist.github.com/eddyz87/fb4d3c7d5aabdc2ae247ed73fefccd32
>
> If executed several times: ./test_progs -vvv -a verifier_and/pruning_test
> it eventually crashes VM with the following error:
>
> [    2.039066] divide error: 0000 [#1] PREEMPT SMP NOPTI
>                ...
> [    2.039987] Call Trace:
> [    2.039987]  <TASK>
> [    2.039987]  ? die+0x36/0x90
> [    2.039987]  ? do_trap+0xdb/0x100
> [    2.039987]  ? bpf_prog_32cfdb2c00b08250_pruning_test+0x4d/0x60
> [    2.039987]  ? do_error_trap+0x7d/0x110
> [    2.039987]  ? bpf_prog_32cfdb2c00b08250_pruning_test+0x4d/0x60
> [    2.039987]  ? exc_divide_error+0x38/0x50
> [    2.039987]  ? bpf_prog_32cfdb2c00b08250_pruning_test+0x4d/0x60
> [    2.039987]  ? asm_exc_divide_error+0x1a/0x20
> [    2.039987]  ? bpf_prog_32cfdb2c00b08250_pruning_test+0x4d/0x60
> [    2.039987]  bpf_test_run+0x1b5/0x350
> [    2.039987]  ? bpf_test_run+0x115/0x350
>                ...
>
> I'll continue debugging this a bit later today.
>

Great, thanks a lot, Eduard. Let's paste the program here for discussion:

$ cat progs/verifier_blah.c
// SPDX-License-Identifier: GPL-2.0
/* Copyright (C) 2023 SUSE LLC */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include "bpf_misc.h"

SEC("socket")
__success
__flag(BPF_F_TEST_STATE_FREQ)
__retval(42)
__naked void pruning_test(void)
{
        asm volatile (
        "   call %[bpf_get_prandom_u32];\n"
        "   r7 = r0;\n"
        "   call %[bpf_get_prandom_u32];\n"
        "   r8 = 2;\n"
        "   if r0 > 1 goto 1f;\n"
        "   r8 = r7;\n"
        "1: r5 = r8;\n"
        "   if r8 >= r0 goto 2f;\n"
        "   r8 += r8;\n"
        "   if r5 == 0 goto 2f;\n"
        "   r0 /= 0;\n"
        "2: r0 = 42;\n"
        "   exit;\n"
        :
        : __imm(bpf_get_prandom_u32)
        : __clobber_all);
}

char _license[] SEC("license") = "GPL";

If we look at relevant portion of verifier log for `if r5 == 0` we see this:

9: (15) if r5 == 0x0 goto pc+1
mark_precise: frame0: last_idx 9 first_idx 7 subseq_idx -1
mark_precise: frame0: regs=r5,r7 stack= before 8: (0f) r8 += r8
mark_precise: frame0: regs=r5,r7 stack= before 7: (3d) if r8 >= r0 goto pc+3

^^ Note here that we only have r5 and r7, not r8.

mark_precise: frame0: parent state regs=r5,r7 stack=:
R0_rw=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0;
0x1)) R5_rw=Pscalar(id=1) R7_w=Pscalar(id=1) R8_rw=scalar(id=1)
R10=fp0
mark_precise: frame0: last_idx 6 first_idx 0 subseq_idx 7
mark_precise: frame0: regs=r5,r7,r8 stack= before 6: (bf) r5 = r8
mark_precise: frame0: regs=r7,r8 stack= before 5: (bf) r8 = r7
mark_precise: frame0: regs=r7 stack= before 4: (25) if r0 > 0x1 goto pc+1
mark_precise: frame0: regs=r7 stack= before 3: (b7) r8 = 2
mark_precise: frame0: regs=r7 stack= before 2: (85) call bpf_get_prandom_u32#7
mark_precise: frame0: regs=r7 stack= before 1: (bf) r7 = r0
mark_precise: frame0: regs=r0 stack= before 0: (85) call bpf_get_prandom_u32#7

Note above that r0 in `if r8 >= r0` is not marked as precise because
at that point we don't know that r8 should be precise (due to us
"forgetting" linked ID information).

Now, let's comment out the "r8 += r8" instruction so that we preserve
linkage between r5 and r8 (and also r7, but that's less relevant
here).

8: (15) if r5 == 0x0 goto pc+1
mark_precise: frame0: last_idx 8 first_idx 7 subseq_idx -1
mark_precise: frame0: regs=r5,r7,r8 stack= before 7: (3d) if r8 >= r0 goto pc+2

^^ Here note how we seek for r5,r7, *and* r8 to be precise...

mark_precise: frame0: parent state regs=r0,r5,r7,r8 stack=:

... which leads to us adding r0 to the set due to that `if r8 >= r0`
instruction.

(btw, I was wrong yesterday, we do have logic to mark *both* registers
of conditional jump if at least one of them is precise, so seems like
we handle that well)

 R0_rw=Pscalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0;
0x1)) R5_rw=Pscalar(id=1) R7_w=Pscalar(id=1) R8_rw=Pscalar(id=1)
R10=fp0
mark_precise: frame0: last_idx 6 first_idx 0 subseq_idx 7
mark_precise: frame0: regs=r0,r5,r7,r8 stack= before 6: (bf) r5 = r8
mark_precise: frame0: regs=r0,r7,r8 stack= before 5: (bf) r8 = r7
mark_precise: frame0: regs=r0,r7 stack= before 4: (25) if r0 > 0x1 goto pc+1
mark_precise: frame0: regs=r0,r7 stack= before 3: (b7) r8 = 2
mark_precise: frame0: regs=r0,r7 stack= before 2: (85) call
bpf_get_prandom_u32#7
mark_precise: frame0: regs=r7 stack= before 1: (bf) r7 = r0
mark_precise: frame0: regs=r0 stack= before 0: (85) call bpf_get_prandom_u32#7

So all in all, I still think that the root cause is what I said
yesterday. We don't preserve information about linked registers at the
per-instruction level, but we should.

If you agree with the analysis, we can start discussing what's the
best way to fix this.