Re: [PATCH bpf] bpf: fix tracking of stack size for var-off access

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 21 Nov 2023 13:55:57 -0800

On Tue, Nov 21, 2023 at 12:30 PM Andrei Matei <andreimatei1@xxxxxxxxx> wrote:
>
> On Tue, Nov 21, 2023 at 12:16 PM Hao Sun <sunhao.th@xxxxxxxxx> wrote:
> >
> > On Tue, Nov 21, 2023 at 1:46 AM Andrii Nakryiko
> > <andrii.nakryiko@xxxxxxxxx> wrote:
> > >
> > >
> > > It *feels* like this stack depth update *and* growing allocated stack
> > > slots should happen somewhere in check_stack_access_within_bounds() or
> > > right after it. It shouldn't matter whether we read or write to the
> > > stack slot: either way that slot becomes part of the verifier state
> > > that we should take into account during state comparison. Eduard not
> > > so long ago added a change that allows reading STACK_INVALID slots, so
> > > it's completely valid to read something that was never written to (and
> > > so grow_stack_state() wasn't called for that slot, as it is
> > > implemented right now). So I think we should fix that.
> > >
> >
> > Agree. The following cases are currently confusing to me.
>
> Like Hao, I'm also confused about when reads from unitialized stack slots are
> supposed to be allowed and when they're not. For example, the "uninitialized
> stack1" test ([0]) seems to check that at least some type of uninitialized read
> is not permitted. The reason why the respective access is rejected and this
> test currently passes is the following check in check_stack_range_initialized:
> [1]. We summarily reject the access to slots beyond state->allocated_stack. If
> we were to not reject them there, and instead treat the slots as STACK_INVALID,
> then the test's access would be allowed just below, through the combination of
> env->allow_uninit_stack and `clobber`.

And that seems like a consistent and sane behavior if
bpf_allow_uninit_stack() is true, so I vote for fixing the test and
make the logic consistent.

>
> So, assuming that this tests (and a few others) are sane, Andrii's suggestion
> of calling grow_stack_state()/update_stack_depth() in
> check_stack_access_within_bounds() does not immediately work: doing so
> would change
> the behavior in check_stack_range_initialized() and allow the access.
>
> On the other hand, perhaps the test is not sane and the access should be
> permitted, in the spirit of allowing reads of uninitialized stack? Perhaps the
> different treatment of slots beyond state->allocated_stack and STACK_INVALID

yes, I think this divergence is not intentional, but maybe Eduard
remembers some other quirks and why it is what it is, let's see.

> slots is not intentional. Though, I am fairly confused about the idea of
> allowing such reads in general - don't they allow leaking of kernel memory from
> the uninit stack to user space? Even if restricted to root, is the leak ok?

This is guarded by bpf_allow_uninit_stack(), which check CAP_PERFMON.
Once BPF program has CAP_PERFMON, it can do bpf_probe_read_kernel()
and generally can leak whatever kernel memory. So that's why
STACK_INVALID is allowed to be read in such case. There is also
bpf_allow_ptr_leaks() which also checks CAP_PERFMON.

>
> To also state the obvious - I'm happy to continue working on this patch and fix
> a bug I've caused. However, if someone who actually knows the deal wants to
> take over, I'm certainly not attached to anything :).

It would be great if you can finish this, absolutely! Thanks!

>
> [0] https://github.com/torvalds/linux/blob/98b1cc82c4affc16f5598d4fa14b1858671b2263/tools/testing/selftests/bpf/progs/verifier_basic_stack.c#L28-L44
> [1] https://github.com/torvalds/linux/blob/98b1cc82c4affc16f5598d4fa14b1858671b2263/kernel/bpf/verifier.c#L7280
>
>
> >
> > The verifier accepts the following program, which goes from #4 to #8
> > and directly read the stack at runtime without any previous write:
> > func#0 @0
> > 0: R1=ctx() R10=fp0
> > 0: (bf) r6 = r10                      ; R6_w=fp0 R10=fp0
> > 1: (85) call bpf_get_prandom_u32#7    ; R0_w=scalar()
> > 2: (bf) r3 = r0                       ; R0_w=scalar(id=1) R3_w=scalar(id=1)
> > 3: (bf) r8 = r0                       ; R0_w=scalar(id=1) R8_w=scalar(id=1)
> > 4: (4e) if w0 & w3 goto pc+3          ; R0_w=scalar(id=1) R3_w=scalar(id=1)
> > 5: (63) *(u32 *)(r6 -196) = r3        ; R3_w=scalar(id=1) R6_w=fp0
> > fp-200=mmmm????
> > 6: (18) r7 = 0x19                     ; R7=25
> > 8: (61) r7 = *(u32 *)(r6 -200)        ; R6=fp0
> > R7_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
> > fp-200=mmmm????
> > 9: (95) exit
> >
> > from 4 to 8: safe
> > verification time 358 usec
> > stack depth 200
> > processed 10 insns (limit 1000000) max_states_per_insn 0 total_states
> > 1 peak_states 1 mark_read 1
> >
> > The state is pruned, because of this:
> > static bool stacksafe(...)
> >          ....
> >          if (env->allow_uninit_stack &&
> >              old->stack[spi].slot_type[i % BPF_REG_SIZE] == STACK_MISC)
> >              continue;
> >
> > Yet, the sample direct read would be rejected:
> >
> > func#0 @0
> > 0: R1=ctx() R10=fp0
> > 0: (bf) r6 = r10                      ; R6_w=fp0 R10=fp0
> > 1: (61) r7 = *(u32 *)(r6 -200)
> > invalid read from stack R6 off=-200 size=4
> >
> > Eduard, you added support for reading uninit slots, should we also add something
> > like the following:
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 8c2d31aa3d31..aa861d2da240 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -6446,7 +6446,7 @@ static int check_stack_slot_within_bounds(int off,
> >  {
> >         int min_valid_off;
> >
> > -       if (t == BPF_WRITE)
> > +       if (t == BPF_WRITE || env->allow_uninit_stack)
> >                 min_valid_off = -MAX_BPF_STACK;
> >         else
> >                 min_valid_off = -state->allocated_stack;

as I mentioned above, yes, I think the behavior should be consistent
and such accesses should be allowed, just as if it was STACK_INVALID