Re: [PATCH bpf-next v1 07/13] bpf: Fix partial dynptr stack slot reads/writes

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Sat, 5 Nov 2022 04:32:56 +0530

On Sat, Nov 05, 2022 at 03:44:53AM IST, Andrii Nakryiko wrote:
> On Thu, Nov 3, 2022 at 7:07 AM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> >
> > On Sat, Oct 22, 2022 at 5:08 AM Kumar Kartikeya Dwivedi
> > <memxor@xxxxxxxxx> wrote:
> > >
> > > On Sat, Oct 22, 2022 at 04:20:28AM IST, Joanne Koong wrote:
> > > > [...]
> > > >
> > > > thanks for your work on this (and on the rest of the stack, which I'm
> > > > still working on reviewing)
> > > >
> > > > Regarding writes leading to partial dynptr stack slots, I'm regretting
> > > > not having the verifier flat-out reject this in the first place
> > > > (instead of it being allowed but internally the stack slot gets marked
> > > > as invalid) - I think it overall ends up being more confusing to end
> > > > users, where there it's not obvious at all that writing to the dynptr
> > > > on the stack automatically invalidates it. I'm not sure whether it's
> > > > too late from a public API behavior perspective to change this or not.
> > >
> > > It would be incorrect to reject writes into dynptrs whose reference is not
> > > tracked by the verifier (so bpf_dynptr_from_mem), because the compiler would be
> > > free to reuse the stack space for some other variable when the local dynptr
> > > variable's lifetime ends, and the verifier would have no way to know when the
> > > variable went out of scope.
> > >
> > > I feel it is also incorrect to refuse bpf_dynptr_from_mem where unref dynptr
> > > already exists as well. Right now it sees STACK_DYNPTR in the slot_type and
> > > fails. But consider something like this:
> > >
> > > void prog(void)
> > > {
> > >         {
> > >                 struct bpf_dynptr ptr;
> > >                 bpf_dynptr_from_mem(...);
> > >                 ...
> > >         }
> > >
> > >         ...
> > >
> > >         {
> > >                 struct bpf_dynptr ptr;
> > >                 bpf_dynptr_from_mem(...);
> > >         }
> > > }
> > >
> > > The program is valid, but if ptr in both scopes share the same stack slots, the
> > > call in the second scope would fail because verifier would see STACK_DYNPTR in
> > > slot_type.
>
> I don't think compiler is allowed to reuse the same stack slot for
> those two ptrs, because we are passing a pointer to it into a
> black-box bpf_dynptr_from_mem() function, so kernel can't assume that
> this slot is free to be reused just because no one is accessing it
> after bpf_dynptr_from_mem (I think?)
>

At the C level, once the lifetime of the object ends upon execution going out of
its enclosing scope, even if its pointer escaped, the ending of the lifetime
implicitly invalidates any such pointers (as per the abstract machine), so the
compiler is well within its rights (because using such a pointer any more is UB)
to reuse the same storage for another object.

E.g. https://godbolt.org/z/Wcvzfbfbr

For the following:

struct bpf_dynptr {
	unsigned long a, b;
};

extern void bpf_dynptr_func(struct bpf_dynptr *);
extern void bpf_dynptr_ro(const struct bpf_dynptr *);

int main(void)
{
	{
		struct bpf_dynptr d2;

		bpf_dynptr_func(&d2);
		bpf_dynptr_ro(&d2);
	}
	{
		struct bpf_dynptr d3;

		bpf_dynptr_func(&d3);
		bpf_dynptr_ro(&d3);
	}
}

clang produces:

main:                                   # @main
        r6 = r10
        r6 += -16
        call bpf_dynptr_func
        call bpf_dynptr_ro
        r6 = r10
        r6 += -16
        call bpf_dynptr_func
        call bpf_dynptr_ro
        exit

> Would it make sense to allow *optional* bpf_dynptr_free (of is it
> bpf_dynptr_put, not sure) for non-reference-tracked dynptrs if indeed
> we wanted to reuse the same stack variable for multiple dynptrs,
> though?

I think it's fine to simply overwrite the type of object when reusing the same
stack slot.

For some precedence:

This is what compilers essentially do for effective(C)/dynamic(C++) types, for
instance GCC will simply overwrite the effective type of the object (even for
declared objects, even though standard only permits overwriting it for allocated
objects) whenever you do a store using a lvalue of some type T or memcpy
(transferring the effective type to dst).

For a more concrete analogy, Storage Reuse in
https://en.cppreference.com/w/cpp/language/lifetime describes how placement new
simply reuses storage of trivially destructible objects without requiring the
destructor to be called. Since it is C++ if you replace storage of non-trivial
object it must be switched back to the same type before the runtime calls the
original destructors emitted implicitly for declared type.

So in BPF terms, local dynptr (dynptr_from_mem) is trivially destructible, but
ringbuf dynptr requires its 'destructor' to be called before storage is reused.

There are two choices in the second case, either complain when such a ringbuf
dynptr's storage is reused without calling its release function, or the verifier
will complain later when seeing an unreleased reference. I think complaining
early is better as later the user has to correlate insn_idx of leaked reference.
The program is going to fail in both cases anyway, so it makes sense to give a
better error back to the user.