On Fri, 21 Feb 2025 at 02:02, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Thu, Feb 20, 2025 at 4:37 PM Kumar Kartikeya Dwivedi
> <memxor@xxxxxxxxx> wrote:
> >
> > On Fri, 21 Feb 2025 at 01:27, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
> > >
> > > On Thu, Feb 20, 2025 at 7:41 AM Kumar Kartikeya Dwivedi
> > > <memxor@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, 20 Feb 2025 at 01:13, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, Feb 19, 2025 at 10:10 AM Kumar Kartikeya Dwivedi
> > > > > <memxor@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, 19 Feb 2025 at 18:41, Alexei Starovoitov
> > > > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Wed, Feb 19, 2025 at 4:51 AM Kumar Kartikeya Dwivedi
> > > > > > > <memxor@xxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > For the bpf_dynptr_slice_rdwr kfunc, the verifier may return a pointer
> > > > > > > > to the underlying packet (if the requested slice is linear), or copy out
> > > > > > > > the data to the buffer passed into the kfunc. The verifier performs
> > > > > > > > symbolic execution assuming the returned value is a PTR_TO_MEM of a
> > > > > > > > certain size (passed into the kfunc), and ensures reads and writes are
> > > > > > > > within bounds.
> > > > > > >
> > > > > > > sounds like
> > > > > > > check_kfunc_mem_size_reg() -> check_mem_size_reg() ->
> > > > > > > check_helper_mem_access()
> > > > > > >   case PTR_TO_STACK:
> > > > > > >     check_stack_range_initialized()
> > > > > > >       clobber = true
> > > > > > >       if (clobber) {
> > > > > > >         __mark_reg_unknown(env, &state->stack[spi].spilled_ptr);
> > > > > > >
> > > > > > > is somehow broken?
> > > > > > >
> > > > > > > ohh. It might be:
> > > > > > > || !is_kfunc_arg_optional(meta->btf, buff_arg)
> > > > > > >
> > > > > > > This bit is wrong then.
> > > > > > > When arg is not-null check_kfunc_mem_size_reg() should be called.
> > > > > > > The PTR_TO_STACK abuse is a small subset of issues
> > > > > > > if check_kfunc_mem_size_reg() is indeed not called.
> > > > > >
> > > > > > The condition looks ok to me.
> > > > > >
> > > > > > The condition to do check_mem_size_reg is !null || !opt.
> > > > > > So when it's null, and it's opt, it will be skipped.
> > > > > > When it's null, and it's not opt, the check will happen.
> > > > > > When arg is not-null, the said function is called, opt does not matter then.
> > > > > > So the stack slots are marked misc.
> > > > > >
> > > > > > In our case we're not passing a NULL pointer in the selftest.
> > > > > >
> > > > > > The problem occurs once we spill to that slot _after_ the call, and
> > > > > > then do a write through returned mem pointer.
> > > > > >
> > > > > > The final few lines from the selftest do the dirty thing, where r0 is
> > > > > > aliasing fp-8, and r1 = 0.
> > > > > >
> > > > > > + *(u64 *)(r10 - 8) = r8; \
> > > > > > + *(u64 *)(r0 + 0) = r1; \
> > > > > > + r8 = *(u64 *)(r10 - 8); \
> > > > > > + r0 = *(u64 *)(r8 + 0); \
> > > > > >
> > > > > > The write through r0 must re-mark the stack, but the verifier doesn't
> > > > > > know it's pointing to the stack.
> > > > > > push_stack was the conceptually cleaner/simpler fix, but it apparently
> > > > > > isn't good enough.
> > > > > >
> > > > > > Remarking the stack on write to PTR_TO_MEM, or invalidating PTR_TO_MEM
> > > > > > when r8 is spilled to fp-8 first time after the call are two options.
> > > > > > Both have some concerns (first, the misaligned stack access is not
> > > > > > caught, second PTR_TO_MEM may outlive the stack frame).
> > > > >
> > > > > Reading the description of the problem my first instinct was to have
> > > > > stack slots with associated obj_ref_id for such cases and when
> > > > > something writes into that stack slot, invalidate everything with that
> > > > > obj_ref_id. So that's probably what you mean by invalidating
> > > > > PTR_TO_MEM, right?
> > > > >
> > > > > Not sure I understand what "PTR_TO_STACK with mem_size" (that Alexei
> > > > > mentioned in another email) means, though, so hard to compare.
> > > > >
> > > > Invalidation is certainly one option. The one Alexei mentioned was
> > > > where we discussed bounding how much data can be read through the
> > > > PTR_TO_STACK (similar to PTR_TO_MEM), and marking r0 as PTR_TO_STACK.
> > > > This ends up combining the constraints of both types of pointers (it
> > > > may as well be called PTR_TO_STACK_OR_MEM) without forking paths.
> > >
> > > Yeah, PTR_TO_STACK_OR_MEM would be more precise. But how does that
> > > help with this problem? Not sure I follow the idea of the solution
> > > (but I can wait for patches to be posted).
> >
> > The reason for push_stack was to ensure writes through the returned
> > pointer take effect on stack state.
> > By keeping it PTR_TO_STACK, we get that behavior.
> > However, in the other path of this patch, the verifier would verify
> > the pointer as PTR_TO_MEM, with a bounded mem_size.
> > Thus it would not allow writing beyond a certain size.
> > If we simply set r0 to PTR_TO_STACK now, it would possibly allow going
> > beyond the size that was passed to the kfunc.
> > E.g. say the buffer was fp-24 and the size was 8, we can also modify fp-8
> > through it and not just fp-16.
> > Adding an additional mem_size field and checking it (when set) for
> > PTR_TO_STACK allows us to ensure we cannot write beyond 8 bytes
> > through r0=fp-24.
>
> yeah, this part is clear, of course, but I was actually more worried
> about handling the following situation:
>

For simplicity, let's just forget about stack_or_mem, and assume a stack
pointer with this extra mem_size.
I haven't written the code yet, but the idea would be to limit writing to
[fp-N, fp-N+mem_size) through it.
It would be a screening check before we pass through to the normal stack
access handling code.

> struct bpf_dynptr dptr;
> void *slice;
> union {
>   char buf[32];
>   void *map_value;
> } blah;
>
> bpf_dynptr_from_skb(..., &dptr);

At this point we mark dptr.

>
> slice = bpf_dynptr_slice_rdwr(&dptr, 0, &blah.buf, sizeof(blah.buf));
> if (!slice) return 0; /* whatever */
>

slice is PTR_TO_STACK w/ mem_size = 32.
We mark the bytes for buf as STACK_MISC at this point.

> /* now slice points to blah.{buf,map_value}, right? */
>
> map_value = bpf_map_lookup_elem(&some_map, ...);
> if (!map_value) return 0;

The 8-byte slot for map_value within these 32 bytes becomes map_value.
The rest remains misc.

>
> /* we shouldn't allow working with slice at this point */
> *(u64 *)slice = 0xdeadbeef; /* overwrite map_value */

And because slice is PTR_TO_STACK, we will notice the overwrite of the
map value with a scalar.

>
> *(u64 *)map_value = 123; /* BOOM */
>

So this would fail as a deref of a scalar.

Let me know if I missed something you were trying to point out.
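
To make the screening-check idea a bit more concrete, here is a rough,
untested sketch of just the bounds logic, in plain standalone C. It is not
verifier code; the function name and the notion of carrying a mem_size next
to a PTR_TO_STACK register are hypothetical until actual patches exist.

#include <errno.h>
#include <stdio.h>

/*
 * Sketch only: a PTR_TO_STACK register produced by bpf_dynptr_slice_rdwr()
 * would carry the kfunc's buffer size as mem_size, and any access through
 * it must stay inside [fp+ptr_off, fp+ptr_off+mem_size). Accesses that pass
 * this screen fall through to the normal stack access handling, which
 * re-marks the written slots as usual.
 */
static int check_slice_stack_access(int ptr_off,  /* reg offset from fp, e.g. -24 */
				    int acc_off,  /* constant offset of the access */
				    int acc_size, /* access size in bytes */
				    int mem_size) /* size passed to the kfunc */
{
	int start = ptr_off + acc_off;
	int end = start + acc_size;	/* exclusive */

	if (acc_size <= 0)
		return -EACCES;
	if (start < ptr_off || end > ptr_off + mem_size)
		return -EACCES;
	return 0;
}

int main(void)
{
	/* buffer at fp-24, kfunc size 8: an 8-byte write at offset 0 is fine */
	printf("%d\n", check_slice_stack_access(-24, 0, 8, 8));
	/* an 8-byte write at offset 16 would reach fp-8, so it is rejected */
	printf("%d\n", check_slice_stack_access(-24, 16, 8, 8));
	return 0;
}

The real check would of course operate on the register's var_off/bounds
rather than plain ints, but the containment test is the same.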
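
And for completeness, the legitimate usage pattern the commit message
describes would look roughly like the sketch below. The kfunc prototypes are
abbreviated from the declarations in the selftests' bpf_kfuncs.h and may not
match exactly; this is illustrative only, not the actual selftest.

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

/* Prototypes abbreviated; see tools/testing/selftests/bpf/bpf_kfuncs.h. */
extern int bpf_dynptr_from_skb(struct __sk_buff *skb, __u64 flags,
			       struct bpf_dynptr *ptr) __ksym;
extern void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr *ptr, __u32 offset,
				   void *buffer, __u32 buffer__szk) __ksym;

char _license[] SEC("license") = "GPL";

SEC("tc")
int slice_rdwr_example(struct __sk_buff *skb)
{
	struct bpf_dynptr dptr;
	__u64 buf;	/* 8-byte stack buffer backing the slice */
	__u64 *slice;

	if (bpf_dynptr_from_skb(skb, 0, &dptr))
		return TC_ACT_OK;

	/*
	 * Returns either a pointer into the linear packet data or, if the
	 * requested range is not linear, &buf with the bytes copied in.
	 * Either way the verifier tracks the result as an 8-byte PTR_TO_MEM,
	 * so accesses through it are bounds-checked against that size.
	 */
	slice = bpf_dynptr_slice_rdwr(&dptr, 0, &buf, sizeof(buf));
	if (!slice)
		return TC_ACT_OK;

	*slice = 0;	/* in-bounds write through the slice */
	return TC_ACT_OK;
}

The problematic selftest then spills a pointer into the slot backing the
buffer after the call and writes through the slice, which is exactly the
aliasing the quoted asm snippet exercises.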