Re: [RFC PATCH bpf-next v1 1/2] bpf: Explore PTR_TO_STACK as R0 for bpf_dynptr_slice_rdwr

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Fri, 21 Feb 2025 09:15:32 -0800

On Thu, Feb 20, 2025 at 5:13 PM Kumar Kartikeya Dwivedi
<memxor@xxxxxxxxx> wrote:
>
> On Fri, 21 Feb 2025 at 02:02, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Thu, Feb 20, 2025 at 4:37 PM Kumar Kartikeya Dwivedi
> > <memxor@xxxxxxxxx> wrote:
> > >
> > > On Fri, 21 Feb 2025 at 01:27, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Feb 20, 2025 at 7:41 AM Kumar Kartikeya Dwivedi
> > > > <memxor@xxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, 20 Feb 2025 at 01:13, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, Feb 19, 2025 at 10:10 AM Kumar Kartikeya Dwivedi
> > > > > > <memxor@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Wed, 19 Feb 2025 at 18:41, Alexei Starovoitov
> > > > > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Wed, Feb 19, 2025 at 4:51 AM Kumar Kartikeya Dwivedi
> > > > > > > > <memxor@xxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > For the bpf_dynptr_slice_rdwr kfunc, the verifier may return a pointer
> > > > > > > > > to the underlying packet (if the requested slice is linear), or copy out
> > > > > > > > > the data to the buffer passed into the kfunc. The verifier performs
> > > > > > > > > symbolic execution assuming the returned value is a PTR_TO_MEM of a
> > > > > > > > > certain size (passed into the kfunc), and ensures reads and writes are
> > > > > > > > > within bounds.
> > > > > > > >
> > > > > > > > sounds like
> > > > > > > > check_kfunc_mem_size_reg() -> check_mem_size_reg() ->
> > > > > > > > check_helper_mem_access()
> > > > > > > >    case PTR_TO_STACK:
> > > > > > > >       check_stack_range_initialized()
> > > > > > > >          clobber = true
> > > > > > > >              if (clobber) {
> > > > > > > >                   __mark_reg_unknown(env, &state->stack[spi].spilled_ptr);
> > > > > > > >
> > > > > > > > is somehow broken?
> > > > > > > >
> > > > > > > > ohh. It might be:
> > > > > > > > || !is_kfunc_arg_optional(meta->btf, buff_arg)
> > > > > > > >
> > > > > > > > This bit is wrong then.
> > > > > > > > When arg is not-null check_kfunc_mem_size_reg() should be called.
> > > > > > > > The PTR_TO_STACK abuse is a small subset of issues
> > > > > > > > if check_kfunc_mem_size_reg() is indeed not called.
> > > > > > >
> > > > > > > The condition looks ok to me.
> > > > > > >
> > > > > > > The condition to do check_mem_size_reg is !null || !opt.
> > > > > > > So when it's null, and it's opt, it will be skipped.
> > > > > > > When it's null, and it's not opt, the check will happen.
> > > > > > > When arg is not-null, the said function is called, opt does not matter then.
> > > > > > > So the stack slots are marked misc.
> > > > > > >
> > > > > > > In our case we're not passing a NULL pointer in the selftest.
> > > > > > >
> > > > > > > The problem occurs once we spill to that slot _after_ the call, and
> > > > > > > then do a write through returned mem pointer.
> > > > > > >
> > > > > > > The final few lines from the selftest do the dirty thing, where r0 is
> > > > > > > aliasing fp-8, and r1 = 0.
> > > > > > >
> > > > > > > + *(u64 *)(r10 - 8) = r8; \
> > > > > > > + *(u64 *)(r0 + 0) = r1; \
> > > > > > > + r8 = *(u64 *)(r10 - 8); \
> > > > > > > + r0 = *(u64 *)(r8 + 0); \
> > > > > > >
> > > > > > > The write through r0 must re-mark the stack, but the verifier doesn't
> > > > > > > know it's pointing to the stack.
> > > > > > > push_stack was the conceptually cleaner/simpler fix, but it apparently
> > > > > > > isn't good enough.
> > > > > > >
> > > > > > > Remarking the stack on write to PTR_TO_MEM, or invalidating PTR_TO_MEM
> > > > > > > when r8 is spilled to fp-8 first time after the call are two options.
> > > > > > > Both have some concerns (first, the misaligned stack access is not
> > > > > > > caught, second PTR_TO_MEM may outlive stack frame).
> > > > > >
> > > > > > Reading the description of the problem my first instinct was to have
> > > > > > stack slots with associated obj_ref_id for such cases and when
> > > > > > something writes into that stack slot, invalidate everything with that
> > > > > > obj_ref_id. So that's probably what you mean by invalidating
> > > > > > PTR_TO_MEM, right?
> > > > > >
> > > > > > Not sure I understand what "PTR_TO_STACK with mem_size" (that Alexei
> > > > > > mentioned in another email) means, though, so hard to compare.
> > > > > >
> > > > >
> > > > > Invalidation is certainly one option. The one Alexei mentioned was
> > > > > where we discussed bounding how much data can be read through the
> > > > > PTR_TO_STACK (similar to PTR_TO_MEM), and mark r0 as PTR_TO_STACK.
> > > > > This ends up combining the constraints of both types of pointers (it
> > > > > may as well be called PTR_TO_STACK_OR_MEM) without forking paths.
> > > >
> > > > Yeah, PTR_TO_STACK_OR_MEM would be more precise. But how does that
> > > > help with this problem? Not sure I follow the idea of the solution
> > > > (but I can wait for patches to be posted).
> > >
> > > The reason for push_stack was to ensure writes through the returned
> > > pointer take effect on stack state.
> > > By keeping it PTR_TO_STACK, we get that behavior.
> > > However, in the other path of this patch, the verifier would verify
> > > the pointer as PTR_TO_MEM, with a bounded mem_size.
> > > Thus it would not allow writing beyond a certain size.
> > > If we simply set r0 to PTR_TO_STACK now, it would possibly allow going
> > > beyond the size that was passed to kfunc.
> > > E.g. say buffer was fp-24 and size was 8, we can also modify fp-8
> > > through it and not just fp-16.
> > > Adding an additional mem_size field and checking it (when set) for
> > > PTR_TO_STACK allows us to ensure we cannot write beyond 8 bytes
> > > through r0=fp-24.
> >
> > yeah, this part is clear, of course, but I was actually more worried
> > about handling the following situation:
> >
>
> For simplicity, let's just forget about stack_or_mem, and assume stack
> pointer with this extra mem_size.
> I haven't written the code yet but the idea would be to limit writing
> to [fp-N, fp-N+mem_size) through it.
> A screening check before we pass through to the normal stack access
> handling code.
>
> >
> > struct bpf_dynptr dptr;
> > void *slice;
> > union {
> >     char buf[32];
> >     void *map_value;
> > } blah;
> >
> > bpf_dynptr_from_skb(..., &dptr);
>
> At this point we mark dptr.
>
> >
> > slice = bpf_dynptr_slice_rdwr(&dptr, 0, &blah.buf, sizeof(blah.buf));
> > if (!slice) return 0; /* whatever */
> >
>
> slice is PTR_TO_STACK w/ mem_size = 32.
>
> We mark the bytes for buf as STACK_MISC at this point.
>
> > /* now slice points to blah.{buf,map_value}, right? */
> >
> > map_value = bpf_map_lookup_elem(&some_map, ...);
> > if (!map_value) return 0;
>
> The 8-byte slot for map_value in these 32-bytes becomes map_value.
> Rest remains misc.
>
> >
> > /* we shouldn't allow working with slice at this point */
> > *(u64 *)slice = 0xdeadbeef; /* overwrite map_value */
>
> And because slice is PTR_TO_STACK, we will notice the overwrite of map
> value with scalar.
>
> >
> > *(u64 *)map_value = 123; /* BOOM */
> >
>
> So this would fail as deref of scalar.
>
> Let me know if I missed something you were trying to point out.

Thanks for elaborating, I got the idea now. But I'm still unsure why
we need to invent this new mechanism instead of building upon the
existing one we already have. Note that when someone is overwriting
dynptr slots on the stack, we automatically invalidate all the
pointers derived from that dynptr. And also note that by itself it's
not even an invalid thing to use. Once code is done using dynptr, it's
ok to reuse that stack for something else.

So I was thinking the same thing here. That buffer on the stack passed
to bpf_dynptr_slice would be marked as DYNPTR_BUF or something like
that, and whenever some instruction overwrites any part of those stack
slots, we just invalidate those pointers.

This new PTR_TO_STACK aliasing for bpf_dynptr_slice() approach seems a
bit more error-prone, IMO (especially that returned pointer in reality
might not, and most often will not, point to the stack memory
actually; so it's a bit of mental exercise to keep this discrepancy in
one's head). Plus it might regress states convergence, because any
write through that pointer is now tracking some state on the stack,
even though we just read/write raw unknown bytes "somewhere in
memory".

Anyways, probably nothing broken here, but just a new concept that
feels unnecessary.