Re: [PATCH bpf-next v4 2/3] bpf: Add xdp dynptrs

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Fri, 26 Aug 2022 20:31:48 +0200

On Fri, 26 Aug 2022 at 20:23, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>
> On Thu, Aug 25, 2022 at 4:19 PM Kumar Kartikeya Dwivedi
> <memxor@xxxxxxxxx> wrote:
> >
> > On Thu, 25 Aug 2022 at 22:39, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> > >
> > > On Wed, Aug 24, 2022 at 2:11 PM Kumar Kartikeya Dwivedi
> > > <memxor@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, 24 Aug 2022 at 00:27, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Aug 22, 2022 at 7:31 PM Kumar Kartikeya Dwivedi
> > > > > <memxor@xxxxxxxxx> wrote:
> > > > > > > [...]
> > > > > > >                 if (func_id == BPF_FUNC_dynptr_data &&
> > > > > > > -                   dynptr_type == BPF_DYNPTR_TYPE_SKB) {
> > > > > > > +                   (dynptr_type == BPF_DYNPTR_TYPE_SKB ||
> > > > > > > +                    dynptr_type == BPF_DYNPTR_TYPE_XDP)) {
> > > > > > >                         regs[BPF_REG_0].type = PTR_TO_PACKET | ret_flag;
> > > > > > >                         regs[BPF_REG_0].range = meta.mem_size;
> > > > > >
> > > > > > It doesn't seem like this is safe. Since PTR_TO_PACKET's range can be
> > > > > > modified by comparisons with packet pointers loaded from the xdp/skb
> > > > > > ctx, how do we distinguish e.g. between a pkt slice obtained from some
> > > > > > frag in a multi-buff XDP vs pkt pointer from a linear area?
> > > > > >
> > > > > > Someone can compare data_meta from ctx with PTR_TO_PACKET from
> > > > > > bpf_dynptr_data on xdp dynptr (which might be pointing to a xdp mb
> > > > > > frag). While MAX_PACKET_OFF is 0xffff, it can still be used to do OOB
> > > > > > access for the linear area. reg_is_init_pkt_pointer will return true
> > > > > > as modified range is not considered for it. Same kind of issues when
> > > > > > doing comparison with data_end from ctx (though maybe you won't be
> > > > > > able to do incorrect data access at runtime using that).
> > > > > >
> > > > > > I had a pkt_uid field in my patch [0] which disallowed comparisons
> > > > > > among bpf_packet_pointer slices. Each call assigned a fresh pkt_uid,
> > > > > > and that disabled comparisons for them. reg->id is used for var_off
> > > > > > range propagation so it cannot be reused.
> > > > > >
> > > > > > Coming back to this: What we really want here is a PTR_TO_MEM with a
> > > > > > mem_size, so maybe you should go that route instead of PTR_TO_PACKET
> > > > > > (and add a type tag to maybe pretty print it also as a packet pointer
> > > > > > in verifier log), or add some way to distinguish slice vs non-slice
> > > > > > pkt pointers like I did in my patch. You might also want to add some
> > > > > > tests for this corner case (there are some later in [0] if you want to
> > > > > > reuse them).
> > > > > >
> > > > > > So TBH, I kinda dislike my own solution in [0] :). The complexity does
> > > > > > not seem worth it. The pkt_uid distinction is more useful (and
> > > > > > actually would be needed) in Toke's xdp queueing series, where in a
> > > > > > dequeue program you have multiple xdp_mds and want scoped slice
> > > > > > invalidations (i.e. adjust_head on one xdp_md doesn't invalidate
> > > > > > slices of some other xdp_md). Here we can just get away with normal
> > > > > > PTR_TO_MEM.
> > > > > >
> > > > > > ... Or just let me know if you handle this correctly already, or if
> > > > > > this won't be an actual problem :).
> > > > >
> > > > > Ooh interesting, I hadn't previously taken a look at
> > > > > try_match_pkt_pointers(), thanks for mentioning it :)
> > > > >
> > > > > The cleanest solution to me is to add the flag "DYNPTR_TYPE_{SKB/XDP}"
> > > > > to PTR_TO_PACKET and change reg_is_init_pkt_pointer() to return false
> > > > > if the DYNPTR_TYPE_{SKB/XDP} flag is present. I prefer this over
> > > > > returning PTR_TO_MEM because it seems more robust (eg if in the future
> > > > > we reject x behavior on the packet data reg types, this will
> > > > > automatically apply to the data slices), and because it'll keep the
> > > > > logic more efficient/simpler for the case when the pkt pointer has to
> > > > > be cleared after any helper that changes pkt data is called (aka the
> > > > > case where the data slice gets invalidated).
> > > > >
> > > > > What are your thoughts?
> > > > >
> > > >
> > > > Thinking more deeply about it, probably not, we need more work here. I
> > > > remember _now_ why I chose the pkt_uid approach (and this tells us my
> > > > commit log lacks all the details about the motivation :( ).
> > > >
> > > > Consider how equivalency checking for packet pointers works in
> > > > regsafe. It is checking type, then if old range > cur range, then
> > > > offs, etc.
> > > >
> > > > The problem is, while we now don't prune on access to ptr_to_pkt vs
> > > > ptr_to_pkt | dynptr_pkt types in same reg (since type differs we
> > > > return false), we still prune if old range of ptr_to_pkt | dynptr_pkt
> > > > > cur range of ptr_to_pkt | dynptr_pkt. Both could be pointing into
> > > > separate frags, so this assumption would be incorrect. I would be able
> > > > to trick the verifier into accessing data beyond the length of a
> > > > different frag, by first making sure one line of exploration is
> > > > verified, and then changing the register in another branch reaching
> > > > the same branch target. Helpers can take packet pointers so the access
> > > > can become a pruning point. It would think the rest of the stuff is
> > > > safe, while they are not equivalent at all. It is ok if they are bit
> > > > by bit equivalent (same type, range, off, etc.).
> > >
> > > Thanks for the explanation. To clarify, if old range of ptr_to_pkt >
> > > cur range of ptr_to_pkt, what gets pruned? Is it access to cur range
> > > of ptr_to_pkt since if old range > cur range, then if old range is
> > > acceptable cur range must definitely be acceptable?
> >
> > No, my description was bad :).
> > We return false when old_range > cur_range, i.e. the path is
> > considered safe and not explored again when old_range <= cur_range
> > (pruning), otherwise we continue verifying.
> > Consider if it was doing pkt[cur_range + 1] access in the path we are
> > about to explore again (already verified for old_range). That is <=
> > old_range, but > cur_range, so it would be problematic if we pruned
> > search for old_range > cur_range.
>
> Does "old_range" here refer to the range that was already previously
> verified as safe by the verifier? And "cur_range" is the new range
> that we are trying to figure out is safe or not?

Yes, it's the range of the same reg in the already safe verified state
from the point we are about to explore again (and considering
'pruning' the search if the current state would satisfy the safety
properties as well).

>
> When you say "we return false when old_range > cur_range", what
> function are we returning false from?
>

We return false from the regsafe function, i.e. it isn't safe if the
old range was greater than the current range, otherwise we match other
stuff before we return true (like off being equal, and var_off of old
being within cur's).

> >
> > So maybe it won't be a problem here, and just the current range checks
> > for pkt pointer slices is fine even if they belong to different frags.
> > I didn't craft any test case when writing my previous reply.
> > Especially since you will disable comparisons, one cannot relearn
> > range again using var_off + comparison, which closes another loophole.
> >
> > It just seems simpler to me to be a bit more conservative, since it is
> > only an optimization. There might be some corner case lurking we can't
> > think of right now. But I leave the judgement up to you if you can
> > reason about it. In either case it would be good to include some
> > comments in the commit log about all this.
> >
> > Meanwhile, looking at the current code, I'm more inclined to suggest
> > PTR_TO_MEM (and handle invalidation specially), but again, I will
> > leave it up to you to decide.
> >
> > When we do += var_off to a pkt reg, its range is reset to zero,
> > compared to PTR_TO_MEM, where off + var_off (smin/umax) is used to
> > check against the actual size for an access, which is a bit more
> > flexible. The reason to reset range is that it will be relearned using
> > comparisons and transferred to copies (reg->id is assigned for each +=
>
> Can you direct me to the function where this relearning happens? thanks!
>

See the code around the comment:
/* something was added to pkt_ptr, set range to zero */
where it memsets the range to 0,
then in adjust_ptr_min_max_vals,
and then you see how find_good_pkt_pointers also takes into account
reg->umax_value + off before setting reg->range after your next
comparison (because at runtime the variable offset is already added to
the pointer).

> > var_off), which doesn't apply to slice pointers (essentially the only
> > reason to keep them is being able to pick them for invalidation), we
> > try to disable the rest of the pkt pointer magic in the verifier,
> > anyway.
> >
> > pkt_reg->umax_value influences the prog->aux->max_pkt_offset (and iiuc
> > it can reach that point with range == 0 after += var_off, and
> > zero_size_allowed == true), only seems to be used by netronome's ebpf
> > offload for now, but still a bit confusing if slice pkt pointers cause
> > this to change.
>
> My major takeaway from this discussion is that there's a lot of extra
> subtleties when reg is PTR_TO_PACKET :) I'm going to delve deeper into
> the source code, but from a glance, I think you're right that just
> assigning PTR_TO_MEM for the data slice will probably make things a
> lot more straightforward. thanks for the discussion!

Exactly. I think this is an internal verifier detail so we can
definitely go back and change this later, but IMO we'd avoid a lot of
headache if we just choose PTR_TO_MEM, since PTR_TO_PACKET has a lot
of special semantics.

>
> >
> > >
> > > >
> > > > If you start returning false whenever you see this type tag set, it
> > > > will become too conservative (it considers reg copies of the same
> > > > dynptr_data lookup as distinct). So you need some kind of id assigned
> > > > during dynptr_data lookup to distinguish them.
> > >
> > > What about if the dynptr_pkt type tag is set, then we compare the
> > > ranges as well? If the ranges are the same, then we return true, else
> > > false. Does that work?
> >
> > Seems like it, and true part is already covered by memcmp at the start
> > of the function, I think.