Re: [xdp-hints] Re: [RFC bpf-next v2 06/14] xdp: Carry over xdp metadata into skb context

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Nov 9, 2022 at 10:22 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 11/9/22 3:10 AM, Toke Høiland-Jørgensen wrote:
> > Snipping a bit of context to reply to this bit:
> >
> >>>>> Can the xdp prog still change the metadata through xdp->data_meta? tbh, I am not
> >>>>> sure it is solid enough by asking the xdp prog not to use the same random number
> >>>>> in its own metadata + not to change the metadata through xdp->data_meta after
> >>>>> calling bpf_xdp_metadata_export_to_skb().
> >>>>
> >>>> What do you think the usecase here might be? Or are you suggesting we
> >>>> reject further access to data_meta after
> >>>> bpf_xdp_metadata_export_to_skb somehow?
> >>>>
> >>>> If we want to let the programs override some of this
> >>>> bpf_xdp_metadata_export_to_skb() metadata, it feels like we can add
> >>>> more kfuncs instead of exposing the layout?
> >>>>
> >>>> bpf_xdp_metadata_export_to_skb(ctx);
> >>>> bpf_xdp_metadata_export_skb_hash(ctx, 1234);
> >
> > There are several use cases for needing to access the metadata after
> > calling bpf_xdp_metdata_export_to_skb():
> >
> > - Accessing the metadata after redirect (in a cpumap or devmap program,
> >    or on a veth device)
> > - Transferring the packet+metadata to AF_XDP
> fwiw, the xdp prog could also be more selective and only stores one of the hints
> instead of the whole 'struct xdp_to_skb_metadata'.
>
> > - Returning XDP_PASS, but accessing some of the metadata first (whether
> >    to read or change it)
> >
> > The last one could be solved by calling additional kfuncs, but that
> > would be less efficient than just directly editing the struct which
> > will be cache-hot after the helper returns.
>
> Yeah, it is more efficient to directly write if possible.  I think this set
> allows the direct reading and writing already through data_meta (as a _u8 *).
>
> >
> > And yeah, this will allow the XDP program to inject arbitrary metadata
> > into the netstack; but it can already inject arbitrary *packet* data
> > into the stack, so not sure if this is much of an additional risk? If it
> > does lead to trivial crashes, we should probably harden the stack
> > against that?
> >
> > As for the random number, Jesper and I discussed replacing this with the
> > same BTF-ID scheme that he was using in his patch series. I.e., instead
> > of just putting in a random number, we insert the BTF ID of the metadata
> > struct at the end of it. This will allow us to support multiple
> > different formats in the future (not just changing the layout, but
> > having multiple simultaneous formats in the same kernel image), in case
> > we run out of space.
>
> This seems a bit hypothetical.  How much headroom does it usually have for the
> xdp prog?  Potentially the hints can use all the remaining space left after the
> header encap and the current bpf_xdp_adjust_meta() usage?
>
> >
> > We should probably also have a flag set on the xdp_frame so the stack
> > knows that the metadata area contains relevant-to-skb data, to guard
> > against an XDP program accidentally hitting the "magic number" (BTF_ID)
> > in unrelated stuff it puts into the metadata area.
>
> Yeah, I think having a flag is useful.  The flag will be set at xdp_buff and
> then transfer to the xdp_frame?
>
> >
> >> After re-reading patch 6, have another question. The 'void
> >> bpf_xdp_metadata_export_to_skb();' function signature. Should it at
> >> least return ok/err? or even return a 'struct xdp_to_skb_metadata *'
> >> pointer and the xdp prog can directly read (or even write) it?
> >
> > Hmm, I'm not sure returning a failure makes sense? Failure to read one
> > or more fields just means that those fields will not be populated? We
> > should probably have a flags field inside the metadata struct itself to
> > indicate which fields are set or not, but I'm not sure returning an
> > error value adds anything? Returning a pointer to the metadata field
> > might be convenient for users (it would just be an alias to the
> > data_meta pointer, but the verifier could know its size, so the program
> > doesn't have to bounds check it).
>
> If some hints are not available, those hints should be initialized to
> 0/CHECKSUM_NONE/...etc.  The xdp prog needs a direct way to tell hard failure
> when it cannot write the meta area because of not enough space.  Comparing
> xdp->data_meta with xdp->data as a side effect is not intuitive.
>
> It is more than saving the bound check.  With type info of 'struct
> xdp_to_skb_metadata *', the verifier can do more checks like reading in the
> middle of an integer member.  The verifier could also limit write access only to
> a few struct's members if it is needed.
>
> The returning 'struct xdp_to_skb_metadata *' should not be an alias to the
> xdp->data_meta.  They should actually point to different locations in the
> headroom.  bpf_xdp_metadata_export_to_skb() sets a flag in xdp_buff.
> xdp->data_meta won't be changed and keeps pointing to the last
> bpf_xdp_adjust_meta() location.  The kernel will know if there is
> xdp_to_skb_metadata before the xdp->data_meta when that bit is set in the
> xdp_{buff,frame}.  Would it work?
>
> >
> >> A related question, why 'struct xdp_to_skb_metadata' needs
> >> __randomize_layout?
> >
> > The __randomize_layout thing is there to force BPF programs to use CO-RE
> > to access the field. This is to avoid the struct layout accidentally
> > ossifying because people in practice rely on a particular layout, even
> > though we tell them to use CO-RE. There are lots of examples of this
> > happening in other domains (IP header options, TCP options, etc), and
> > __randomize_layout seemed like a neat trick to enforce CO-RE usage :)
>
> I am not sure if it is necessary or helpful to only enforce __randomize_layout
> in 'struct xdp_to_skb_metadata'.  There are other CO-RE use cases (tracing and
> non tracing) that already have direct access (reading and/or writing) to other
> kernel structures.
>
> It is more important for the verifier to see the xdp prog accessing it as a
> 'struct xdp_to_skb_metadata *' instead of xdp->data_meta which is a __u8 * so
> that the verifier can enforce the rules of access.
>
> >
> >>>>> Does xdp_to_skb_metadata have a use case for XDP_PASS (like patch 7) or the
> >>>>> xdp_to_skb_metadata can be limited to XDP_REDIRECT only?
> >>>>
> >>>> XDP_PASS cases where we convert xdp_buff into skb in the drivers right
> >>>> now usually have C code to manually pull out the metadata (out of hw
> >>>> desc) and put it into skb.
> >>>>
> >>>> So, currently, if we're calling bpf_xdp_metadata_export_to_skb() for
> >>>> XDP_PASS, we're doing a double amount of work:
> >>>> skb_metadata_import_from_xdp first, then custom driver code second.
> >>>>
> >>>> In theory, maybe we should completely skip drivers custom parsing when
> >>>> there is a prog with BPF_F_XDP_HAS_METADATA?
> >>>> Then both xdp->skb paths (XDP_PASS+XDP_REDIRECT) will be bpf-driven
> >>>> and won't require any mental work (plus, the drivers won't have to
> >>>> care either in the future).
> >>>>   > WDYT?
> >>>
> >>>
> >>> Yeah, not sure if it can solely depend on BPF_F_XDP_HAS_METADATA but it makes
> >>> sense to only use the hints (if ever written) from xdp prog especially if it
> >>> will eventually support xdp prog changing some of the hints in the future.  For
> >>> now, I think either way is fine since they are the same and the xdp prog is sort
> >>> of doing extra unnecessary work anyway by calling
> >>> bpf_xdp_metadata_export_to_skb() with XDP_PASS and knowing nothing can be
> >>> changed now.
> >
> > I agree it would be best if the drivers also use the XDP metadata (if
> > present) on XDP_PASS. Longer term my hope is we can make the XDP
> > metadata support the only thing drivers need to implement (i.e., have
> > the stack call into that code even when no XDP program is loaded), but
> > for now just for consistency (and allowing the XDP program to update the
> > metadata), we should probably at least consume it on XDP_PASS.
> >
> > -Toke
> >

Not to derail the discussion (left the last message intact on top,
feel free to continue), but to summarize. The proposed changes seem to
be:

1. bpf_xdp_metadata_export_to_skb() should return pointer to "struct
xdp_to_skb_metadata"
  - This should let bpf programs change the metadata passed to the skb

2. "struct xdp_to_skb_metadata" should have its btf_id as the first
__u32 member (and remove the magic)
  - This is for the redirect case where the end users, including
AF_XDP, can parse this metadata from btf_id
  - This, however, is not all the metadata that the device can
support, but a much narrower set that the kernel is expected to use
for skb construction

3. __randomize_layout isn't really helping, CO-RE will trigger
regardless; maybe only the case where it matters is probably AF_XDP,
so still useful?

4. The presence of the metadata generated by
bpf_xdp_metadata_export_to_skb should be indicated by a flag in
xdp_{buff,frame}->flags
  - Assuming exposing it via xdp_md->has_skb_metadata is ok?
  - Since the programs probably need to do the following:

  if (xdp_md->has_skb_metadata) {
    access/change skb metadata by doing struct xdp_to_skb_metadata *p
= data_meta;
  } else {
    use kfuncs
  }

5. Support the case where we keep program's metadata and kernel's
xdp_to_skb_metadata
  - skb_metadata_import_from_xdp() will "consume" it by mem-moving the
rest of the metadata over it and adjusting the headroom


I think the above solves all the cases Toke points to?

a) Accessing the metadata after redirect (in a cpumap or devmap
program, or on a veth device)
  - only a small xdp_to_skb_metadata subset will work out of the box
iff the redirecttor calls bpf_xdp_metadata_export_to_skb; for the rest
the progs will have to agree on the layout, right?

b) Transferring the packet+metadata to AF_XDP
  - here, again, the AF_XDP consumer will have to either expect
xdp_to_skb_metadata with a smaller set of skb-related metadata, or
will have to make sure the producer builds a custom layout using
kfuncs; there is also no flag to indicate whether xdp_to_skb_metadata
is there or not; the consumer will have to test btf_id at the right
offset

c) Returning XDP_PASS, but accessing some of the metadata first
(whether to read or change it)
  - can read via kfuncs, can change via
bpf_xdp_metadata_export_to_skb(); m->xyz=abc;

Anything I'm missing?




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux