Re: [LSF/MM/BPF TOPIC] XDP metadata for TX

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 27 Feb 2023 at 21:16, Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
>
> On Mon, Feb 27, 2023 at 6:17 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> >
> > Stanislav Fomichev <sdf@xxxxxxxxxx> writes:
> >
> > > On Thu, Feb 23, 2023 at 3:22 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> > >>
> > >> Stanislav Fomichev <sdf@xxxxxxxxxx> writes:
> > >>
> > >> > I'd like to discuss a potential follow up for the previous "XDP RX
> > >> > metadata" series [0].
> > >> >
> > >> > Now that we can access (a subset of) packet metadata at RX, I'd like to
> > >> > explore the options where we can export some of that metadata on TX. And
> > >> > also whether it might be possible to access some of the TX completion
> > >> > metadata (things like TX timestamp).
> > >> >
> > >> > I'm currently trying to understand whether the same approach I've used
> > >> > on RX could work at TX. By May I plan to have a bunch of options laid
> > >> > out (currently considering XSK tx/compl programs and XDP tx/compl
> > >> > programs) so we have something to discuss.
> > >>
> > >> I've been looking at ways of getting a TX-completion hook for the XDP
> > >> queueing stuff as well. For that, I think it could work to just hook
> > >> into xdp_return_frame(), but if you want to access hardware metadata
> > >> it'll obviously have to be in the driver. A hook in the driver could
> > >> certainly be used for the queueing return as well, though, which may
> > >> help making it worth the trouble :)
> > >
> > > Yeah, I'd like to get to completion descriptors ideally; so nothing
> > > better than a driver hook comes to mind so far :-(
> > > (I'm eye-balling mlx5's mlx5e_free_xdpsq_desc AF_XDP path mostly so far).
> >
> > Is there any other use case for this than getting the TX timestamp? Not
> > really sure what else those descriptors contain...
>
> I don't think so; at least looking at mlx5 and bnxt (the latter
> doesn't have a timestamp in the completion ring).
> So yeah, not sure, maybe that should be on the side and be AF_XDP specific.
> And not even involve bpf, just put the tx tstamp somewhere in umem:
> setsockopt(xsk_fd, SOL_XDP, XSK_STAMP_TX_COMPLETION,
> &data_relative_offset, ..);
> OTOH, if it is only a timestamp now, it doesn't mean that's all we'd
> have for eternity? (plus, this needs a driver "hook" for af_xdp
> anyway, so why not make it generic?)
>
> > >> > I'd like to some more input on whether applying the same idea on TX
> > >> > makes sense or not and whether there are any sensible alternatives.
> > >> > (IIRC, there was an attempt to do XDP on egress that went nowhere).
> > >>
> > >> I believe that stranded because it was deemed not feasible to cover the
> > >> SKB TX path as well, which means it can't be symmetrical to the RX hook.
> > >> So we ended up with the in-devmap hook instead. I'm not sure if that's
> > >> made easier by multi-buf XDP, so that may be worth revisiting.
> > >>
> > >> For the TX metadata you don't really have to care about the skb path, I
> > >> suppose, so that may not matter too much either. However, at least for
> > >> the in-kernel xdp_frame the TX path is pushed from the stack anyway, so
> > >> I'm not sure if it's worth having a separate hook in the driver (with
> > >> all the added complexity and overhead that entails) just to set
> > >> metadata? That could just as well be done on push from higher up the
> > >> stack; per-driver kfuncs could still be useful for this, though.
> > >>
> > >> And of course something would be needed so that that BPF programs can
> > >> process AF_XDP frames in the kernel before they hit the driver, but
> > >> again I'm not sure that needs to be a hook in the driver.
> > >
> > > Care to elaborate more on "push from higher up the stack"?
> >
> > I'm referring to the XDP_REDIRECT path here: xdp_frames are transmitted
> > by the stack calling ndo_xdp_xmit() in the driver with an array of
> > frames that are immediately put on the wire (see bq_xmit_all() in
> > devmap.c). So any metadata writing could be done at that point, since
> > the target driver is already known; there's even already a program hook
> > in there (used for in-devmap programs).
> >
> > > I've been thinking about mostly two cases:
> > > - XDP_TX - I think this one technically doesn't need an extra hook;
> > > all metadata manipulations can be done at xdp_rx? (however, not sure
> > > how real that is, since the descriptors are probably not exposed over
> > > there?)
> >
> > Well, to me XDP_REDIRECT is the most interesting one (see above). I
> > think we could even drop the XDP_TX case and only do this for
> > XDP_REDIRECT, since XDP_TX is basically a special-case optimisation.
> > I.e., it's possible to XDP_REDIRECT back to the same device, the frames
> > will just take a slight detour up through the stack; but that could also
> > be a good thing if it means we'll have to do less surgery to the drivers
> > to implement this for two paths.
> >
> > It does have the same challenge as you outlined above, though: At that
> > point the TX descriptor probably doesn't exist, so the driver NDO will
> > have to do something else with the data; but maybe we can solve that
> > without moving the hook into the driver itself somehow?
>
> Ah, ok, yeah, I was putting XDP_TX / XDP_REDIRECT under the same
> "transmit something out of xdp_rx hook" umbrella. We can maybe come up
> with a skb-like-private metadata layout (as we've discussed previously
> for skb) here as well? But not sure it would solve all the problems?
> I'm thinking of an af_xdp case where it wants to program something
> similar to tso/encap/tunneling offload (assuming af_xdp will get 4k+
> support)

We have a patch set of this in the works. Need to finish the last
couple of tests then optimize performance and it is good to go. We
should be able to post it during the next cycle that starts next week.

Having tso/encap/tunneling and checksum offload accessible in the
AF_XDP Tx path would indeed be very useful.

> or a checksum offload. Exposing access to the driver tx hooks
> seems like the easiest way to get there?
>
> > > - AF_XDP TX - this one needs something deep in the driver (due to tx
> > > zc) to populate the descriptors?
> >
> > Yeah, this one is a bit more challenging, but having a way to process
> > AF_XDP frames in the kernel before they're sent out would be good in any
> > case (for things like policing what packets an AF_XDP application can
> > send in a cloud deployment, for instance). Would be best if we could
> > consolidate the XDP_REDIRECT and AF_XDP paths, I suppose...
> >
> > > - anything else?
> >
> > Well, see above ;)
> >
> > >> In any case, the above is just my immediate brain dump (I've been
> > >> mulling these things over for a while in relation to the queueing
> > >> stuff), and I'd certainly welcome more discussion on the subject! :)
> > >
> > > Awesome, thanks for the dump! Will try to keep you in the loop!
> >
> > Great, thanks!
> >
> > -Toke




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux