Re: [LSF/MM/BPF TOPIC] XDP metadata for TX

On Mon, Feb 27, 2023 at 6:17 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Stanislav Fomichev <sdf@xxxxxxxxxx> writes:
>
> > On Thu, Feb 23, 2023 at 3:22 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> >>
> >> Stanislav Fomichev <sdf@xxxxxxxxxx> writes:
> >>
> >> > I'd like to discuss a potential follow up for the previous "XDP RX
> >> > metadata" series [0].
> >> >
> >> > Now that we can access (a subset of) packet metadata at RX, I'd like to
> >> > explore the options where we can export some of that metadata on TX. And
> >> > also whether it might be possible to access some of the TX completion
> >> > metadata (things like TX timestamp).
> >> >
> >> > I'm currently trying to understand whether the same approach I've used
> >> > on RX could work at TX. By May I plan to have a bunch of options laid
> >> > out (currently considering XSK tx/compl programs and XDP tx/compl
> >> > programs) so we have something to discuss.
> >>
> >> I've been looking at ways of getting a TX-completion hook for the XDP
> >> queueing stuff as well. For that, I think it could work to just hook
> >> into xdp_return_frame(), but if you want to access hardware metadata
> >> it'll obviously have to be in the driver. A hook in the driver could
> >> certainly be used for the queueing return as well, though, which may
> >> help making it worth the trouble :)
> >
> > Yeah, I'd like to get to completion descriptors ideally; so nothing
> > better than a driver hook comes to mind so far :-(
> > (I'm eye-balling mlx5's mlx5e_free_xdpsq_desc AF_XDP path mostly so far).
>
> Is there any other use case for this than getting the TX timestamp? Not
> really sure what else those descriptors contain...

I don't think so; at least not from looking at mlx5 and bnxt (the
latter doesn't even have a timestamp in its completion ring).
So yeah, not sure; maybe this should live on the side and be AF_XDP
specific. It wouldn't even have to involve BPF, just put the TX
timestamp somewhere in the umem:
setsockopt(xsk_fd, SOL_XDP, XSK_STAMP_TX_COMPLETION,
&data_relative_offset, ..);
OTOH, even if a timestamp is all we have today, that doesn't mean
it's all we'll have for eternity? (Plus, this needs a driver "hook"
for AF_XDP anyway, so why not make it generic?)
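For the BPF-less variant, the interface could be as simple as telling the
kernel, per socket, at which offset inside each umem frame to write the
completion timestamp. A minimal userspace sketch (XSK_STAMP_TX_COMPLETION,
its option number, and the offset semantics are all hypothetical; nothing
like this exists today):

```c
#include <stdint.h>

/* Hypothetical socket option: ask the kernel to write the 64-bit TX
 * completion timestamp at frame_addr + rel_off inside the umem. */
#define XSK_STAMP_TX_COMPLETION 64	/* made-up option number */

/* Where the application would read the timestamp for a frame whose
 * address came back on the completion ring. */
uint64_t tx_stamp_addr(uint64_t frame_addr, uint32_t rel_off)
{
	return frame_addr + rel_off;
}

/* Usage (sketch):
 *   uint32_t off = 16;
 *   setsockopt(xsk_fd, SOL_XDP, XSK_STAMP_TX_COMPLETION, &off, sizeof(off));
 *   ...reap addr from the completion ring...
 *   uint64_t ts = *(uint64_t *)(umem_base + tx_stamp_addr(addr, off));
 */
```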

> >> > I'd like to some more input on whether applying the same idea on TX
> >> > makes sense or not and whether there are any sensible alternatives.
> >> > (IIRC, there was an attempt to do XDP on egress that went nowhere).
> >>
> >> I believe that stranded because it was deemed not feasible to cover the
> >> SKB TX path as well, which means it can't be symmetrical to the RX hook.
> >> So we ended up with the in-devmap hook instead. I'm not sure if that's
> >> made easier by multi-buf XDP, so that may be worth revisiting.
> >>
> >> For the TX metadata you don't really have to care about the skb path, I
> >> suppose, so that may not matter too much either. However, at least for
> >> the in-kernel xdp_frame the TX path is pushed from the stack anyway, so
> >> I'm not sure if it's worth having a separate hook in the driver (with
> >> all the added complexity and overhead that entails) just to set
> >> metadata? That could just as well be done on push from higher up the
> >> stack; per-driver kfuncs could still be useful for this, though.
> >>
> >> And of course something would be needed so that that BPF programs can
> >> process AF_XDP frames in the kernel before they hit the driver, but
> >> again I'm not sure that needs to be a hook in the driver.
> >
> > Care to elaborate more on "push from higher up the stack"?
>
> I'm referring to the XDP_REDIRECT path here: xdp_frames are transmitted
> by the stack calling ndo_xdp_xmit() in the driver with an array of
> frames that are immediately put on the wire (see bq_xmit_all() in
> devmap.c). So any metadata writing could be done at that point, since
> the target driver is already known; there's even already a program hook
> in there (used for in-devmap programs).
>
> > I've been thinking about mostly two cases:
> > - XDP_TX - I think this one technically doesn't need an extra hook;
> > all metadata manipulations can be done at xdp_rx? (however, not sure
> > how real that is, since the descriptors are probably not exposed over
> > there?)
>
> Well, to me XDP_REDIRECT is the most interesting one (see above). I
> think we could even drop the XDP_TX case and only do this for
> XDP_REDIRECT, since XDP_TX is basically a special-case optimisation.
> I.e., it's possible to XDP_REDIRECT back to the same device, the frames
> will just take a slight detour up through the stack; but that could also
> be a good thing if it means we'll have to do less surgery to the drivers
> to implement this for two paths.
>
> It does have the same challenge as you outlined above, though: At that
> point the TX descriptor probably doesn't exist, so the driver NDO will
> have to do something else with the data; but maybe we can solve that
> without moving the hook into the driver itself somehow?

Ah, ok, yeah, I was lumping XDP_TX and XDP_REDIRECT under the same
"transmit something out of the xdp_rx hook" umbrella. Maybe we can come
up with an skb-like private metadata layout here as well (as we've
previously discussed for skbs)? But I'm not sure it would solve all the
problems. I'm thinking of an AF_XDP case that wants to program something
like a TSO/encap/tunneling offload (assuming AF_XDP gets 4k+ frame
support) or a checksum offload. Exposing access to the driver TX hooks
seems like the easiest way to get there?
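To make the offload argument concrete: without such a hook, an AF_XDP
sender has to compute checksums in software for every frame it queues.
A sketch of that per-packet cost (plain RFC 1071 folding; this is the
work a driver-level TX hook could hand to the NIC by setting the right
descriptor bits instead):

```c
#include <stdint.h>
#include <stddef.h>

/* RFC 1071 internet checksum over a buffer: the per-packet software
 * work that a checksum offload would move into the TX descriptor. */
uint16_t inet_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {		/* sum 16-bit big-endian words */
		sum += (uint32_t)(buf[0] << 8 | buf[1]);
		buf += 2;
		len -= 2;
	}
	if (len)			/* odd trailing byte */
		sum += (uint32_t)(buf[0] << 8);
	while (sum >> 16)		/* fold carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```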

> > - AF_XDP TX - this one needs something deep in the driver (due to tx
> > zc) to populate the descriptors?
>
> Yeah, this one is a bit more challenging, but having a way to process
> AF_XDP frames in the kernel before they're sent out would be good in any
> case (for things like policing what packets an AF_XDP application can
> send in a cloud deployment, for instance). Would be best if we could
> consolidate the XDP_REDIRECT and AF_XDP paths, I suppose...
>
> > - anything else?
>
> Well, see above ;)
>
> >> In any case, the above is just my immediate brain dump (I've been
> >> mulling these things over for a while in relation to the queueing
> >> stuff), and I'd certainly welcome more discussion on the subject! :)
> >
> > Awesome, thanks for the dump! Will try to keep you in the loop!
>
> Great, thanks!
>
> -Toke



