On Tue, Mar 7, 2023 at 11:32 AM Jesper Dangaard Brouer <jbrouer@xxxxxxxxxx> wrote: > > > On 03/03/2023 08.42, Magnus Karlsson wrote: > > On Mon, 27 Feb 2023 at 21:16, Stanislav Fomichev <sdf@xxxxxxxxxx> wrote: > >> On Mon, Feb 27, 2023 at 6:17 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: > >>> Stanislav Fomichev <sdf@xxxxxxxxxx> writes: > >>>> On Thu, Feb 23, 2023 at 3:22 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: > >>>>> > >>>>> Stanislav Fomichev <sdf@xxxxxxxxxx> writes: > >>>>> > >>>>>> I'd like to discuss a potential follow up for the previous "XDP RX > >>>>>> metadata" series [0]. > >>>>>> > >>>>>> Now that we can access (a subset of) packet metadata at RX, I'd like to > >>>>>> explore the options where we can export some of that metadata on TX. And > >>>>>> also whether it might be possible to access some of the TX completion > >>>>>> metadata (things like TX timestamp). > >>>>>> > > IMHO it makes sense to see TX metadata as two separate operations. > > (1) Metadata written into the TX descriptor. > (2) Metadata read when processing TX completion. > > These operations happen at two different points in time. Thus likely > need different BPF hooks. Having BPF-progs running at each of these > points in time, will allow us to e.g. implement BQL (which is relevant > to XDP queuing effort). I guess for (2) the question here is: is it worth having a separate hook? Or will a simple traceepoint as Toke suggested be enough? For BQL purposes, we can still attach a prog to that tracepoint, right? > >>>>>> I'm currently trying to understand whether the same approach I've used > >>>>>> on RX could work at TX. By May I plan to have a bunch of options laid > >>>>>> out (currently considering XSK tx/compl programs and XDP tx/compl > >>>>>> programs) so we have something to discuss. > >>>>> > >>>>> I've been looking at ways of getting a TX-completion hook for the XDP > >>>>> queueing stuff as well. For that, I think it could work to just hook > >>>>> into xdp_return_frame(), but if you want to access hardware metadata > >>>>> it'll obviously have to be in the driver. A hook in the driver could > >>>>> certainly be used for the queueing return as well, though, which may > >>>>> help making it worth the trouble :) > >>>> > >>>> Yeah, I'd like to get to completion descriptors ideally; so nothing > >>>> better than a driver hook comes to mind so far :-( > > As Toke mentions, I'm also hoping we could leverage or extend the > xdp_return_frame() call. Or implicitly add the "hook" at the existing > xdp_return_frame() call. This is about operation (2) *reading* some > metadata at TX completion time. Ack, noted, thx. Although, at least for mlx5e_free_xdpsq_desc, I don't see it being called for the af_xdp tx path. But maybe that's something we can amend in a couple of places (so xdp_return_frame would handle most xdp cases, and some new tbd func for af_xdp tx)? > Can this be mapped to the RX-kfuncs approach(?), by driver extending > (call/structs) with pointer to TX-desc + adaptor info and BPF-prog/hook > doing TX-kfuncs calls into driver (that knows how to extract completion > data). Yeah, that seems like a natural thing to do here. > > [...] > >>> Well, to me XDP_REDIRECT is the most interesting one (see above). I > >>> think we could even drop the XDP_TX case and only do this for > >>> XDP_REDIRECT, since XDP_TX is basically a special-case optimisation. > >>> I.e., it's possible to XDP_REDIRECT back to the same device, the frames > >>> will just take a slight detour up through the stack; but that could also > >>> be a good thing if it means we'll have to do less surgery to the drivers > >>> to implement this for two paths. > >>> > >>> It does have the same challenge as you outlined above, though: At that > >>> point the TX descriptor probably doesn't exist, so the driver NDO will > >>> have to do something else with the data; but maybe we can solve that > >>> without moving the hook into the driver itself somehow? > >> > >> Ah, ok, yeah, I was putting XDP_TX / XDP_REDIRECT under the same > >> "transmit something out of xdp_rx hook" umbrella. We can maybe come up > >> with a skb-like-private metadata layout (as we've discussed previously > >> for skb) here as well? But not sure it would solve all the problems? > > This is operation (1) writing metadata into the TX descriptor. > In this case we have a metadata mapping problem, from RX on one device > driver to TX on another device driver. As you say, we also need to map > this SKBs, which have a fairly static set of metadata. > > For the most common metadata offloads (like TX-checksum, TX-vlan) I > think it makes sense to store those in xdp_frame area (use for SKB > mapping) and re-use these when at TX writing into the TX descriptor. [..] > BUT there are also metadata TX offloads offloads, like asking for a > specific Launch-Time for at packet, that needs a more flexible approach. Why can't these go into the same "common" xdp_frame area? > >> I'm thinking of an af_xdp case where it wants to program something > >> similar to tso/encap/tunneling offload (assuming af_xdp will get 4k+ > >> support) or a checksum offload. Exposing access to the driver tx hooks > >> seems like the easiest way to get there? > >> > >>>> - AF_XDP TX - this one needs something deep in the driver (due to tx > >>>> zc) to populate the descriptors? > >>> > >>> Yeah, this one is a bit more challenging, but having a way to process > >>> AF_XDP frames in the kernel before they're sent out would be good in any > >>> case (for things like policing what packets an AF_XDP application can > >>> send in a cloud deployment, for instance). Would be best if we could > >>> consolidate the XDP_REDIRECT and AF_XDP paths, I suppose... > >>> > > I agree, it would be best if we can consolidate the XDP_REDIRECT and > AF_XDP paths, else we have to re-implement the same for AF_XDP xmit path > (and maintain both paths). I also agree that being able to police what > packets an AF_XDP application can send is a useful feature (e.g. cloud > deployments). > > Looking forward to collaborate on this work! > --Jesper Thank you for the comments! So it looks like two things we potentially need to do/agree upon: 1. User-facing API. One hook + tracepoint vs two hooks (and at what level: af_xdp vs xdp). I'll try to focus on that first (waiting for af_xdp patches that Magnus mentioned). 2. Potentially internal refactoring to consolidate XDP_REDIRECT+AF_XDP (seems like something we should be able to discuss as we go; aka implementation details)