On Mon, Jan 16, 2023 at 8:21 AM Jesper Dangaard Brouer <jbrouer@xxxxxxxxxx> wrote:
>
> On 12/01/2023 01.32, Stanislav Fomichev wrote:
> > The goal is to enable end-to-end testing of the metadata for AF_XDP.
>
> For me the goal with veth goes beyond *testing*.
>
> This patch ignores the xdp_frame case. I'm not blocking this patch, but
> I'm saying we need to make sure there is a way forward for accessing
> XDP-hints when handling redirected xdp_frame's.

Sure, let's work towards getting that other part addressed!

> I have two use-cases we should cover (as future work).
>
> (#1) We have customers that want to redirect from physical NIC hardware
> into containers, and then have the veth XDP-prog (selectively) redirect
> into an AF_XDP socket (when matching fastpath packets). Here they
> (minimum) want access to the XDP hint info on HW checksum.
>
> (#2) Both veth and cpumap can create SKBs based on xdp_frame's. Here it
> is essential to get the HW checksum and HW hash when creating these SKBs
> (else the netstack has to do an expensive csum calc and parsing in the
> flow-dissector).

From my PoV, I'd probably have to look into the TX side first (tx
timestamp) before looking into the xdp->skb path. So if somebody on
your side has cycles, feel free to drive this effort. I'm happy to
provide reviews/comments/etc.

I think we've discussed in the past that this will most likely look
like another set of "export" kfuncs? We can start with extending the
new Documentation/networking/xdp-rx-metadata.rst with a high-level
design.
> > Cc: John Fastabend <john.fastabend@xxxxxxxxx>
> > Cc: David Ahern <dsahern@xxxxxxxxx>
> > Cc: Martin KaFai Lau <martin.lau@xxxxxxxxx>
> > Cc: Jakub Kicinski <kuba@xxxxxxxxxx>
> > Cc: Willem de Bruijn <willemb@xxxxxxxxxx>
> > Cc: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> > Cc: Anatoly Burakov <anatoly.burakov@xxxxxxxxx>
> > Cc: Alexander Lobakin <alexandr.lobakin@xxxxxxxxx>
> > Cc: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> > Cc: Maryam Tahhan <mtahhan@xxxxxxxxxx>
> > Cc: xdp-hints@xxxxxxxxxxxxxxx
> > Cc: netdev@xxxxxxxxxxxxxxx
> > Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
> > ---
> >  drivers/net/veth.c | 31 +++++++++++++++++++++++++++++++
> >  1 file changed, 31 insertions(+)
> >
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index 70f50602287a..ba3e05832843 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> > @@ -118,6 +118,7 @@ static struct {
> >
> >  struct veth_xdp_buff {
> >  	struct xdp_buff xdp;
> > +	struct sk_buff *skb;
> >  };
> >
> >  static int veth_get_link_ksettings(struct net_device *dev,
> > @@ -602,6 +603,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
> >
> >  	xdp_convert_frame_to_buff(frame, xdp);
> >  	xdp->rxq = &rq->xdp_rxq;
> > +	vxbuf.skb = NULL;
> >
> >  	act = bpf_prog_run_xdp(xdp_prog, xdp);
> >
> > @@ -823,6 +825,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
> >  	__skb_push(skb, skb->data - skb_mac_header(skb));
> >  	if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
> >  		goto drop;
> > +	vxbuf.skb = skb;
> >
> >  	orig_data = xdp->data;
> >  	orig_data_end = xdp->data_end;
> > @@ -1602,6 +1605,28 @@ static int veth_xdp(struct net_device *dev, struct netdev_bpf *xdp)
> >  	}
> >  }
> >
> > +static int veth_xdp_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)
> > +{
> > +	struct veth_xdp_buff *_ctx = (void *)ctx;
> > +
> > +	if (!_ctx->skb)
> > +		return -EOPNOTSUPP;
> > +
> > +	*timestamp = skb_hwtstamps(_ctx->skb)->hwtstamp;
>
> The SKB stores this skb_hwtstamps() in the skb_shared_info memory area.
> This memory area is actually also available to xdp_frames. Thus, we
> could store the HW rx_timestamp in the same location for redirected
> xdp_frames. This could make code path sharing possible between SKB vs
> xdp_frame in veth.
>
> This would also make it fast to "transfer" the HW rx_timestamp when
> creating an SKB from an xdp_frame, as the data is already written in the
> correct place.
>
> Performance-wise, the downside is that the skb_shared_info memory area
> is in a separate cacheline. Thus, when no HW rx_timestamp is available,
> it is very expensive for a veth XDP bpf-prog to access this, just to get
> a zero back. Having an xdp_frame->flags bit that says whether a HW
> rx_timestamp has been stored can mitigate this.

That's one way to do it; although I'm not sure about the cases which
don't use xdp_frame and use a stack-allocated xdp_buff.

> > +	return 0;
> > +}
> > +
> > +static int veth_xdp_rx_hash(const struct xdp_md *ctx, u32 *hash)
> > +{
> > +	struct veth_xdp_buff *_ctx = (void *)ctx;
> > +
> > +	if (!_ctx->skb)
> > +		return -EOPNOTSUPP;
>
> For the xdp_frame case, I'm considering simply storing the u32 RX-hash
> in struct xdp_frame. This makes it easy to extract for the
> xdp_frame-to-SKB create use-case.
>
> As has been mentioned before, the SKB also requires knowing the RSS
> hash-type. This HW hash-type actually contains a lot of information
> that today is lost when reduced to the SKB hash-type. Due to
> standardization from Microsoft, most HW provide info on (L3) IPv4 or
> IPv6, and on (L4) TCP or UDP (and often SCTP). Often the hardware
> descriptor also provides info on the header length. Future work in this
> area is exciting, as we can speed up parsing of packets in XDP if we
> can get more detailed HW info on the hash "packet-type".

Something like the version we've discussed a while back [0]? Seems
workable overall if we remove it from the UAPI?
(not everyone was happy about the UAPI parts IIRC)

0: https://lore.kernel.org/bpf/20221115030210.3159213-7-sdf@xxxxxxxxxx/

> > +
> > +	*hash = skb_get_hash(_ctx->skb);
> > +	return 0;
> > +}
> > +

> --Jesper
>