Alexander Lobakin <aleksander.lobakin@xxxxxxxxx> writes: > From: Toke Høiland-Jørgensen <toke@xxxxxxxxxx> > Date: Fri, 09 Aug 2024 14:45:33 +0200 > >> Alexander Lobakin <aleksander.lobakin@xxxxxxxxx> writes: >> >>> From: Daniel Xu <dxu@xxxxxxxxx> >>> Date: Thu, 08 Aug 2024 16:52:51 -0400 >>> >>>> Hi, >>>> >>>> On Thu, Aug 8, 2024, at 7:57 AM, Alexander Lobakin wrote: >>>>> From: Lorenzo Bianconi <lorenzo.bianconi@xxxxxxxxxx> >>>>> Date: Thu, 8 Aug 2024 06:54:06 +0200 >>>>> >>>>>>> Hi Alexander, >>>>>>> >>>>>>> On Tue, Jun 28, 2022, at 12:47 PM, Alexander Lobakin wrote: >>>>>>>> cpumap has its own BH context based on kthread. It has a sane batch >>>>>>>> size of 8 frames per one cycle. >>>>>>>> GRO can be used on its own, adjust cpumap calls to the >>>>>>>> upper stack to use GRO API instead of netif_receive_skb_list() which >>>>>>>> processes skbs by batches, but doesn't involve GRO layer at all. >>>>>>>> It is most beneficial when a NIC which frame come from is XDP >>>>>>>> generic metadata-enabled, but in plenty of tests GRO performs better >>>>>>>> than listed receiving even given that it has to calculate full frame >>>>>>>> checksums on CPU. >>>>>>>> As GRO passes the skbs to the upper stack in the batches of >>>>>>>> @gro_normal_batch, i.e. 8 by default, and @skb->dev point to the >>>>>>>> device where the frame comes from, it is enough to disable GRO >>>>>>>> netdev feature on it to completely restore the original behaviour: >>>>>>>> untouched frames will be being bulked and passed to the upper stack >>>>>>>> by 8, as it was with netif_receive_skb_list(). >>>>>>>> >>>>>>>> Signed-off-by: Alexander Lobakin <alexandr.lobakin@xxxxxxxxx> >>>>>>>> --- >>>>>>>> kernel/bpf/cpumap.c | 43 ++++++++++++++++++++++++++++++++++++++----- >>>>>>>> 1 file changed, 38 insertions(+), 5 deletions(-) >>>>>>>> >>>>>>> >>>>>>> AFAICT the cpumap + GRO is a good standalone improvement. I think >>>>>>> cpumap is still missing this. >>>>> >>>>> The only concern for having GRO in cpumap without metadata from the NIC >>>>> descriptor was that when the checksum status is missing, GRO calculates >>>>> the checksum on CPU, which is not really fast. >>>>> But I remember sometimes GRO was faster despite that. >>>> >>>> Good to know, thanks. IIUC some kind of XDP hint support landed already? >>>> >>>> My use case could also use HW RSS hash to avoid a rehash in XDP prog. >>> >>> Unfortunately, for now it's impossible to get HW metadata such as RSS >>> hash and checksum status in cpumap. They're implemented via kfuncs >>> specific to a particular netdevice and this info is available only when >>> running XDP prog. >>> >>> But I think one solution could be: >>> >>> 1. We create some generic structure for cpumap, like >>> >>> struct cpumap_meta { >>> u32 magic; >>> u32 hash; >>> } >>> >>> 2. We add such check in the cpumap code >>> >>> if (xdpf->metalen == sizeof(struct cpumap_meta) && >>> <here we check magic>) >>> skb->hash = meta->hash; >>> >>> 3. In XDP prog, you call Rx hints kfuncs when they're available, obtain >>> RSS hash and then put it in the struct cpumap_meta as XDP frame metadata. >> >> Yes, except don't make this cpumap-specific, make it generic for kernel >> consumption of the metadata. That way it doesn't even have to be stored >> in the xdp metadata area, it can be anywhere we want (and hence not >> subject to ABI issues), and we can use it for skb creation after >> redirect in other places than cpumap as well (say, on veth devices). >> >> So it'll be: >> >> struct kernel_meta { >> u32 hash; >> u32 timestamp; >> ...etc >> } >> >> and a kfunc: >> >> void store_xdp_kernel_meta(struct kernel meta *meta); >> >> which the XDP program can call to populate the metadata area. > > Hmm, nice! > > But where to store this info in case of cpumap if not in xdp->data_meta? > When you convert XDP frames to skbs in the cpumap code, you only have > &xdp_frame and that's it. XDP prog was already run earlier from the > driver code at that point. Well, we could put it in skb_shared_info? IIRC, some of the metadata (timestamps?) end up there when building an skb anyway, so we won't even have to copy it around... -Toke