On Aug 09, Toke wrote: > Alexander Lobakin <aleksander.lobakin@xxxxxxxxx> writes: > > > From: Toke Høiland-Jørgensen <toke@xxxxxxxxxx> > > Date: Fri, 09 Aug 2024 14:45:33 +0200 > > > >> Alexander Lobakin <aleksander.lobakin@xxxxxxxxx> writes: > >> > >>> From: Daniel Xu <dxu@xxxxxxxxx> > >>> Date: Thu, 08 Aug 2024 16:52:51 -0400 > >>> > >>>> Hi, > >>>> > >>>> On Thu, Aug 8, 2024, at 7:57 AM, Alexander Lobakin wrote: > >>>>> From: Lorenzo Bianconi <lorenzo.bianconi@xxxxxxxxxx> > >>>>> Date: Thu, 8 Aug 2024 06:54:06 +0200 > >>>>> > >>>>>>> Hi Alexander, > >>>>>>> > >>>>>>> On Tue, Jun 28, 2022, at 12:47 PM, Alexander Lobakin wrote: > >>>>>>>> cpumap has its own BH context based on kthread. It has a sane batch > >>>>>>>> size of 8 frames per one cycle. > >>>>>>>> GRO can be used on its own, adjust cpumap calls to the > >>>>>>>> upper stack to use GRO API instead of netif_receive_skb_list() which > >>>>>>>> processes skbs by batches, but doesn't involve GRO layer at all. > >>>>>>>> It is most beneficial when a NIC which frame come from is XDP > >>>>>>>> generic metadata-enabled, but in plenty of tests GRO performs better > >>>>>>>> than listed receiving even given that it has to calculate full frame > >>>>>>>> checksums on CPU. > >>>>>>>> As GRO passes the skbs to the upper stack in the batches of > >>>>>>>> @gro_normal_batch, i.e. 8 by default, and @skb->dev point to the > >>>>>>>> device where the frame comes from, it is enough to disable GRO > >>>>>>>> netdev feature on it to completely restore the original behaviour: > >>>>>>>> untouched frames will be being bulked and passed to the upper stack > >>>>>>>> by 8, as it was with netif_receive_skb_list(). > >>>>>>>> > >>>>>>>> Signed-off-by: Alexander Lobakin <alexandr.lobakin@xxxxxxxxx> > >>>>>>>> --- > >>>>>>>> kernel/bpf/cpumap.c | 43 ++++++++++++++++++++++++++++++++++++++----- > >>>>>>>> 1 file changed, 38 insertions(+), 5 deletions(-) > >>>>>>>> > >>>>>>> > >>>>>>> AFAICT the cpumap + GRO is a good standalone improvement. I think > >>>>>>> cpumap is still missing this. > >>>>> > >>>>> The only concern for having GRO in cpumap without metadata from the NIC > >>>>> descriptor was that when the checksum status is missing, GRO calculates > >>>>> the checksum on CPU, which is not really fast. > >>>>> But I remember sometimes GRO was faster despite that. > >>>> > >>>> Good to know, thanks. IIUC some kind of XDP hint support landed already? > >>>> > >>>> My use case could also use HW RSS hash to avoid a rehash in XDP prog. > >>> > >>> Unfortunately, for now it's impossible to get HW metadata such as RSS > >>> hash and checksum status in cpumap. They're implemented via kfuncs > >>> specific to a particular netdevice and this info is available only when > >>> running XDP prog. > >>> > >>> But I think one solution could be: > >>> > >>> 1. We create some generic structure for cpumap, like > >>> > >>> struct cpumap_meta { > >>> u32 magic; > >>> u32 hash; > >>> } > >>> > >>> 2. We add such check in the cpumap code > >>> > >>> if (xdpf->metalen == sizeof(struct cpumap_meta) && > >>> <here we check magic>) > >>> skb->hash = meta->hash; > >>> > >>> 3. In XDP prog, you call Rx hints kfuncs when they're available, obtain > >>> RSS hash and then put it in the struct cpumap_meta as XDP frame metadata. > >> > >> Yes, except don't make this cpumap-specific, make it generic for kernel > >> consumption of the metadata. That way it doesn't even have to be stored > >> in the xdp metadata area, it can be anywhere we want (and hence not > >> subject to ABI issues), and we can use it for skb creation after > >> redirect in other places than cpumap as well (say, on veth devices). > >> > >> So it'll be: > >> > >> struct kernel_meta { > >> u32 hash; > >> u32 timestamp; > >> ...etc > >> } > >> > >> and a kfunc: > >> > >> void store_xdp_kernel_meta(struct kernel meta *meta); > >> > >> which the XDP program can call to populate the metadata area. > > > > Hmm, nice! > > > > But where to store this info in case of cpumap if not in xdp->data_meta? > > When you convert XDP frames to skbs in the cpumap code, you only have > > &xdp_frame and that's it. XDP prog was already run earlier from the > > driver code at that point. > > Well, we could put it in skb_shared_info? IIRC, some of the metadata > (timestamps?) end up there when building an skb anyway, so we won't even > have to copy it around... Before vacation I started looking into it a bit, I will resume this work in one week or so. Regards, Lorenzo > > -Toke >
Attachment:
signature.asc
Description: PGP signature