From: Daniel Xu <dxu@xxxxxxxxx>
Date: Thu, 08 Aug 2024 16:52:51 -0400

> Hi,
>
> On Thu, Aug 8, 2024, at 7:57 AM, Alexander Lobakin wrote:
>> From: Lorenzo Bianconi <lorenzo.bianconi@xxxxxxxxxx>
>> Date: Thu, 8 Aug 2024 06:54:06 +0200
>>
>>>> Hi Alexander,
>>>>
>>>> On Tue, Jun 28, 2022, at 12:47 PM, Alexander Lobakin wrote:
>>>>> cpumap has its own BH context based on kthread. It has a sane batch
>>>>> size of 8 frames per one cycle.
>>>>> GRO can be used on its own; adjust cpumap calls to the
>>>>> upper stack to use the GRO API instead of netif_receive_skb_list(), which
>>>>> processes skbs in batches but doesn't involve the GRO layer at all.
>>>>> It is most beneficial when the NIC the frames come from is XDP
>>>>> generic metadata-enabled, but in plenty of tests GRO performs better
>>>>> than listified receiving even though it has to calculate full frame
>>>>> checksums on CPU.
>>>>> As GRO passes the skbs to the upper stack in batches of
>>>>> @gro_normal_batch, i.e. 8 by default, and @skb->dev points to the
>>>>> device the frame comes from, it is enough to disable the GRO
>>>>> netdev feature on it to completely restore the original behaviour:
>>>>> untouched frames will be bulked and passed to the upper stack
>>>>> by 8, as it was with netif_receive_skb_list().
>>>>>
>>>>> Signed-off-by: Alexander Lobakin <alexandr.lobakin@xxxxxxxxx>
>>>>> ---
>>>>> kernel/bpf/cpumap.c | 43 ++++++++++++++++++++++++++++++++++++++-----
>>>>> 1 file changed, 38 insertions(+), 5 deletions(-)
>>>>>
>>>>
>>>> AFAICT the cpumap + GRO is a good standalone improvement. I think
>>>> cpumap is still missing this.
>>
>> The only concern with having GRO in cpumap without metadata from the NIC
>> descriptor was that when the checksum status is missing, GRO calculates
>> the checksum on CPU, which is not really fast.
>> But I remember sometimes GRO was faster despite that.
>
> Good to know, thanks. IIUC some kind of XDP hint support landed already?
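The batching behaviour described in the quoted patch (completed skbs handed to the upper stack in groups of @gro_normal_batch, 8 by default) can be modelled in userspace roughly as below. This is a minimal sketch, not kernel code: struct fake_skb, struct gro_batcher, deliver_batch() and gro_normal_one_sketch() are hypothetical names, and the sketch counts one unit per skb, whereas the kernel's gro_normal_one() counts GRO segments.

```c
#include <assert.h>

/* Mirrors the default value of the net.core.gro_normal_batch sysctl. */
#define GRO_NORMAL_BATCH 8

/* Hypothetical stand-in for struct sk_buff. */
struct fake_skb {
	int id;
};

/* Hypothetical stand-in for the per-NAPI list of completed skbs. */
struct gro_batcher {
	struct fake_skb *list[GRO_NORMAL_BATCH];
	int count;     /* skbs waiting in list[] */
	int flushes;   /* number of batches handed upstream */
	int delivered; /* total skbs handed upstream */
};

/* Stand-in for passing the pending batch to netif_receive_skb_list(). */
static void deliver_batch(struct gro_batcher *b)
{
	if (b->count == 0)
		return;
	b->flushes++;
	b->delivered += b->count;
	b->count = 0;
}

/* Queue one completed skb; hand the batch upstream once it is full. */
static void gro_normal_one_sketch(struct gro_batcher *b, struct fake_skb *skb)
{
	assert(b->count < GRO_NORMAL_BATCH);
	b->list[b->count++] = skb;
	if (b->count >= GRO_NORMAL_BATCH)
		deliver_batch(b);
}
```

In this sketch the caller flushes the partial batch at the end of a cycle by calling deliver_batch() directly, so 20 skbs leave as batches of 8, 8 and 4. With the GRO netdev feature disabled nothing would be merged, but the same batching still applies, which is the fallback the patch description relies on.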
>
> My use case could also use HW RSS hash to avoid a rehash in XDP prog.

Unfortunately, for now it's impossible to get HW metadata such as the RSS
hash and checksum status in cpumap. They're implemented via kfuncs
specific to a particular netdevice, and this info is available only from
within a running XDP prog.

But I think one solution could be:

1. We create some generic structure for cpumap, like

	struct cpumap_meta {
		u32 magic;
		u32 hash;
	};

2. We add a check like this in the cpumap code:

	if (xdpf->metalen == sizeof(struct cpumap_meta) &&
	    <here we check magic>)
		skb->hash = meta->hash;

3. In the XDP prog, you call the Rx hints kfuncs when they're available,
obtain the RSS hash and then put it in struct cpumap_meta as XDP frame
metadata.

> And HW RX timestamp to not break SO_TIMESTAMPING. These two
> are on one of my TODO lists. But I can't get to them for at least
> a few weeks. So feel free to take it if you'd like.
>
>>
>>>>
>>>> I have a production use case for this now. We want to do some intelligent
>>>> RX steering and I think GRO would help over list-ified receive in some cases.
>>>> We would prefer to steer in HW (and thus get existing GRO support) but not all
>>>> our NICs support it. So we need a software fallback.
>>>>
>>>> Are you still interested in merging the cpumap + GRO patches?
>>
>> For sure I can revive this part. I was planning to get back to this
>> branch, pick the patches which were not related to XDP hints and send
>> them separately.
>>
>>>
>>> Hi Daniel and Alex,
>>>
>>> Recently I worked on a PoC to add GRO support to the cpumap codebase:
>>> - https://github.com/LorenzoBianconi/bpf-next/commit/a4b8264d5000ecf016da5a2dd9ac302deaf38b3e
>>> Here I added GRO support to cpumap through gro-cells.
>>> - https://github.com/LorenzoBianconi/bpf-next/commit/da6cb32a4674aa72401c7414c9a8a0775ef41a55
>>> Here I added GRO support to cpumap through napi-threaded APIs (with some
>>> changes to them).
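The three-step cpumap_meta proposal above can be simulated in userspace as below. This is a minimal sketch under stated assumptions: CPUMAP_META_MAGIC, struct fake_xdp_frame, struct fake_skb, xdp_store_hash() and cpumap_apply_meta() are hypothetical, and the metadata is placed immediately before the packet data, the way XDP metadata sits in front of a frame. In real code the XDP side would presumably use bpf_xdp_adjust_meta() plus an Rx hints kfunc such as bpf_xdp_metadata_rx_hash(), and the cpumap side something like skb_set_hash(); none of that is shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value; the proposal does not fix one. */
#define CPUMAP_META_MAGIC 0xc0ffee01u

/* Layout from step 1 of the proposal: magic + RSS hash. */
struct cpumap_meta {
	uint32_t magic;
	uint32_t hash;
};

/* Hypothetical, simplified stand-ins for struct xdp_frame and struct sk_buff. */
struct fake_xdp_frame {
	unsigned char buf[256]; /* headroom + metadata + packet */
	uint32_t data_off;      /* offset of the packet data in buf */
	uint32_t metalen;       /* metadata sits right before the data */
};

struct fake_skb {
	uint32_t hash;
	int hash_valid;
};

/* Step 3 (XDP prog side, simulated): store the HW hash as frame metadata. */
static int xdp_store_hash(struct fake_xdp_frame *f, uint32_t hw_hash)
{
	struct cpumap_meta meta = { .magic = CPUMAP_META_MAGIC, .hash = hw_hash };

	assert(f->data_off <= sizeof(f->buf));
	if (f->data_off < sizeof(meta))
		return -1; /* not enough headroom for the metadata */
	memcpy(f->buf + f->data_off - sizeof(meta), &meta, sizeof(meta));
	f->metalen = sizeof(meta);
	return 0;
}

/* Step 2 (cpumap side, simulated): use the hash only if size and magic match. */
static void cpumap_apply_meta(const struct fake_xdp_frame *f, struct fake_skb *skb)
{
	struct cpumap_meta meta;

	if (f->metalen != sizeof(meta) || f->data_off < f->metalen)
		return;
	memcpy(&meta, f->buf + f->data_off - f->metalen, sizeof(meta));
	if (meta.magic != CPUMAP_META_MAGIC)
		return;
	skb->hash = meta.hash;
	skb->hash_valid = 1;
}
```

Checking both the metadata length and the magic value keeps cpumap from misreading unrelated metadata, left in front of the frame by some other XDP program, as an RSS hash.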
>>
>> Hmm, when I was testing it, adding a whole NAPI to cpumap was sorta
>> overkill; that's why I separated the GRO structure from &napi_struct.
>>
>> Maybe I'll find some free time, test all 3 solutions
>> (mine, gro_cells, threaded NAPI) and pick/send the best one?
>
> Sounds good. Would be good to compare results.
>
> […]
>
> Thanks,
> Daniel

Thanks,
Olek