From: Daniel Xu <dxu@xxxxxxxxx>
Date: Wed, 04 Dec 2024 13:51:08 -0800

> On Wed, Dec 4, 2024, at 8:42 AM, Alexander Lobakin wrote:
>> From: Jakub Kicinski <kuba@xxxxxxxxxx>
>> Date: Tue, 3 Dec 2024 16:51:57 -0800
>>
>>> On Tue, 3 Dec 2024 12:01:16 +0100 Alexander Lobakin wrote:
>>>>>> @ Jakub,
>>>>>
>>>>> Context? What doesn't work and why?
>>>>
>>>> My tests show the same perf as with Lorenzo's series, but I test with
>>>> a UDP trafficgen. Daniel tests TCP, and his results are much worse
>>>> than with Lorenzo's implementation.
>>>> I suspect this is related to how NAPI performs flushes / decides
>>>> whether to repoll again or exit vs how the kthread does it (even
>>>> though I also try to flush only every 64 frames or when the ring is
>>>> empty). Or maybe to the fact that the kthread runs in process context
>>>> outside any softirq, while with NAPI the whole loop runs inside the
>>>> RX softirq.
>>>>
>>>> Jesper said that he'd like to see cpumap keep using its own kthread,
>>>> so that its priority can be boosted separately from the backlog.
>>>> That's why we asked you whether it would be fine to have cpumap as
>>>> threaded NAPI, in regards to all this :D
>>>
>>> Certainly not without a clear understanding of what the problem with
>>> a kthread is.
>>
>> Yes, sure thing.
>>
>> The bad thing is that I can't reproduce Daniel's problem >_< Previously,
>> I was testing with the UDP trafficgen and got up to 80% improvement over
>> the baseline. Now I tested TCP and got up to 70% improvement, no
>> regressions whatsoever =\
>>
>> I don't know where this regression on Daniel's setup comes from. Is it
>> a multi-thread or a single-thread test?
>
> 8 threads with 16 flows over them (-T8 -F16)
>
>> Which app do you use: iperf, netperf,
>> neper, Microsoft's app (I forgot the name)?
>
> neper, tcp_stream.

Let me recheck with neper -T8 -F16, I'll post my results soon.

>> Do you have multiple NUMA
>> nodes on your system? Are you sure you didn't cross a node when
>> redirecting with the GRO patches, and that no other NUMA mismatches
>> happened?
>
> Single node. Technically EPYC with NPS=1, so there are some NUMA
> characteristics, but I think the interconnect is supposed to hide them
> fairly efficiently.
>
>> Any other random stuff, like the RSS hash key, which affects flow
>> steering?
>
> Whatever the default is - I'd be willing to bet Kuba set up the
> configuration at one point or another, so it's probably sane. And with
> 5 runs it seems unlikely the hashing would get unlucky and cause an
> imbalance.
>
>>
>> Thanks,
>> Olek
>
> Since I've got the setup handy and am motivated to see this work land,
> do you have any other pointers for things I should look for? I'll spend
> some time looking at profiles to see if I can identify any hot spots
> compared to softirq-based GRO.
>
> Thanks,
> Daniel

Thanks for helping with this!
Olek
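
P.S. To make the "flush only every 64 frames or when the ring is empty"
point above concrete, here is a minimal sketch of that policy, assuming a
ptr_ring feeding the consumer. demo_drain() and FLUSH_BATCH are made up
for illustration; only __ptr_ring_consume() and xdp_do_flush() are real
kernel APIs, and this is not the actual cpumap code:

#include <linux/ptr_ring.h>
#include <linux/filter.h>	/* xdp_do_flush() */

#define FLUSH_BATCH	64

/* Hypothetical consumer loop; callers must serialize, as the
 * __ptr_ring_*() helpers take no locks themselves.
 */
static void demo_drain(struct ptr_ring *ring)
{
	u32 since_flush = 0;
	void *frame;

	while ((frame = __ptr_ring_consume(ring))) {
		/* ... process @frame: run the XDP prog, build an skb, etc. ... */

		if (++since_flush == FLUSH_BATCH) {
			xdp_do_flush();	/* flush pending redirects every 64 frames */
			since_flush = 0;
		}
	}

	if (since_flush)
		xdp_do_flush();	/* the ring ran empty: flush the rest */
}

Drivers get a similar cadence from NAPI for free, since they flush at the
end of each poll round bounded by the budget - that difference in flush
timing is exactly what I suspect above.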
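
P.P.S. On the driver side, the "cpumap as threaded NAPI" idea boils down
to roughly this - a hedged sketch, where dev_set_threaded() is the real
in-kernel API and demo_enable_threaded() is just a hypothetical wrapper:

#include <linux/netdevice.h>

static int demo_enable_threaded(struct net_device *dev)
{
	/* Spawns one "napi/<dev>-<napi_id>" kthread per NAPI instance;
	 * each one can then be prioritized and pinned like any other
	 * kthread, which might address Jesper's point about boosting
	 * cpumap separately from the backlog.
	 */
	return dev_set_threaded(dev, true);
}

The same knob is exposed to userspace as /sys/class/net/<dev>/threaded.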