From: Jakub Kicinski <kuba@xxxxxxxxxx>
Date: Tue, 3 Dec 2024 16:51:57 -0800

> On Tue, 3 Dec 2024 12:01:16 +0100 Alexander Lobakin wrote:
>>>> @ Jakub,
>>>
>>> Context? What doesn't work and why?
>>
>> My tests show the same perf as on Lorenzo's series, but I test with the
>> UDP trafficgen. Daniel tests TCP and the results are much worse than
>> with Lorenzo's implementation.
>> I suspect this is related to how NAPI performs flushes / decides
>> whether to repoll again or exit vs how the kthread does that (even
>> though I also try to flush only every 64 frames or when the ring is
>> empty). Or maybe to the fact that part of the kthread runs in process
>> context outside any softirq, while when using NAPI, the whole loop is
>> inside the RX softirq.
>>
>> Jesper said that he'd like to see cpumap still using its own kthread,
>> so that its priority can be boosted separately from the backlog. That's
>> why we asked you whether it would be fine to have cpumap as threaded
>> NAPI in regards to all this :D
>
> Certainly not without a clear understanding of what the problem with
> a kthread is.

Yes, sure thing. The bad thing is that I can't reproduce Daniel's problem >_<

Previously, I was testing with the UDP trafficgen and got up to an 80%
improvement over the baseline. Now I tested TCP and got up to a 70%
improvement, with no regressions whatsoever =\

I don't know where this regression on Daniel's setup comes from.
Is it a multi-thread or single-thread test? Which app do you use: iperf,
netperf, neper, Microsoft's app (forgot the name)?
Do you have multiple NUMA nodes on your system? Are you sure you didn't
cross the node when redirecting with the GRO patches, and that no other
NUMA mismatches happened? Any other random stuff like the RSS hash key,
which affects flow steering?

Thanks,
Olek
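
[Editor's note: below is a minimal, self-contained userspace sketch of the
batch/flush policy described in the quoted text above (flush every 64
frames, or whenever the ring runs empty). It is purely illustrative and
assumes nothing about the actual cpumap patch; all names, types, and the
demo workload size are hypothetical.]

/*
 * Illustrative sketch only -- a userspace simulation of the policy
 * "flush every 64 frames or when the ring is empty".  This is NOT the
 * cpumap kthread code; ring_consume(), flush_frames() and struct frame
 * are made up for the example.
 */
#include <stdbool.h>
#include <stdio.h>

#define FLUSH_BATCH	64

struct frame {
	int id;
};

/* Pretend ring: returns true and fills *f while frames remain. */
static bool ring_consume(struct frame *f)
{
	static int remaining = 150;	/* arbitrary demo workload */

	if (!remaining)
		return false;

	f->id = remaining--;
	return true;
}

/* Stand-in for the actual flush (e.g. handing frames to the stack). */
static void flush_frames(int n)
{
	printf("flush %d frames\n", n);
}

int main(void)
{
	struct frame f;
	int pending = 0;

	while (ring_consume(&f)) {
		pending++;

		/* Flush every FLUSH_BATCH frames... */
		if (pending == FLUSH_BATCH) {
			flush_frames(pending);
			pending = 0;
		}
	}

	/* ...or flush whatever is left once the ring is empty. */
	if (pending)
		flush_frames(pending);

	return 0;
}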