On 10/10/24 8:21 AM, Jens Axboe wrote:
>> which adds zc send. I ran a quick test, and it does reduce cpu
>> utilization on the sender from 100% to 95%. I'll keep poking...
>
> Update on this - did more testing and the 100 -> 95 was a bit of a
> fluke, it's still maxed. So I added io_uring send and sendzc support to
> kperf, and I still saw the sendzc being maxed out sending at 100G rates
> with 100% cpu usage.
>
> Poked a bit, and the reason is that it's all memcpy() off
> skb_orphan_frags_rx() -> skb_copy_ubufs(). At this point I asked Pavel
> as that made no sense to me, and turns out the kernel thinks there's a
> tap on the device. Maybe there is, haven't looked at that yet, but I
> just killed the orphaning and tested again.
>
> This looks better, now I can get 100G line rate from a single thread
> using io_uring sendzc using only 30% of the single cpu/thread (including
> irq time). That is good news, as it unlocks being able to test > 100G as
> the sender is no longer the bottleneck.
>
> Tap side still a mystery, but it unblocked testing. I'll figure that
> part out separately.
>

Thanks for the update. 30% cpu is more in line with my testing.

For the "tap" you need to make sure no packet socket applications are
running -- e.g., lldpd is a typical one I have seen in tests. Check
/proc/net/packet
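
If it helps, here is a rough Python sketch for listing whatever shows up in
/proc/net/packet. It assumes the usual column header (sk, RefCnt, Type,
Proto, Iface, R, Rmem, User, Inode), with Proto printed in hex and Iface
being an ifindex; the function names are made up for illustration. Flagging
ETH_P_ALL (0x0003) is only a hint for where to look first, not a statement
about which sockets the kernel counts as a tap.

```python
from pathlib import Path

ETH_P_ALL = 0x0003  # protocol value used by sockets that want every frame


def parse_packet_sockets(text):
    """Parse the contents of /proc/net/packet into dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue  # skip anything that does not match the header layout
        rows.append(dict(zip(header, fields)))
    return rows


def summarize(rows):
    """Return (proto, ifindex, inode) per socket, with Proto decoded from hex."""
    return [(int(r["Proto"], 16), int(r["Iface"]), int(r["Inode"])) for r in rows]


if __name__ == "__main__":
    path = Path("/proc/net/packet")
    if path.exists():
        for proto, ifindex, inode in summarize(parse_packet_sockets(path.read_text())):
            note = " (ETH_P_ALL)" if proto == ETH_P_ALL else ""
            print(f"proto=0x{proto:04x}{note} ifindex={ifindex} inode={inode}")
```

The inode can then be matched against the socket:[inode] links under
/proc/<pid>/fd to find the owning process. An ifindex of 0 should mean the
socket is not bound to a particular device.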