> > Slight aside: we know that MSG_ZEROCOPY is quite inefficient for
> > small sends. Very rough rule of thumb is you need around 16KB or
> > larger sends for it to outperform regular copy. Part of that is the
> > memory pinning. The other part is the notification handling.
> > MSG_ERRQUEUE is expensive. I hope that io_uring can not just match,
> > but improve on MSG_ZEROCOPY, especially for smaller packets.
>
> I have some numbers left from benchmarking this patchset. Not too
> well suited to answer your question, but they still give an idea.
> Just a benchmark, single buffer, 100G Broadcom NIC IIRC. All is
> io_uring based, -z<bool> switches copy vs zerocopy. Zero copy
> uses registered buffers, so no page pinning and page table
> traversal at runtime. 10s per run is not ideal, but was matching
> longer runs.
>
> # 1200 bytes
> ./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s1200 -z0
> packets=15004160 (MB=17170), rps=1470996 (MB/s=1683)
> ./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s1200 -z1
> packets=10440224 (MB=11947), rps=1023551 (MB/s=1171)
>
> # 4000 bytes
> ./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s4000 -z0
> packets=11742688 (MB=44794), rps=1151243 (MB/s=4391)
> ./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s4000 -z1
> packets=14144048 (MB=53955), rps=1386671 (MB/s=5289)
>
> # 8000 bytes
> ./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s8000 -z0
> packets=6868976 (MB=52406), rps=673429 (MB/s=5137)
> ./send-zerocopy -4 tcp -D <ip> -t 10 -n 1 -l0 -b1 -d -s8000 -z1
> packets=10800784 (MB=82403), rps=1058900 (MB/s=8078)

Parity around 4K. That is very encouraging :)
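
For anyone who wants to eyeball the crossover, here is a rough sketch that turns the MB/s figures quoted above into zerocopy/copy ratios. The throughput values are copied from the benchmark output. The names `RESULTS_MBPS`, `zerocopy_ratio` and `summarize` are made up for illustration and are not part of the send-zerocopy tool.

```python
# Throughput in MB/s copied from the benchmark output quoted above.
# Keys are payload sizes in bytes; values are (copy, zerocopy), i.e. -z0 / -z1.
# All names in this sketch are hypothetical helpers, not part of any tool.
RESULTS_MBPS = {
    1200: (1683, 1171),
    4000: (4391, 5289),
    8000: (5137, 8078),
}


def zerocopy_ratio(copy_mbps, zc_mbps):
    """Return zerocopy throughput relative to copy (>1.0 means zerocopy wins)."""
    return zc_mbps / copy_mbps


def summarize(results):
    """Map each payload size (ascending) to its zerocopy/copy throughput ratio."""
    return {size: zerocopy_ratio(c, z) for size, (c, z) in sorted(results.items())}


if __name__ == "__main__":
    for size, ratio in summarize(RESULTS_MBPS).items():
        winner = "zerocopy" if ratio > 1.0 else "copy"
        print(f"{size:>5} bytes: zerocopy/copy = {ratio:.2f} ({winner} faster)")
```

With only three sample sizes this just says the break-even point sits somewhere between 1200 and 4000 bytes on this setup. It does not pin down an exact value.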