Update on io_uring zerocopy tx, still RFC. For v1 and design notes see

https://lore.kernel.org/io-uring/cover.1638282789.git.asml.silence@xxxxxxxxx/

Absolute numbers (against dummy) got higher since v1, +~10-12% requests/s
for the peak performance case. 5/19 brought a couple of percent, but most
of it came with 8/19 and 9/19 (+8-11% in numbers, 5-7% in profiles). It
will also be needed in the future for p2p. Any reason not to do the same
for paged non-zc? Small (under 100-150B) packets?

Most of the checks are removed from non-zc paths. It is implemented in a
slightly trickier way in __ip_append_data(), but considering the already
existing assumptions around the "from" argument it should be fine.

Benchmarks for dummy netdev, UDP/IPv4, payload size=4096:
 -n<N> is how many requests we submit per syscall. From io_uring's
       perspective -n1 is wasteful and far from optimal, but it is
       included for comparison.
 -z0   disables zerocopy, just normal io_uring send requests
 -f    forces a flush of "buffer free" notifications for every request

                        | K reqs/s | speedup
msg_zerocopy (non-zc)   | 1120     | 1.12
msg_zerocopy (zc)       | 997      | 1
io_uring -n1 -z0        | 1469     | 1.47
io_uring -n8 -z0        | 1780     | 1.78
io_uring -n1 -f         | 1688     | 1.69
io_uring -n1            | 1774     | 1.77
io_uring -n8 -f         | 2075     | 2.08
io_uring -n8            | 2265     | 2.27

note: it might not be too interesting to compare zc vs non-zc: the
relative performance difference can be shifted in favour of zerocopy by
cutting constant per-request overhead, and there are easy ways of doing
that, e.g. by compiling out unused features. This is even more true for
the table below, as there was additional noise taking a good quarter of
CPU cycles.

Some data for UDP/IPv6 between a pair of NICs. 9/19 wasn't there at the
time of testing. All tests are CPU bound, and so, as expected, reqs/s for
zerocopy doesn't vary much between different payload sizes. The io_uring
to msg_zerocopy ratio is not too representative, for reasons similar to
those described above.

payload | test                   | K reqs/s
___________________________________________
8192    | io_uring -n8 (dummy)   | 599
        | io_uring -n1 -z0       | 264
        | io_uring -n8 -z0       | 302
        | msg_zerocopy           | 248
        | msg_zerocopy -z        | 183
        | io_uring -n1 -f        | 306
        | io_uring -n1           | 318
        | io_uring -n8 -f        | 373
        | io_uring -n8           | 401
4096    | io_uring -n8 (dummy)   | 601
        | io_uring -n1 -z0       | 303
        | io_uring -n8 -z0       | 366
        | msg_zerocopy           | 278
        | msg_zerocopy -z        | 187
        | io_uring -n1 -f        | 317
        | io_uring -n1           | 325
        | io_uring -n8 -f        | 387
        | io_uring -n8           | 405
1024    | io_uring -n8 (dummy)   | 601
        | io_uring -n1 -z0       | 329
        | io_uring -n8 -z0       | 407
        | msg_zerocopy           | 301
        | msg_zerocopy -z        | 186
        | io_uring -n1 -f        | 317
        | io_uring -n1           | 327
        | io_uring -n8 -f        | 390
        | io_uring -n8           | 403
512     | io_uring -n8 (dummy)   | 601
        | io_uring -n1 -z0       | 340
        | io_uring -n8 -z0       | 417
        | msg_zerocopy           | 310
        | msg_zerocopy -z        | 186
        | io_uring -n1 -f        | 317
        | io_uring -n1           | 328
        | io_uring -n8 -f        | 392
        | io_uring -n8           | 406
128     | io_uring -n8 (dummy)   | 602
        | io_uring -n1 -z0       | 341
        | io_uring -n8 -z0       | 428
        | msg_zerocopy           | 317
        | msg_zerocopy -z        | 188
        | io_uring -n1 -f        | 318
        | io_uring -n1           | 331
        | io_uring -n8 -f        | 391
        | io_uring -n8           | 408

https://github.com/isilence/linux/tree/zc_v2
https://github.com/isilence/liburing/tree/zc_v2

The benchmark is <liburing>/test/send-zc,

send-zc [-f] [-n<N>] [-z0] -s<payload size> -D<dst ip> (-6|-4) [-t<sec>] udp

As a server you can use msg_zerocopy from the kernel's selftests, or a
copy of it at <liburing>/test/msg_zerocopy. No server is needed for dummy
testing.
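
To make the mapping between the table rows and the usage line above
concrete, here is a rough dry-run sketch that only prints the six
io_uring command lines (-n1/-n8 combined with -z0, -f, or neither). The
helper name send_zc_cmds, the default address 10.10.10.1 and the
10 second run time are illustrative assumptions, not part of liburing;
the flags themselves are taken from the usage line.

```shell
#!/bin/sh
# Dry-run helper (hypothetical, not part of liburing): prints the send-zc
# command lines for the six io_uring flag combinations in the tables.
# Arguments: destination IP, payload size in bytes, run time in seconds.
send_zc_cmds() {
    dst=${1:-10.10.10.1}   # assumed address; must be routed via dummy0
    size=${2:-4096}        # payload size, as in the IPv4 dummy table
    secs=${3:-10}          # assumed run time
    for n in 1 8; do
        for extra in "-z0 " "-f " ""; do
            # Flags follow the usage line:
            #   -n<N> [-z0|-f] -s<size> -D<ip> -4 -t<sec> udp
            printf './send-zc -n%s %s-s%s -D%s -4 -t%s udp\n' \
                "$n" "$extra" "$size" "$dst" "$secs"
        done
    done
}

send_zc_cmds "$@"
```

Piping the output to sh from within <liburing>/test should run the
commands one after another; for the dummy case the destination has to be
an address routed through dummy0.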

dummy setup:

sudo ip li add dummy0 type dummy && sudo ip li set dummy0 up mtu 65536
# make traffic for the specified IP to go through dummy0
sudo ip route add <ip_address> dev dummy0

v2: remove additional overhead for non-zc from skb_release_data() (Jonathan)
    avoid msg propagation, hide extra bits of non-zc overhead
    task_work based "buffer free" notifications
    improve io_uring's notification refcounting
    added 5/19, (no pfmemalloc tracking)
    added 8/19 and 9/19 preventing small copies with zc
    misc small changes

Pavel Begunkov (19):
  skbuff: add SKBFL_DONT_ORPHAN flag
  skbuff: pass a struct ubuf_info in msghdr
  net: add zerocopy_sg_from_iter for bvec
  net: optimise page get/free for bvec zc
  net: don't track pfmemalloc for zc registered mem
  ipv4/udp: add support msgdr::msg_ubuf
  ipv6/udp: add support msgdr::msg_ubuf
  ipv4: avoid partial copy for zc
  ipv6: avoid partial copy for zc
  io_uring: add send notifiers registration
  io_uring: infrastructure for send zc notifications
  io_uring: wire send zc request type
  io_uring: add an option to flush zc notifications
  io_uring: opcode independent fixed buf import
  io_uring: sendzc with fixed buffers
  io_uring: cache struct ubuf_info
  io_uring: unclog ctx refs waiting with zc notifiers
  io_uring: task_work for notification delivery
  io_uring: optimise task referencing by notifiers

 fs/io_uring.c                 | 440 +++++++++++++++++++++++++++++++++-
 include/linux/skbuff.h        |  46 ++--
 include/linux/socket.h        |   1 +
 include/uapi/linux/io_uring.h |  14 ++
 net/compat.c                  |   1 +
 net/core/datagram.c           |  58 +++++
 net/core/skbuff.c             |  16 +-
 net/ipv4/ip_output.c          |  55 +++--
 net/ipv6/ip6_output.c         |  54 ++++-
 net/socket.c                  |   3 +
 10 files changed, 633 insertions(+), 55 deletions(-)

-- 
2.34.1