io_uring allocates a ubuf_info per zerocopy send request. That is
convenient for userspace, but as things stand it means the TCP stack has
to allocate a new skb every time instead of appending to the previous
one. Unless sends are big enough, this produces lots of small skbs that
strain the stack and degrade performance.

The patchset implements notification stacking, where a notification is
io_uring's extension of ubuf_info. It tries to link ubuf_infos into a
list, and the entire link is released together once all references to
it are gone.

Testing with liburing/examples/send-zerocopy and another custom-made
tool with 4K bytes per send, it improves performance ~6 times and brings
it level with MSG_ZEROCOPY. Without the patchset, much larger sends are
needed to reach the full potential.

bytes per send | before (Kqps) | after (Kqps)
100            | 283           | 936
1200           | 195           | 1023
4000           | 193           | 1386
8000           | 154           | 1058

Pavel Begunkov (6):
  net: extend ubuf_info callback to ops structure
  net: add callback for setting a ubuf_info to skb
  io_uring/notif: refactor io_tx_ubuf_complete()
  io_uring/notif: remove ctx var from io_notif_tw_complete
  io_uring/notif: simplify io_notif_flush()
  io_uring/notif: implement notification stacking

 drivers/net/tap.c      |  2 +-
 drivers/net/tun.c      |  2 +-
 drivers/vhost/net.c    |  8 +++-
 include/linux/skbuff.h | 21 ++++++----
 io_uring/notif.c       | 91 +++++++++++++++++++++++++++++++++++-------
 io_uring/notif.h       | 13 +++---
 net/core/skbuff.c      | 37 +++++++++++------
 7 files changed, 129 insertions(+), 45 deletions(-)

-- 
2.44.0
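To make the stacking idea concrete, here is a minimal userspace sketch in C of a refcounted notification link. It is not the kernel implementation: `struct notif`, `notif_init()`, `notif_link()`, `notif_put()` and `notif_complete_link()` are hypothetical names, refcounts are plain ints rather than `refcount_t`, and there are no real skbs or task_work. The assumed model is that each notification linked behind a head pins one extra reference on the head, so the head, and with it the whole link, completes only after every member and every skb reference has been dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a stackable notification.
 * This is NOT the kernel's struct io_notif_data. */
struct notif {
	int refcnt;          /* request ref + skb refs (+1 per member on a head) */
	struct notif *head;  /* first notification of the link (self if alone) */
	struct notif *next;  /* next notification in the link, or NULL */
	bool completed;      /* set once the completion would be posted */
};

static void notif_init(struct notif *n)
{
	n->refcnt = 1;       /* reference held by the request until flush */
	n->head = n;
	n->next = NULL;
	n->completed = false;
}

/* Complete every notification in the link, walking from the head. */
static void notif_complete_link(struct notif *head)
{
	for (struct notif *n = head; n; n = n->next)
		n->completed = true;
}

/* Drop one reference; the last reference on the head completes the link. */
static void notif_put(struct notif *n)
{
	assert(n->refcnt > 0);
	if (--n->refcnt)
		return;
	if (n->head != n) {
		/* A linked member releases the reference it pinned on the head. */
		notif_put(n->head);
		return;
	}
	notif_complete_link(n);
}

/* Stack notification n onto an skb that already carries prev.
 * Returns false if they cannot be stacked (the caller would then
 * need a fresh skb). */
static bool notif_link(struct notif *prev, struct notif *n)
{
	if (prev == n)
		return true;            /* already attached to this skb */
	if (n->head != n || n->next)
		return false;           /* n already belongs to a link */
	n->head = prev->head;
	n->next = prev->next;       /* insert n right after prev */
	prev->next = n;
	n->head->refcnt++;          /* head must not complete before n */
	return true;
}
```

For example, if an skb carries notification `a` (holding a reference on it) and a second send `b` is stacked onto that skb, flushing both requests leaves `a` with a single reference, the skb's, and freeing the skb then completes `a` and `b` together.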