On 7/7/22 5:49 AM, Pavel Begunkov wrote:
> NOTE: Not to be picked directly. After getting necessary acks, I'll be working
> out merging with Jakub and Jens.
>
> The patchset implements io_uring zerocopy send. It works with both registered
> and normal buffers, mixing is allowed but not recommended. Apart from usual
> request completions, just as with MSG_ZEROCOPY, io_uring separately notifies
> the userspace when buffers are freed and can be reused (see API design below),
> which is delivered into io_uring's Completion Queue. Those "buffer-free"
> notifications are not necessarily per request, but the userspace has control
> over it and should explicitly attach a number of requests to a single
> notification. The series also adds some internal optimisations when used with
> registered buffers like removing page referencing.
>
> From the kernel networking perspective there are two main changes. The first
> one is passing ubuf_info into the network layer from io_uring (inside of an
> in kernel struct msghdr). This allows extra optimisations, e.g. ubuf_info
> caching on the io_uring side, but also helps to avoid cross-referencing
> and synchronisation problems. The second part is an optional optimisation
> removing page referencing for requests with registered buffers.
>
> Benchmarked with an optimised version of the selftest (see [1]), which sends
> a bunch of requests, waits for completions and repeats. The "+ flush" column
> posts one additional "buffer-free" notification per request, and just "zc"
> doesn't post buffer notifications at all.
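To check my reading of the notification semantics quoted above, here is a small model of it. This is a hypothetical sketch, not the io_uring uAPI and not the kernel implementation: struct notif_model and the notif_* helpers are made-up names. Several sends attach to one "buffer-free" notification, userspace flushes it to stop further attaching, and the single notification is posted only after the flush and once every attached send's buffers have been released.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the "buffer-free" notification
 * semantics from the cover letter. Not the io_uring uAPI or kernel code.
 */
struct notif_model {
	unsigned int pending;	/* attached sends whose buffers are still in use */
	bool flushed;		/* no more sends may be attached */
	bool posted;		/* the single notification CQE has been emitted */
};

/* Attach one zerocopy send; refused once the notification is flushed. */
static bool notif_attach(struct notif_model *n)
{
	if (n->flushed)
		return false;
	n->pending++;
	return true;
}

/* Post exactly once: after the flush and after all attached buffers are free. */
static void notif_try_post(struct notif_model *n)
{
	if (n->flushed && n->pending == 0 && !n->posted)
		n->posted = true;
}

/* The network stack has dropped its last reference to one send's pages. */
static void notif_buffer_released(struct notif_model *n)
{
	assert(n->pending > 0);
	n->pending--;
	notif_try_post(n);
}

/* Userspace closes the notification to further sends. */
static void notif_flush(struct notif_model *n)
{
	n->flushed = true;
	notif_try_post(n);
}
```

In this model, the "zc + flush" benchmark column roughly corresponds to one attach followed by a flush per request, while plain "zc" corresponds to never flushing, so no notification is ever posted.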
>
> NIC (requests / second):
>   IO size | non-zc  | zc              | zc + flush
>   4000    | 495134  | 606420 (+22%)   | 558971 (+12%)
>   1500    | 551808  | 577116 (+4.5%)  | 565803 (+2.5%)
>   1000    | 584677  | 592088 (+1.2%)  | 560885 (-4%)
>   600     | 596292  | 598550 (+0.4%)  | 555366 (-6.7%)
>
> dummy (requests / second):
>   IO size | non-zc  | zc              | zc + flush
>   8000    | 1299916 | 2396600 (+84%)  | 2224219 (+71%)
>   4000    | 1869230 | 2344146 (+25%)  | 2170069 (+16%)
>   1200    | 2071617 | 2361960 (+14%)  | 2203052 (+6%)
>   600     | 2106794 | 2381527 (+13%)  | 2195295 (+4%)
>
> Previously it also brought a massive performance speedup compared to the
> msg_zerocopy tool (see [3]), which is probably not super interesting.
>

Can you add a comment that the above results are for UDP?

You dropped the comments about TCP testing; any progress there? If not, can
you relay any issues you are hitting?
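Assuming the percentages in the quoted tables are the relative change of each zerocopy column against the non-zc column, they can be recomputed with a helper like the one below (rel_change_pct is a hypothetical name). For example, the 4000-byte NIC row gives about +22.5% for "zc" and +12.9% for "zc + flush", so a few of the quoted figures look slightly rounded.

```c
#include <assert.h>

/*
 * Relative change of a zerocopy result against the non-zc baseline, in
 * percent. Assumes that is what the percentages in the tables express.
 */
static double rel_change_pct(double baseline, double value)
{
	assert(baseline != 0.0);
	return (value - baseline) / baseline * 100.0;
}
```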