On 3/3/25 21:03, Andres Freund wrote:
Hi,
On 2025-03-03 15:50:55 +0000, Pavel Begunkov wrote:
Add registered buffer support for vectored io_uring operations. That
allows passing an iovec, all entries of which must belong to and
point into the same registered buffer specified by sqe->buf_index.
This is very much appreciated!
Glad to hear. I do remember you mentioning the contention issue
on the list. A bunch of other people were interested as well.
The series covers zerocopy sendmsg and reads / writes. Reads and
writes are implemented as new opcodes, while zerocopy sendmsg
reuses IORING_RECVSEND_FIXED_BUF for the api.
Results are in line with what one would expect from registered buffers:
t/io_uring + nullblk, single segment 16K:
34 -> 46 GiB/s
FWIW, I'd expect bigger wins with real IO when using 1GB huge pages. I
encountered this when there were a lot of reads from a large nvme raid into a small
set of shared huge pages (database buffer pool), by many processes
concurrently. The constant pinning/unpinning of the relevant folio caused a
lot of contention.
I didn't even benchmark it meaningfully as we should be able to
extrapolate results from the registered buffer test, but I agree, such
contention might make it even more desirable.
Unfortunately switching to registered buffers would, until now, have required
using non-vectored IO, which causes significant performance regressions in
other cases...
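
To make the constraint from the cover letter concrete (every iovec entry must point into the registered buffer selected by sqe->buf_index), here is a minimal userspace sketch of a pre-submission sanity check. It is illustrative only: iovec_within_buffer() is a hypothetical helper, not part of this series or of liburing, and it assumes the caller knows the base address and length of the buffer it registered at that index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (an assumption for illustration, not kernel code):
 * returns true if every iovec entry lies entirely inside the region
 * [buf, buf + buf_len), i.e. inside one registered buffer.
 */
static bool iovec_within_buffer(const struct iovec *iov, size_t nr,
                                const void *buf, size_t buf_len)
{
    assert(buf != NULL);
    uintptr_t start = (uintptr_t)buf;

    for (size_t i = 0; i < nr; i++) {
        uintptr_t seg = (uintptr_t)iov[i].iov_base;

        /* The segment must not start before the buffer. */
        if (seg < start)
            return false;

        /* Check offset and length without risking overflow. */
        size_t off = seg - start;
        if (off > buf_len || iov[i].iov_len > buf_len - off)
            return false;
    }
    return true;
}
```

An application would run a check like this on its iovec array before queueing the vectored request, so that a segment straying outside the registered buffer is caught in its own code rather than surfacing as a failed completion.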
--
Pavel Begunkov