It's very common to have applications that use vectored reads or writes, even
if they only pass in a single segment. Obviously they should be using
read/write at that point, but...

Vectored IO comes with the downside of needing to retain iovec state, and
hence it requires an allocation and state copy if the request ends up getting
deferred. Additionally, it also requires extra cleanup when completed, as the
allocated state memory has to be freed.

Automatically transform single segment IORING_OP_{READV,WRITEV} into
IORING_OP_{READ,WRITE}, and hence into an ITER_UBUF. Outside of being more
efficient if deferral is needed, ITER_UBUF is also more efficient for normal
processing compared to ITER_IOVEC, as it doesn't require iteration.

The latter is apparent when running peak testing, where using IORING_OP_READV
to randomly read 24 drives previously scored:

IOPS=72.54M, BW=35.42GiB/s, IOS/call=32/31
IOPS=73.35M, BW=35.81GiB/s, IOS/call=32/31
IOPS=72.71M, BW=35.50GiB/s, IOS/call=32/31
IOPS=73.29M, BW=35.78GiB/s, IOS/call=32/32
IOPS=73.45M, BW=35.86GiB/s, IOS/call=32/32
IOPS=73.19M, BW=35.74GiB/s, IOS/call=31/32
IOPS=72.89M, BW=35.59GiB/s, IOS/call=32/31
IOPS=73.07M, BW=35.68GiB/s, IOS/call=32/32

and after the change we get:

IOPS=77.31M, BW=37.75GiB/s, IOS/call=32/31
IOPS=77.32M, BW=37.75GiB/s, IOS/call=32/32
IOPS=77.45M, BW=37.81GiB/s, IOS/call=31/31
IOPS=77.47M, BW=37.83GiB/s, IOS/call=32/32
IOPS=77.14M, BW=37.67GiB/s, IOS/call=32/32
IOPS=77.14M, BW=37.66GiB/s, IOS/call=31/31
IOPS=77.37M, BW=37.78GiB/s, IOS/call=32/32
IOPS=77.25M, BW=37.72GiB/s, IOS/call=32/32

which is a nice win as well.

Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>

---

diff --git a/io_uring/rw.c b/io_uring/rw.c
index 4c233910e200..5c998754cb17 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -402,7 +402,22 @@ static struct iovec *__io_import_iovec(int ddir, struct io_kiocb *req,
 				req->ctx->compat);
 	if (unlikely(ret < 0))
 		return ERR_PTR(ret);
-	return iovec;
+	if (iter->nr_segs != 1)
+		return iovec;
+	/*
+	 * Convert to non-vectored request if we have a single segment. If we
+	 * need to defer the request, then we no longer have to allocate and
+	 * maintain a struct io_async_rw. Additionally, we won't have cleanup
+	 * to do at completion time.
+	 */
+	rw->addr = (unsigned long) iter->iov[0].iov_base;
+	rw->len = iter->iov[0].iov_len;
+	iov_iter_ubuf(iter, ddir, iter->iov[0].iov_base, rw->len);
+	/* readv -> read distance is the same as writev -> write */
+	BUILD_BUG_ON((IORING_OP_READ - IORING_OP_READV) !=
+		     (IORING_OP_WRITE - IORING_OP_WRITEV));
+	req->opcode += (IORING_OP_READ - IORING_OP_READV);
+	return NULL;
 }
 
 static inline int io_import_iovec(int rw, struct io_kiocb *req,

-- 
Jens Axboe
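
[Illustration only, not part of the patch: a minimal liburing sketch of the
kind of single-segment readv submission this change converts internally.
The file name, buffer size, and queue depth are arbitrary assumptions, and
error handling is trimmed for brevity.]

	#include <liburing.h>
	#include <stdio.h>
	#include <string.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/uio.h>

	int main(int argc, char *argv[])
	{
		struct io_uring ring;
		struct io_uring_sqe *sqe;
		struct io_uring_cqe *cqe;
		struct iovec iov;
		char buf[4096];
		int fd, ret;

		if (argc < 2) {
			fprintf(stderr, "usage: %s <file>\n", argv[0]);
			return 1;
		}

		fd = open(argv[1], O_RDONLY);
		if (fd < 0) {
			perror("open");
			return 1;
		}

		ret = io_uring_queue_init(8, &ring, 0);
		if (ret < 0) {
			fprintf(stderr, "queue_init: %s\n", strerror(-ret));
			return 1;
		}

		/*
		 * Single-segment vectored read: nr_vecs == 1, so the kernel
		 * can service this as a non-vectored read internally, with no
		 * iovec state to allocate or clean up if it gets deferred.
		 */
		iov.iov_base = buf;
		iov.iov_len = sizeof(buf);

		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_readv(sqe, fd, &iov, 1, 0);
		io_uring_submit(&ring);

		ret = io_uring_wait_cqe(&ring, &cqe);
		if (ret < 0) {
			fprintf(stderr, "wait_cqe: %s\n", strerror(-ret));
			return 1;
		}
		printf("readv returned %d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);

		io_uring_queue_exit(&ring);
		close(fd);
		return 0;
	}

Nothing changes from the application's point of view: the conversion happens
in __io_import_iovec() before the request is issued, and the completion is
posted exactly as it would be for the original readv.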