It's very common to have applications that use vectored reads or writes, even
if they only pass in a single segment. Obviously they should be using
read/write at that point, but...

Vectored IO comes with the downside of needing to retain iovec state, and
hence it requires an allocation and state copy if the request ends up getting
deferred. Additionally, it also requires extra cleanup when completed, as the
allocated state memory has to be freed.

Automatically transform single segment IORING_OP_{READV,WRITEV} into
IORING_OP_{READ,WRITE}, and hence into an ITER_UBUF. Outside of being more
efficient if deferral is needed, ITER_UBUF is also more efficient for normal
processing compared to ITER_IOVEC, as it doesn't require iteration.

The latter is apparent when running peak testing, where using IORING_OP_READV
to randomly read 24 drives previously scored:

IOPS=72.54M, BW=35.42GiB/s, IOS/call=32/31
IOPS=73.35M, BW=35.81GiB/s, IOS/call=32/31
IOPS=72.71M, BW=35.50GiB/s, IOS/call=32/31
IOPS=73.29M, BW=35.78GiB/s, IOS/call=32/32
IOPS=73.45M, BW=35.86GiB/s, IOS/call=32/32
IOPS=73.19M, BW=35.74GiB/s, IOS/call=31/32
IOPS=72.89M, BW=35.59GiB/s, IOS/call=32/31
IOPS=73.07M, BW=35.68GiB/s, IOS/call=32/32

and after the change we get:

IOPS=77.31M, BW=37.75GiB/s, IOS/call=32/31
IOPS=77.32M, BW=37.75GiB/s, IOS/call=32/32
IOPS=77.45M, BW=37.81GiB/s, IOS/call=31/31
IOPS=77.47M, BW=37.83GiB/s, IOS/call=32/32
IOPS=77.14M, BW=37.67GiB/s, IOS/call=32/32
IOPS=77.14M, BW=37.66GiB/s, IOS/call=31/31
IOPS=77.37M, BW=37.78GiB/s, IOS/call=32/32
IOPS=77.25M, BW=37.72GiB/s, IOS/call=32/32

which is a nice win as well.

Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>

---

diff --git a/io_uring/rw.c b/io_uring/rw.c
index 4c233910e200..5c998754cb17 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -402,7 +402,22 @@ static struct iovec *__io_import_iovec(int ddir, struct io_kiocb *req,
 				req->ctx->compat);
 	if (unlikely(ret < 0))
 		return ERR_PTR(ret);
-	return iovec;
+	if (iter->nr_segs != 1)
+		return iovec;
+	/*
+	 * Convert to non-vectored request if we have a single segment. If we
+	 * need to defer the request, then we no longer have to allocate and
+	 * maintain a struct io_async_rw. Additionally, we won't have cleanup
+	 * to do at completion time.
+	 */
+	rw->addr = (unsigned long) iter->iov[0].iov_base;
+	rw->len = iter->iov[0].iov_len;
+	iov_iter_ubuf(iter, ddir, iter->iov[0].iov_base, rw->len);
+	/* readv -> read distance is the same as writev -> write */
+	BUILD_BUG_ON((IORING_OP_READ - IORING_OP_READV) !=
+		     (IORING_OP_WRITE - IORING_OP_WRITEV));
+	req->opcode += (IORING_OP_READ - IORING_OP_READV);
+	return NULL;
 }
 
 static inline int io_import_iovec(int rw, struct io_kiocb *req,

-- 
Jens Axboe
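
[Illustration only, not part of the patch: a minimal liburing sketch of the
kind of single-segment readv submission this change converts internally.
The file name, buffer size, and queue depth are arbitrary assumptions, and
error handling is trimmed for brevity.]

	#include <liburing.h>
	#include <stdio.h>
	#include <string.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/uio.h>

	int main(int argc, char *argv[])
	{
		struct io_uring ring;
		struct io_uring_sqe *sqe;
		struct io_uring_cqe *cqe;
		struct iovec iov;
		char buf[4096];
		int fd, ret;

		if (argc < 2) {
			fprintf(stderr, "usage: %s <file>\n", argv[0]);
			return 1;
		}

		fd = open(argv[1], O_RDONLY);
		if (fd < 0) {
			perror("open");
			return 1;
		}

		ret = io_uring_queue_init(8, &ring, 0);
		if (ret < 0) {
			fprintf(stderr, "queue_init: %s\n", strerror(-ret));
			return 1;
		}

		/*
		 * Single-segment vectored read: nr_vecs == 1, so the kernel
		 * can service this as a non-vectored read internally, with no
		 * iovec state to allocate or clean up if it gets deferred.
		 */
		iov.iov_base = buf;
		iov.iov_len = sizeof(buf);

		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_readv(sqe, fd, &iov, 1, 0);
		io_uring_submit(&ring);

		ret = io_uring_wait_cqe(&ring, &cqe);
		if (ret < 0) {
			fprintf(stderr, "wait_cqe: %s\n", strerror(-ret));
			return 1;
		}
		printf("readv returned %d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);

		io_uring_queue_exit(&ring);
		close(fd);
		return 0;
	}

Nothing changes from the application's point of view: the conversion happens
in __io_import_iovec() before the request is issued, and the completion is
posted exactly as it would be for the original readv.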