On Fri, Mar 24, 2023 at 05:06:00PM -0600, Jens Axboe wrote: > On 3/24/23 4:41?PM, Ming Lei wrote: > > On Fri, Mar 24, 2023 at 08:35:38AM -0600, Jens Axboe wrote: > >> It's very common to have applications that use vectored reads or writes, > >> even if they only pass in a single segment. Obviously they should be > >> using read/write at that point, but... > > > > Yeah, it is like fixing application issue in kernel side, :-) > > It totally is, the same thing happens all of the time for readv as well. > No amount of informing or documenting will ever fix that... > > Also see: > > https://lore.kernel.org/linux-fsdevel/20230324204443.45950-1-axboe@xxxxxxxxx/ > > with which I think I'll change this one to just be: > > if (iter->iter_type == ITER_UBUF) { > rw->addr = iter->ubuf; > rw->len = iter->count; > /* readv -> read distance is the same as writev -> write */ > BUILD_BUG_ON((IORING_OP_READ - IORING_OP_READV) != > (IORING_OP_WRITE - IORING_OP_WRITEV)); > req->opcode += (IORING_OP_READ - IORING_OP_READV); > } > > instead. > > We could also just skip it completely and just have liburing do the > right thing if io_uring_prep_readv/writev is called with nr_segs == 1. > Just turn it into a READ/WRITE at that point. If we do that, and with > the above generic change, it's probably Good Enough. If you use > READV/WRITEV and you're using the raw interface, then you're on your > own. > > >> + rw->addr = (unsigned long) iter->iov[0].iov_base; > >> + rw->len = iter->iov[0].iov_len; > >> + iov_iter_ubuf(iter, ddir, iter->iov[0].iov_base, rw->len); > >> + /* readv -> read distance is the same as writev -> write */ > >> + BUILD_BUG_ON((IORING_OP_READ - IORING_OP_READV) != > >> + (IORING_OP_WRITE - IORING_OP_WRITEV)); > >> + req->opcode += (IORING_OP_READ - IORING_OP_READV); > > > > It is a bit fragile to change ->opcode, which may need matched > > callbacks for the two OPs, also cause inconsistent opcode in traces. > > > > I am wondering why not play the magic in io_prep_rw() from beginning? > > It has to be done when importing the vec, we cannot really do it in > prep... Well we could, but that'd be adding a bunch more code and > duplicating part of the vec import. I meant something like the following(un-tested), which at least guarantees that op_code, rw->addr/len are finalized since ->prep(). diff --git a/io_uring/rw.c b/io_uring/rw.c index 0c292ef9a40f..4bf4c3effdac 100644 --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -120,6 +120,25 @@ int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe) return ret; } + if (req->opcode == IORING_OP_READV && req->opcode == IORING_OP_WRITEV && + rw->len == 1) { + struct iovec iov; + struct iovec *iovp; + + iovp = iovec_from_user(u64_to_user_ptr(rw->addr), 1, 1, &iov, + req->ctx->compat); + if (IS_ERR(iovp)) + return PTR_ERR(iovp); + + rw->addr = (unsigned long) iovp->iov_base; + rw->len = iovp->iov_len; + + /* readv -> read distance is the same as writev -> write */ + BUILD_BUG_ON((IORING_OP_READ - IORING_OP_READV) != + (IORING_OP_WRITE - IORING_OP_WRITEV)); + req->opcode += (IORING_OP_READ - IORING_OP_READV); + } + return 0; } Thanks, Ming