On Thu, Mar 21, 2024 at 4:28 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> The previous commit turned on async data for uring_cmd, and did the
> basic conversion of setting everything up on the prep side. However, for
> a lot of use cases, we'll get -EIOCBQUEUED on issue, which means we do
> not need a persistent big SQE copied.
>
> Unless we're going async immediately, defer copying the double SQE until
> we know we have to.
>
> This greatly reduces the overhead of such commands, as evidenced by
> a perf diff from before and after this change:
>
>     10.60%   -8.58%  [kernel.vmlinux]  [k] io_uring_cmd_prep
>
> where the prep side drops from 10.60% to ~2%, which is more expected.
> Performance also rises from ~113M IOPS to ~122M IOPS, bringing us back
> to where it was before the async command prep.
>
> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> ---
>  io_uring/uring_cmd.c | 25 +++++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> index 9bd0ba87553f..92346b5d9f5b 100644
> --- a/io_uring/uring_cmd.c
> +++ b/io_uring/uring_cmd.c
> @@ -182,12 +182,18 @@ static int io_uring_cmd_prep_setup(struct io_kiocb *req,
>  	struct uring_cache *cache;
>
>  	cache = io_uring_async_get(req);
> -	if (cache) {
> -		memcpy(cache->sqes, sqe, uring_sqe_size(req->ctx));
> -		ioucmd->sqe = req->async_data;
> +	if (unlikely(!cache))
> +		return -ENOMEM;
> +
> +	if (!(req->flags & REQ_F_FORCE_ASYNC)) {
> +		/* defer memcpy until we need it */
> +		ioucmd->sqe = sqe;
>  		return 0;
>  	}
> -	return -ENOMEM;
> +
> +	memcpy(req->async_data, sqe, uring_sqe_size(req->ctx));
> +	ioucmd->sqe = req->async_data;
> +	return 0;
>  }
>
>  int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> @@ -245,8 +251,15 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
>  	}
>
>  	ret = file->f_op->uring_cmd(ioucmd, issue_flags);
> -	if (ret == -EAGAIN || ret == -EIOCBQUEUED)
> -		return ret;
> +	if (ret == -EAGAIN) {
> +		struct uring_cache *cache = req->async_data;
> +
> +		if (ioucmd->sqe != (void *) cache)
> +			memcpy(cache, ioucmd->sqe, uring_sqe_size(req->ctx));
> +		return -EAGAIN;
> +	} else if (ret == -EIOCBQUEUED) {
> +		return -EIOCBQUEUED;
> +	}
>
>  	if (ret < 0)
>  		req_set_fail(req);
> --
> 2.43.0
>

The io_uring_cmd plumbing part of this series looks good to me. I tested
it with io_uring nvme-passthrough on my setup with two Optanes and there
is no drop in performance either [1].

For this and the previous patch,

Tested-by: Anuj Gupta <anuj20.g@xxxxxxxxxxx>
Reviewed-by: Anuj Gupta <anuj20.g@xxxxxxxxxxx>

[1]
# taskset -c 2,5 t/io_uring -b512 -d128 -c32 -s32 -p1 -O0 -F1 -B1 -u1 -n2 -r4 /dev/ng0n1 /dev/ng2n1
submitter=1, tid=7166, file=/dev/ng2n1, nfiles=1, node=-1
submitter=0, tid=7165, file=/dev/ng0n1, nfiles=1, node=-1
polled=1, fixedbufs=1, register_files=1, buffered=1, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=10.02M, BW=4.89GiB/s, IOS/call=31/31
IOPS=10.04M, BW=4.90GiB/s, IOS/call=31/31
IOPS=10.04M, BW=4.90GiB/s, IOS/call=31/31
Exiting on timeout
Maximum IOPS=10.04M

--
Anuj Gupta
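
[Editor's note] The core idea of the patch quoted above is: allocate the
async backing store at prep time, but leave ioucmd->sqe pointing at the
submission-queue memory, and only pay for the memcpy when the request is
forced async up front or comes back from issue with -EAGAIN. Below is a
minimal, self-contained user-space sketch of that deferred-copy pattern.
All names here (fake_sqe, fake_req, prep, issue) are invented for
illustration, and EIOCBQUEUED is defined locally because it is a
kernel-internal value; this is not the io_uring code itself.

/* deferred_copy_sketch.c - illustrative only, not kernel code */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define EIOCBQUEUED 529		/* kernel-internal value, redefined for this sketch */
#define SQE_SIZE 128

struct fake_sqe {
	unsigned char data[SQE_SIZE];
};

struct fake_req {
	const struct fake_sqe *sqe;	/* may point at caller-owned memory */
	struct fake_sqe *async_copy;	/* owned backing store, filled lazily */
	bool force_async;
};

/* Prep: allocate owned storage, but copy only if we know we will go async. */
static int prep(struct fake_req *req, const struct fake_sqe *sqe)
{
	req->async_copy = malloc(sizeof(*req->async_copy));
	if (!req->async_copy)
		return -ENOMEM;

	if (!req->force_async) {
		/* defer the memcpy until we actually need it */
		req->sqe = sqe;
		return 0;
	}
	memcpy(req->async_copy, sqe, sizeof(*sqe));
	req->sqe = req->async_copy;
	return 0;
}

/* Issue: only the -EAGAIN (retry later) path has to stash a private copy. */
static int issue(struct fake_req *req, int backend_ret)
{
	if (backend_ret == -EAGAIN) {
		if (req->sqe != req->async_copy) {
			memcpy(req->async_copy, req->sqe, sizeof(*req->async_copy));
			req->sqe = req->async_copy;
		}
		return -EAGAIN;
	}
	return backend_ret;	/* e.g. -EIOCBQUEUED: the copy is never made */
}

int main(void)
{
	struct fake_sqe sqe = { .data = { 0x42 } };
	struct fake_req req = { .force_async = false };

	if (prep(&req, &sqe))
		return 1;
	/* fast path: backend queued the command, so no memcpy ever happened */
	printf("issue -> %d\n", issue(&req, -EIOCBQUEUED));
	free(req.async_copy);
	return 0;
}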