Re: [PATCH for-next v3 0/4] fixed-buffer for uring-cmd/passthrough

On 9/2/22 1:32 PM, Jens Axboe wrote:
> On 9/2/22 12:46 PM, Kanchan Joshi wrote:
>> On Fri, Sep 02, 2022 at 10:32:16AM -0600, Jens Axboe wrote:
>>> On 9/2/22 10:06 AM, Jens Axboe wrote:
>>>> On 9/2/22 9:16 AM, Kanchan Joshi wrote:
>>>>> Hi,
>>>>>
>>>>> Currently uring-cmd lacks the ability to leverage pre-registered
>>>>> buffers. This series adds that support to uring-cmd, and plumbs
>>>>> nvme passthrough to work with it.
>>>>>
>>>>> Using registered buffers showed a peak-perf hike from 1.85M to 2.17M
>>>>> IOPS in my setup.
>>>>>
>>>>> Without fixedbufs
>>>>> *****************
>>>>> # taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p0 -F1 -B0 -O0 -n1 -u1 /dev/ng0n1
>>>>> submitter=0, tid=5256, file=/dev/ng0n1, node=-1
>>>>> polled=0, fixedbufs=0/0, register_files=1, buffered=1, QD=128
>>>>> Engine=io_uring, sq_ring=128, cq_ring=128
>>>>> IOPS=1.85M, BW=904MiB/s, IOS/call=32/31
>>>>> IOPS=1.85M, BW=903MiB/s, IOS/call=32/32
>>>>> IOPS=1.85M, BW=902MiB/s, IOS/call=32/32
>>>>> ^CExiting on signal
>>>>> Maximum IOPS=1.85M
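
For reference, the 2.17M fixed-buffer result above should come from the
same t/io_uring invocation with -B1 (fixed buffers) in place of -B0; the
with-fixedbufs output isn't quoted here, so this command line is a
reconstruction:

# taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p0 -F1 -B1 -O0 -n1 -u1 /dev/ng0n1
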
>>>>
>>>> With the poll support queued up, I ran this one as well. tldr is:
>>>>
>>>> bdev (non pt)	122M IOPS
>>>> irq driven	51-52M IOPS
>>>> polled		71M IOPS
>>>> polled+fixed	78M IOPS
>>
>> Except the first one, the other three entries are for passthru? Somehow
>> I didn't see that big of a gap. I will try to align my setup in the
>> coming days.
> 
> Right, sorry it was badly labeled. First one is bdev with polling,
> registered buffers, etc. The others are all the passthrough mode. polled
> goes to 74M with the caching fix, so it's about a 74M -> 82M bump using
> registered buffers with passthrough and polling.
> 
>>> polled+fixed	82M
>>>
>>> I suspect the remainder is due to the lack of batching on the request
>>> freeing side, at least some of it. Haven't really looked deeper yet.
>>>
>>> One issue I saw - try and use passthrough polling without having any
>>> poll queues defined and it'll stall just spinning on completions. You
>>> need to ensure that these are processed as well - look at how the
>>> non-passthrough io_uring poll path handles it.
>>
>> I had tested this earlier and it used to run fine, but it does not now.
>> I see that IOs are getting completed: the irq completion arrives in nvme
>> and triggers the task-work based completion (by calling
>> io_uring_cmd_complete_in_task). But the task work never gets called, and
>> therefore no completion happens.
>>
>> io_uring_cmd_complete_in_task -> io_req_task_work_add -> __io_req_task_work_add
>>
>> It seems the task work did not get added. Something about the newly
>> added IORING_SETUP_DEFER_TASKRUN changes the scenario.
>>
>> static inline void __io_req_task_work_add(struct io_kiocb *req, bool allow_local)
>> {
>> 	struct io_uring_task *tctx = req->task->io_uring;
>> 	struct io_ring_ctx *ctx = req->ctx;
>> 	struct llist_node *node;
>>
>> 	if (allow_local && ctx->flags & IORING_SETUP_DEFER_TASKRUN) {
>> 		io_req_local_work_add(req);
>> 		return;
>> 	}
>> 	...
>>
>> To confirm, I commented that flag out in t/io_uring and it runs fine.
>> Please see if that changes anything for you. I will try to find the
>> actual fix tomorrow.
> 
> Ah gotcha, yes that actually makes a lot of sense. I wonder if regular
> polling is then also broken without poll queues if
> IORING_SETUP_DEFER_TASKRUN is set. It should be; I'll check into
> io_iopoll_check().
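
To make the failure mode concrete: with IORING_SETUP_DEFER_TASKRUN set,
completions routed through io_uring_cmd_complete_in_task() land on the
ring's local work list and only post CQEs once that local work is run, so
a poll loop that just walks the iopoll list will spin forever. Roughly
the shape the io_iopoll_check() fix would need to take (a sketch with
assumed helper names, not the actual patch):

	/*
	 * Inside the io_iopoll_check() wait loop: before (re)polling,
	 * flush deferred local task work so that requests completed via
	 * io_req_local_work_add() can actually post their CQEs.
	 */
	while (!io_cqring_events(ctx)) {
		if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
			io_run_local_work(ctx);
		ret = io_do_iopoll(ctx, !min);
		if (ret < 0)
			break;
	}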

A mix of fixes and just cleanups; here's what I got.

-- 
Jens Axboe

From 50155186644a352b290b72c61e738f62640d566a Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@xxxxxxxxx>
Date: Fri, 2 Sep 2022 15:16:29 -0600
Subject: [PATCH 1/3] io_uring: cleanly separate request types for iopoll

After the addition of iopoll support for passthrough, there's a bit of
a mixup here. Clean it up and get rid of the casting for the passthrough
command type.

Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
 io_uring/rw.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/io_uring/rw.c b/io_uring/rw.c
index 9698a789b3d5..3f03b6d2a5a3 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -994,7 +994,7 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 
 	wq_list_for_each(pos, start, &ctx->iopoll_list) {
 		struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
-		struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+		struct file *file = req->file;
 		int ret;
 
 		/*
@@ -1006,12 +1006,15 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 			break;
 
 		if (req->opcode == IORING_OP_URING_CMD) {
-			struct io_uring_cmd *ioucmd = (struct io_uring_cmd *)rw;
+			struct io_uring_cmd *ioucmd;
 
-			ret = req->file->f_op->uring_cmd_iopoll(ioucmd);
-		} else
-			ret = rw->kiocb.ki_filp->f_op->iopoll(&rw->kiocb, &iob,
-							poll_flags);
+			ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+			ret = file->f_op->uring_cmd_iopoll(ioucmd, poll_flags);
+		} else {
+			struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+
+			ret = file->f_op->iopoll(&rw->kiocb, &iob, poll_flags);
+		}
 		if (unlikely(ret < 0))
 			return ret;
 		else if (ret)
-- 
2.35.1

From 0bc78c843b8636dcdfe45dd07328ca826fa67f9b Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@xxxxxxxxx>
Date: Fri, 2 Sep 2022 15:17:30 -0600
Subject: [PATCH 2/3] nvme: use separate end IO handler for IOPOLL

Don't need to rely on the cookie or request type, set the right handler
based on how we're handling the IO.

Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
 drivers/nvme/host/ioctl.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 7756b439a688..f34abe95821e 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -385,25 +385,36 @@ static void nvme_uring_task_cb(struct io_uring_cmd *ioucmd)
 	io_uring_cmd_done(ioucmd, status, result);
 }
 
-static void nvme_uring_cmd_end_io(struct request *req, blk_status_t err)
+static void nvme_uring_iopoll_cmd_end_io(struct request *req, blk_status_t err)
 {
 	struct io_uring_cmd *ioucmd = req->end_io_data;
 	struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
 	/* extract bio before reusing the same field for request */
 	struct bio *bio = pdu->bio;
-	void *cookie = READ_ONCE(ioucmd->cookie);
 
 	pdu->req = req;
 	req->bio = bio;
 
 	/*
 	 * For iopoll, complete it directly.
-	 * Otherwise, move the completion to task work.
 	 */
-	if (cookie != NULL && blk_rq_is_poll(req))
-		nvme_uring_task_cb(ioucmd);
-	else
-		io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb);
+	nvme_uring_task_cb(ioucmd);
+}
+
+static void nvme_uring_cmd_end_io(struct request *req, blk_status_t err)
+{
+	struct io_uring_cmd *ioucmd = req->end_io_data;
+	struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
+	/* extract bio before reusing the same field for request */
+	struct bio *bio = pdu->bio;
+
+	pdu->req = req;
+	req->bio = bio;
+
+	/*
+	 * Move the completion to task work.
+	 */
+	io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb);
 }
 
 static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
@@ -464,7 +475,10 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 			blk_flags);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
-	req->end_io = nvme_uring_cmd_end_io;
+	if (issue_flags & IO_URING_F_IOPOLL)
+		req->end_io = nvme_uring_iopoll_cmd_end_io;
+	else
+		req->end_io = nvme_uring_cmd_end_io;
 	req->end_io_data = ioucmd;
 
 	if (issue_flags & IO_URING_F_IOPOLL && rq_flags & REQ_POLLED) {
-- 
2.35.1

From 9daa39b146f3a8f412196df5eb9f9686f308e5cc Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe@xxxxxxxxx>
Date: Fri, 2 Sep 2022 15:18:05 -0600
Subject: [PATCH 3/3] fs: add batch and poll flags to the uring_cmd_iopoll()
 handler

We need the poll_flags to know how to poll for the IO, and we should
have the batch structure in preparation for supporting batched
completions with iopoll.

Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
 drivers/nvme/host/ioctl.c | 12 ++++++++----
 drivers/nvme/host/nvme.h  |  6 ++++--
 include/linux/fs.h        |  3 ++-
 io_uring/rw.c             |  3 ++-
 4 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index f34abe95821e..7a0b12ef49ae 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -637,7 +637,9 @@ int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
 	return nvme_ns_uring_cmd(ns, ioucmd, issue_flags);
 }
 
-int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
+int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
+				 struct io_comp_batch *iob,
+				 unsigned int poll_flags)
 {
 	struct bio *bio;
 	int ret = 0;
@@ -650,7 +652,7 @@ int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
 			struct nvme_ns, cdev);
 	q = ns->queue;
 	if (test_bit(QUEUE_FLAG_POLL, &q->queue_flags) && bio && bio->bi_bdev)
-		ret = bio_poll(bio, NULL, 0);
+		ret = bio_poll(bio, iob, poll_flags);
 	rcu_read_unlock();
 	return ret;
 }
@@ -736,7 +738,9 @@ int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
 	return ret;
 }
 
-int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
+int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
+				      struct io_comp_batch *iob,
+				      unsigned int poll_flags)
 {
 	struct cdev *cdev = file_inode(ioucmd->file)->i_cdev;
 	struct nvme_ns_head *head = container_of(cdev, struct nvme_ns_head, cdev);
@@ -752,7 +756,7 @@ int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
 		q = ns->queue;
 		if (test_bit(QUEUE_FLAG_POLL, &q->queue_flags) && bio
 				&& bio->bi_bdev)
-			ret = bio_poll(bio, NULL, 0);
+			ret = bio_poll(bio, iob, poll_flags);
 		rcu_read_unlock();
 	}
 	srcu_read_unlock(&head->srcu, srcu_idx);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index fdcbc93dea21..216acbe953b3 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -821,8 +821,10 @@ long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd,
 		unsigned long arg);
 long nvme_dev_ioctl(struct file *file, unsigned int cmd,
 		unsigned long arg);
-int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd);
-int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd);
+int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
+		struct io_comp_batch *iob, unsigned int poll_flags);
+int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
+		struct io_comp_batch *iob, unsigned int poll_flags);
 int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd,
 		unsigned int issue_flags);
 int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d6badd19784f..01681d061a6a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2132,7 +2132,8 @@ struct file_operations {
 				   loff_t len, unsigned int remap_flags);
 	int (*fadvise)(struct file *, loff_t, loff_t, int);
 	int (*uring_cmd)(struct io_uring_cmd *ioucmd, unsigned int issue_flags);
-	int (*uring_cmd_iopoll)(struct io_uring_cmd *ioucmd);
+	int (*uring_cmd_iopoll)(struct io_uring_cmd *, struct io_comp_batch *,
+				unsigned int poll_flags);
 } __randomize_layout;
 
 struct inode_operations {
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 3f03b6d2a5a3..4a061326c664 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -1009,7 +1009,8 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 			struct io_uring_cmd *ioucmd;
 
 			ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
-			ret = file->f_op->uring_cmd_iopoll(ioucmd, poll_flags);
+			ret = file->f_op->uring_cmd_iopoll(ioucmd, &iob,
+								poll_flags);
 		} else {
 			struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
 
-- 
2.35.1

