Re: [PATCH 05/17] nvme: wire-up support for async-passthru on char-device.

On Fri, Mar 11, 2022 at 08:01:48AM +0100, Christoph Hellwig wrote:
On Tue, Mar 08, 2022 at 08:50:53PM +0530, Kanchan Joshi wrote:
+/*
+ * This overlays struct io_uring_cmd pdu.
+ * Expect build errors if this grows larger than that.
+ */
+struct nvme_uring_cmd_pdu {
+	u32 meta_len;
+	union {
+		struct bio *bio;
+		struct request *req;
+	};
+	void *meta; /* kernel-resident buffer */
+	void __user *meta_buffer;
+} __packed;

Why is this marked __packed?
I did not like doing it, but had to. If not packed, this takes 32 bytes of
space, while the driver pdu in struct io_uring_cmd can take at most 30 bytes.
Packing the nvme pdu brought it down to 28 bytes, which fits and gives 2
bytes back.

For quick reference - struct io_uring_cmd {
       struct file *              file;                 /*     0     8 */
       void *                     cmd;                  /*     8     8 */
       union {
               void *             bio;                  /*    16     8 */
               void               (*driver_cb)(struct io_uring_cmd *); /*    16     8 */
       };                                               /*    16     8 */
       u32                        flags;                /*    24     4 */
       u32                        cmd_op;               /*    28     4 */
       u16                        cmd_len;              /*    32     2 */
       u16                        unused;               /*    34     2 */
       u8                         pdu[28];              /*    36    28 */

       /* size: 64, cachelines: 1, members: 8 */
};
The io_uring_cmd struct goes into the first cacheline of io_kiocb.
The last field is pdu, taking 28 bytes; it becomes 30 if I drop the
'unused' field above.
nvme-pdu after packing:
struct nvme_uring_cmd_pdu {
       u32                        meta_len;             /*     0     4 */
       union {
               struct bio *       bio;                  /*     4     8 */
               struct request *   req;                  /*     4     8 */
       };                                               /*     4     8 */
       void *                     meta;                 /*    12     8 */
       void *                     meta_buffer;          /*    20     8 */

       /* size: 28, cachelines: 1, members: 4 */
       /* last cacheline: 28 bytes */
} __attribute__((__packed__));
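
For completeness, the same arithmetic can be shown with a small standalone
sketch (illustrative only; plain C on x86-64, with void pointers standing in
for the bio/request pointers): the natural layout pads out to 32 bytes,
while __packed drops it to 28.

/* Illustrative only: padded vs packed layout of the pdu fields. */
#include <stdint.h>

struct pdu_padded {
	uint32_t meta_len;	/* 4 bytes + 4 bytes padding before the union */
	union {
		void *bio;
		void *req;
	};
	void *meta;
	void *meta_buffer;
};

struct pdu_packed {
	uint32_t meta_len;	/* no padding: union starts at offset 4 */
	union {
		void *bio;
		void *req;
	};
	void *meta;
	void *meta_buffer;
} __attribute__((__packed__));

_Static_assert(sizeof(struct pdu_padded) == 32, "padded layout is 32 bytes");
_Static_assert(sizeof(struct pdu_packed) == 28, "packed layout is 28 bytes");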

In general I'd be much more happy if the meta elements were a
io_uring-level feature handled outside the driver and typesafe in
struct io_uring_cmd, with just a normal private data pointer for the
actual user, which would remove all the crazy casting.

Not sure if I got your point.

+static struct nvme_uring_cmd_pdu *nvme_uring_cmd_pdu(struct io_uring_cmd *ioucmd)
+{
+       return (struct nvme_uring_cmd_pdu *)&ioucmd->pdu;
+}
+
+static void nvme_pt_task_cb(struct io_uring_cmd *ioucmd)
+{
+       struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);

Do you mean crazy casting inside nvme_uring_cmd_pdu()?
Somehow this looks sane to me (perhaps because it used to be crazier
earlier).

And on moving the meta elements outside the driver, my worry is that it
reduces the scope of the uring-cmd infrastructure and makes it nvme-passthru
specific. At this point uring-cmd is still a generic async ioctl/fsctl
facility which may find other users (than nvme-passthru) down the line.

The organization of fields within "struct io_uring_cmd" follows the rule
that a field is kept out of the 28-byte pdu only if it is accessed by both
io_uring and the driver.

+static void nvme_end_async_pt(struct request *req, blk_status_t err)
+{
+	struct io_uring_cmd *ioucmd = req->end_io_data;
+	struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
+	/* extract bio before reusing the same field for request */
+	struct bio *bio = pdu->bio;
+
+	pdu->req = req;
+	req->bio = bio;
+	/* this takes care of setting up task-work */
+	io_uring_cmd_complete_in_task(ioucmd, nvme_pt_task_cb);

This is a bit silly.  First we defer the actual request I/O completion
from the block layer to a different CPU or softirq and then we have
another callback here.  I think it would be much more useful if we
could find a way to enhance blk_mq_complete_request so that it could
directly complete in a given task.  That would also be really nice for
say normal synchronous direct I/O.

I see, so there is room for adding some efficiency.
I hope it will be OK if I carry this out as a separate effort.
Since that means touching blk_mq_complete_request at its heart, and also
improving sync direct I/O, it does not seem to fit into this series and
would only slow it down.

FWIW, I ran the tests with counters inside blk_mq_complete_request_remote()

       if (blk_mq_complete_need_ipi(rq)) {
               blk_mq_complete_send_ipi(rq);
               return true;
       }

       if (rq->q->nr_hw_queues == 1) {
               blk_mq_raise_softirq(rq);
               return true;
       }
Deferring by IPI or softirq never occurred, neither for block nor for char.
The softirq part is obvious, since I was not running against scsi (or nvme
with a single queue). So I could not spot whether this is really an
overhead, at least for nvme.
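
For reference, a minimal version of such instrumentation could look like the
following (illustrative sketch only, not the exact diff I used; plain atomics
instead of per-cpu counters):

/* Illustrative sketch: count how often each deferral branch fires. */
static atomic_t complete_ipi_count = ATOMIC_INIT(0);
static atomic_t complete_softirq_count = ATOMIC_INIT(0);

	if (blk_mq_complete_need_ipi(rq)) {
		atomic_inc(&complete_ipi_count);
		blk_mq_complete_send_ipi(rq);
		return true;
	}

	if (rq->q->nr_hw_queues == 1) {
		atomic_inc(&complete_softirq_count);
		blk_mq_raise_softirq(rq);
		return true;
	}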


+	if (ioucmd) { /* async dispatch */
+		if (cmd->common.opcode == nvme_cmd_write ||
+				cmd->common.opcode == nvme_cmd_read) {

No, we can't just check for specific commands in the passthrough handler.

Right. This is for the inline-cmd approach; the last two patches of the
series undo this (for indirect-cmd). I will do something about it.

+			nvme_setup_uring_cmd_data(req, ioucmd, meta, meta_buffer,
+					meta_len);
+			blk_execute_rq_nowait(req, 0, nvme_end_async_pt);
+			return 0;
+		} else {
+			/* support only read and write for now. */
+			ret = -EINVAL;
+			goto out_meta;
+		}

Please always handle errors in the first branch and don't bother with an
else after a goto or return.

Yes, that'll be better.
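Something along these lines for the next revision (sketch, reusing the same
helpers as in the patch):

	if (ioucmd) { /* async dispatch */
		/* support only read and write for now */
		if (cmd->common.opcode != nvme_cmd_write &&
		    cmd->common.opcode != nvme_cmd_read) {
			ret = -EINVAL;
			goto out_meta;
		}
		nvme_setup_uring_cmd_data(req, ioucmd, meta, meta_buffer,
					  meta_len);
		blk_execute_rq_nowait(req, 0, nvme_end_async_pt);
		return 0;
	}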
+static int nvme_ns_async_ioctl(struct nvme_ns *ns, struct io_uring_cmd *ioucmd)
+{
+	int ret;
+
+	BUILD_BUG_ON(sizeof(struct nvme_uring_cmd_pdu) > sizeof(ioucmd->pdu));
+
+	switch (ioucmd->cmd_op) {
+	case NVME_IOCTL_IO64_CMD:
+		ret = nvme_user_cmd64(ns->ctrl, ns, NULL, ioucmd);
+		break;
+	default:
+		ret = -ENOTTY;
+	}
+
+	if (ret >= 0)
+		ret = -EIOCBQUEUED;

That's a weird way to handle the returns.  Just return -EIOCBQUEUED
directly from the handler (which as said before should be split from
the ioctl handler anyway).

Indeed. That will make it cleaner.
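Roughly like this, once the async path is split out of nvme_user_cmd64()
(sketch only; the helper name below is just a placeholder for that split-out
function):

static int nvme_ns_async_ioctl(struct nvme_ns *ns, struct io_uring_cmd *ioucmd)
{
	BUILD_BUG_ON(sizeof(struct nvme_uring_cmd_pdu) > sizeof(ioucmd->pdu));

	switch (ioucmd->cmd_op) {
	case NVME_IOCTL_IO64_CMD:
		/* placeholder name for the async path split out of
		 * nvme_user_cmd64(); returns -EIOCBQUEUED once queued */
		return nvme_ns_uring_cmd_io64(ns, ioucmd);
	default:
		return -ENOTTY;
	}
}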




