Re: [PATCH 17/17] nvme: enable non-inline passthru commands

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 24, 2022 at 07:32:18AM +0100, Christoph Hellwig wrote:
On Tue, Mar 22, 2022 at 10:40:27PM +0530, Kanchan Joshi wrote:
On Fri, Mar 11, 2022 at 11:57 AM Christoph Hellwig <hch@xxxxxx> wrote:
> > And that's because this ioctl requires additional "__u64 result;" to
> > be updated within "struct nvme_passthru_cmd64".
> > To update that during completion, we need, at the least, the result
> > field to be a pointer "__u64 result_ptr" inside the struct
> > nvme_passthru_cmd64.
> > Do you see that is possible without adding a new passthru ioctl in nvme?
>
> We don't need a new passthrough ioctl in nvme.
Right. Maybe it is easier for applications if they get to use the same
ioctl opcode/structure that they know well already.

I disagree.  Reusing the same opcode and/or structure for something
fundamentally different creates major confusion.  Don't do it.

Ok. If you are open to take new opcode/struct route, that is all we
require to pair with big-sqe and have this sorted. How about this -

+/* same as nvme_passthru_cmd64 but expecting result field to be pointer */
+struct nvme_passthru_cmd64_ind {
+       __u8    opcode;
+       __u8    flags;
+       __u16   rsvd1;
+       __u32   nsid;
+       __u32   cdw2;
+       __u32   cdw3;
+       __u64   metadata;
+       __u64   addr;
+       __u32   metadata_len;
+       union {
+               __u32   data_len; /* for non-vectored io */
+               __u32   vec_cnt; /* for vectored io */
+       };
+       __u32   cdw10;
+       __u32   cdw11;
+       __u32   cdw12;
+       __u32   cdw13;
+       __u32   cdw14;
+       __u32   cdw15;
+       __u32   timeout_ms;
+       __u32   rsvd2;
+       __u64   presult; /* pointer to result */
+};
+
#define nvme_admin_cmd nvme_passthru_cmd

+#define NVME_IOCTL_IO64_CMD_IND        _IOWR('N', 0x50, struct nvme_passthru_cmd64_ind)

Not heavy on code-volume too, because only one statement (updating
result) changes and we reuse everything else.

>From all that we discussed, maybe the path forward could be this:
- inline-cmd/big-sqe is useful if paired with big-cqe. Drop big-sqe
for now if we cannot go the big-cqe route.
- use only indirect-cmd as this requires nothing special, just regular
sqe and cqe. We can support all passthru commands with a lot less
code. No new ioctl in nvme, so same semantics. For common commands
(i.e. read/write) we can still avoid updating the result (put_user
cost will go).

Please suggest if we should approach this any differently in v2.

Personally I think larger SQEs and CQEs are the only sensible interface
here.  Everything else just fails like a horrible hack I would not want
to support in NVMe.

So far we have gathered three choices:
(a) big-sqe + new opcode/struct in nvme
(b) big-sqe + big-cqe
(c) regular-sqe + regular-cqe

I can post a RFC on big-cqe work if that is what it takes to evaluate
clearly what path to take. But really, the code is much more compared
to choice (a) and (c). Differentiating one CQE with another does not
seem very maintenance-friendly, particularly in liburing.

For (c), I did not get what part feels like horrible hack.
It is same as how we do sync passthru - read passthru command from
user-space memory, and update result into that on completion.
But yes, (a) seems like the best option to me.





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux