On Tuesday, October 10, 2023 6:40 AM Kanchan Joshi <joshi.k@xxxxxxxxxxx> wrote:
>
> On Tue, Oct 10, 2023 at 1:16 PM Christoph Hellwig <hch@xxxxxx> wrote:
> >
> > On Fri, Oct 06, 2023 at 07:17:06PM +0530, Kanchan Joshi wrote:
> > > Same issue is possible for extended-lba case also. When user specifies a
> > > short unaligned buffer, the kernel makes a copy and uses that for DMA.
> >
> > I fail to understand the extended LBA case, and also from looking at the
> > code mixing it up with validation of the metadata_len seems very
> > confusing. Can you try to clearly explain it and maybe split it into a
> > separate patch?
>
> The case is for the single interleaved buffer with both data and
> metadata. When the driver sends this buffer to blk_rq_map_user_iov(),
> it may make a copy of it.
> This kernel buffer will be used for DMA rather than the user buffer. If
> the user buffer is short, the kernel buffer is also short.
>
> Does this explanation help?
> I can move that part to a separate patch.
>
> > > Fixes: 456cba386e94 ("nvme: wire-up uring-cmd support for io-passthru on char-device")
> >
> > Is this really io_uring specific? I think we also had the same issue
> > before and this should go back to adding metadata support to the
> > general passthrough ioctl?
>
> Yes, not io_uring specific.
> Just that I was not sure about (i) whether to go back that far in
> history, and (ii) what patch to tag.
>
> > > +static inline bool nvme_nlb_in_cdw12(u8 opcode)
> > > +{
> > > +	switch (opcode) {
> > > +	case nvme_cmd_read:
> > > +	case nvme_cmd_write:
> > > +	case nvme_cmd_compare:
> > > +	case nvme_cmd_zone_append:
> > > +		return true;
> > > +	}
> > > +	return false;
> > > +}
> >
> > Nitpick: I find it nicer to read to have a switch that catches
> > everything with a default statement instead of falling out of it
> > for checks like this. It's not making any difference in practice
> > but just reads a little nicer.
>
> Sure, I can change it.
>

What if the namespace uses the KV command set? Its Store and Retrieve
commands use the same opcodes as nvme_cmd_write and nvme_cmd_read, so
an opcode-only check would match them as well.

> > > +	/* Exclude commands that do not have nlb in cdw12 */
> > > +	if (!nvme_nlb_in_cdw12(c->common.opcode))
> > > +		return true;
> >
> > So we can still get exactly the same corruption for all commands that
> > are not known? That's not a very safe way to deal with the issue.
>
> Given the way things are in NVMe, I do not find a better way.
> Maybe another day for commands that do (or can do) things very
> differently for nlb and PI representation.
>
> > > +	control = upper_16_bits(le32_to_cpu(c->common.cdw12));
> > > +	/* Exclude when meta transfer from/to host is not done */
> > > +	if (control & NVME_RW_PRINFO_PRACT && ns->ms == ns->pi_size)
> > > +		return true;
> > > +
> > > +	nlb = lower_16_bits(le32_to_cpu(c->common.cdw12));
> >
> > I'd use the rw field of the union and the typed control and length
> > fields to clean this up a bit.
> >
> > >  	if (bdev && meta_buffer && meta_len) {
> > > +		if (!nvme_validate_passthru_meta(ns, nvme_req(req)->cmd,
> > > +						meta_len, bufflen)) {
> > > +			ret = -EINVAL;
> > > +			goto out_unmap;
> > > +		}
> > > +
> > >  		meta = nvme_add_user_metadata(req, meta_buffer, meta_len,
> >
> > I'd move the check into nvme_add_user_metadata to keep it out of the
> > hot path.
> >
> > FYI: here is what I'd do for the external metadata only case:
>
> Since you have improved comments too, I may just use this for the
> next iteration.
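For illustration only, a minimal userspace sketch of the check being discussed, written with the default-label switch suggested in review. The names nlb_in_cdw12(), validate_meta_len(), struct ns_format and RW_PRINFO_PRACT are hypothetical stand-ins for the kernel helpers, and rejecting only a too-short buffer (rather than any mismatch) is an assumption, not necessarily what the patch does. Like the patch, it lets unknown opcodes through unchecked, and it matches by opcode alone, so the KV command set question above applies to it unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* NVM command set opcodes, redeclared from include/linux/nvme.h so the
 * sketch compiles on its own. */
enum {
	nvme_cmd_write       = 0x01,
	nvme_cmd_read        = 0x02,
	nvme_cmd_compare     = 0x05,
	nvme_cmd_zone_append = 0x7d,
};

/* PRACT bit of the 16-bit control field (upper half of CDW12);
 * hypothetical local name mirroring NVME_RW_PRINFO_PRACT. */
#define RW_PRINFO_PRACT (1u << 13)

/* Hypothetical stand-in for the namespace fields the patch reads. */
struct ns_format {
	uint16_t ms;      /* metadata bytes per logical block */
	uint16_t pi_size; /* protection-information bytes inside ms */
};

/* Switch with a default label, as suggested in review. */
static bool nlb_in_cdw12(uint8_t opcode)
{
	switch (opcode) {
	case nvme_cmd_read:
	case nvme_cmd_write:
	case nvme_cmd_compare:
	case nvme_cmd_zone_append:
		return true;
	default:
		return false;
	}
}

/*
 * Hypothetical check: false if meta_len is too short for the blocks the
 * command transfers. cdw12 is already in CPU byte order here; the kernel
 * code applies le32_to_cpu() first.
 */
static bool validate_meta_len(const struct ns_format *ns, uint8_t opcode,
			      uint32_t cdw12, uint32_t meta_len)
{
	uint16_t control = cdw12 >> 16;
	uint16_t nlb = cdw12 & 0xffff;

	/* PI, when present, is part of the per-block metadata. */
	assert(ns->pi_size <= ns->ms);

	/* Unknown opcodes pass unchecked: the gap raised in review. */
	if (!nlb_in_cdw12(opcode))
		return true;

	/* PRACT with PI-only metadata: no metadata moves to/from the host
	 * (the same exclusion the patch makes). */
	if ((control & RW_PRINFO_PRACT) && ns->ms == ns->pi_size)
		return true;

	/* NLB is zero-based, so the command covers nlb + 1 blocks. */
	return (uint64_t)meta_len >= ((uint64_t)nlb + 1) * ns->ms;
}
```

With 8 bytes of PI-only metadata per block, a read with NLB 7 (eight blocks) needs a 64-byte metadata buffer, so 63 bytes is rejected, while setting PRACT skips the check because nothing is transferred. With 16-byte metadata that also carries 8 bytes of PI, PRACT no longer skips the check, since the non-PI bytes still move.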