Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device

On 3/4/19 2:44 PM, Javier González wrote:

On 4 Mar 2019, at 14.25, Matias Bjørling <mb@xxxxxxxxxxx> wrote:

On 3/4/19 2:19 PM, Javier González wrote:
On 4 Mar 2019, at 13.22, Hans Holmberg <hans@xxxxxxxxxxxxx> wrote:

On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@xxxxxxxxxxx> wrote:
On 4 Mar 2019, at 12.30, Hans Holmberg <hans@xxxxxxxxxxxxx> wrote:

On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@xxxxxxxxxxx> wrote:
On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:

Current lightnvm and pblk implementation does not care
about NVMe max data transfer size, which can be smaller
than 64 * 4K = 256K. This patch fixes issues related to that.

Could you describe *what* issues you are fixing?

Signed-off-by: Igor Konopko <igor.j.konopko@xxxxxxxxx>
---
drivers/lightnvm/core.c      | 9 +++++++--
drivers/nvme/host/lightnvm.c | 1 +
include/linux/lightnvm.h     | 1 +
3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index 5f82036fe322..c01f83b8fbaf 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
     struct nvm_target *t;
     struct nvm_tgt_dev *tgt_dev;
     void *targetdata;
+     unsigned int mdts;
     int ret;

     switch (create->conf.type) {
@@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
     tdisk->private_data = targetdata;
     tqueue->queuedata = targetdata;

-     blk_queue_max_hw_sectors(tqueue,
-                     (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
+     mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
+     if (dev->geo.mdts) {
+             mdts = min_t(u32, dev->geo.mdts,
+                             (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
+     }
+     blk_queue_max_hw_sectors(tqueue, mdts);

     set_capacity(tdisk, tt->capacity(targetdata));
     add_disk(tdisk);
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index b759c25c89c8..b88a39a3cbd1 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
     geo->csecs = 1 << ns->lba_shift;
     geo->sos = ns->ms;
     geo->ext = ns->ext;
+     geo->mdts = ns->ctrl->max_hw_sectors;

     dev->q = q;
     memcpy(dev->name, disk_name, DISK_NAME_LEN);
diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
index 5d865a5d5cdc..d3b02708e5f0 100644
--- a/include/linux/lightnvm.h
+++ b/include/linux/lightnvm.h
@@ -358,6 +358,7 @@ struct nvm_geo {
     u16     csecs;          /* sector size */
     u16     sos;            /* out-of-band area size */
     bool    ext;            /* metadata in extended data buffer */
+     u32     mdts;           /* Max data transfer size*/

     /* device write constrains */
     u32     ws_min;         /* minimum write size */
--
2.17.1
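
For readers following the arithmetic, the clamping done in nvm_create_tgt() can be sketched in user space as below. This is an illustrative sketch, not the kernel code: the function name nvm_tgt_max_hw_sectors() is hypothetical, NVM_MAX_VLBA is assumed to be 64 as discussed in this thread, and both csecs (bytes) and mdts (512-byte sectors) are plain integers rather than fields of struct nvm_geo. As in the patch, an mdts of 0 is treated as "not reported".

```c
#include <stdint.h>

/* Assumed value: the thread refers to a limit of 64 LBAs per vector command. */
#define NVM_MAX_VLBA 64u

/*
 * Hypothetical helper mirroring the patch logic.
 * csecs: sector size in bytes (e.g. 4096).
 * mdts:  controller limit in 512-byte sectors, 0 if not reported.
 * Returns the limit, in 512-byte sectors, to hand to the block queue.
 */
static uint32_t nvm_tgt_max_hw_sectors(uint32_t csecs, uint32_t mdts)
{
    /* Largest vector I/O expressed in 512-byte sectors. */
    uint32_t vector_limit = (csecs >> 9) * NVM_MAX_VLBA;

    /* Only honour mdts when it is set and actually more restrictive. */
    if (mdts != 0 && mdts < vector_limit)
        return mdts;

    return vector_limit;
}
```

With 4 KB sectors the vector limit is 8 * 64 = 512 sectors (256 KB), so a controller reporting 256 sectors would halve the queue limit, while one reporting 0 or anything above 512 would leave it unchanged.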

I see where you are going with this and I partially agree, but none of
the OCSSD specs provide a way to define this parameter. Thus, adding this
behavior taken from NVMe in Linux can break current implementations. Is
this a real-life problem for you? Or is this just for NVMe “correctness”?

Javier

Hmm. Looking into the 2.0 spec, this is what it says about vector reads:

(figure 28): "The number of Logical Blocks (NLB): This field indicates
the number of logical blocks to be read. This is a 0’s based value.
Maximum of 64 LBAs is supported."

You got the max limit covered, and the spec does not say anything
about the minimum number of LBAs to support.

Matias: any thoughts on this?

Javier: How would this patch break current implementations?

Say an OCSSD controller that sets mdts to a value under 64 or does not
set it at all (maybe garbage). Think you can get to one pretty quickly...

So we can't make use of a perfectly good, standardized parameter
because some hypothetical non-compliant device out there might not
provide a sane value?

The OCSSD standard has never used NVMe parameters, so there is no
compliant / non-compliant. In fact, until we changed OCSSD 2.0 to
get the sector and OOB sizes from the standard identify
command, we used to have them in the geometry.

What the hell? Yes it has. The whole OCSSD spec is dependent on the
NVMe spec. It is using many commands from the NVMe specification,
which are not defined in the OCSSD specification.


First, lower the tone.

Second, no, it has not and never has, starting with all the write
constraints, continuing with the vector commands, etc.

You cannot choose
what you want to be compliant with and what you do not. OCSSD uses the
NVMe protocol but it is self-sufficient with its geometry for all the
read / write / erase paths - it even depends on different PCIe class
codes to be identified…

No. It does not.

To do this in the way the rest of the spec is
defined, we either add a field to the geometry or explicitly mention
that MDTS is used, as we do with the sector and metadata sizes.

Third, as a maintainer of this subsystem you should care about devices
in the field that might break due to such a change (supported by the
company you work for or not) - even if you can argue whether the change
is compliant or not.

Same as Hans. If you worry about me doing my job, you need not.


And Hans, as a representative of a company that has such devices out
there, you should care too.

What if we add a quirk in the feature bits for this so that newer
devices can implement this and older devices can still function?

The MDTS field should be respected in all cases, similarly to how the
block layer respects it. Since the lightnvm subsystem is hooking in
on the side, this should also be honoured by pblk (or the lightnvm
subsystem should fix it up).


This said, pblk does not care which value you give, it uses what the
subsystem tells it - I am not arguing against implementing this
change.

The only thing we should care about if implementing this is removing the
constant defining 64 ppas and making allocations dynamic in the partial
read and GC paths.
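
As a rough illustration of what "making allocations dynamic" could look like, here is a user-space sketch that sizes a PPA list from the queue limit instead of a fixed 64 entries. It is only a sketch under stated assumptions: struct ppa_batch and the functions max_ppas_per_io(), ppa_batch_init() and ppa_batch_free() are hypothetical names that do not exist in pblk, and calloc()/free() stand in for whatever allocator the kernel paths would actually use.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for a per-I/O list of physical page addresses. */
struct ppa_batch {
    uint32_t max_ppas;   /* number of entries the list can hold */
    uint64_t *ppas;      /* dynamically sized list, NULL when unused */
};

/*
 * Number of LBAs that fit in one I/O.
 * max_hw_sectors: queue limit in 512-byte sectors.
 * csecs:          sector size in bytes.
 */
static uint32_t max_ppas_per_io(uint32_t max_hw_sectors, uint32_t csecs)
{
    uint32_t sectors_per_lba = csecs >> 9;

    if (sectors_per_lba == 0)
        return 0;   /* sector size below 512 bytes is not meaningful here */

    return max_hw_sectors / sectors_per_lba;
}

/* Allocate the list from the runtime limit rather than a constant. */
static int ppa_batch_init(struct ppa_batch *b, uint32_t max_hw_sectors,
                          uint32_t csecs)
{
    uint32_t n = max_ppas_per_io(max_hw_sectors, csecs);

    b->max_ppas = 0;
    b->ppas = NULL;
    if (n == 0)
        return -1;

    b->ppas = calloc(n, sizeof(*b->ppas));
    if (b->ppas == NULL)
        return -1;

    b->max_ppas = n;
    return 0;
}

static void ppa_batch_free(struct ppa_batch *b)
{
    free(b->ppas);
    b->ppas = NULL;
    b->max_ppas = 0;
}
```

For example, a queue limit of 256 sectors with 4 KB sectors gives a 32-entry list, whereas the current 512-sector limit gives the familiar 64.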

Javier




