Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device

On Mon, Mar 4, 2019 at 2:44 PM Javier González <javier@xxxxxxxxxxx> wrote:
>
>
> > On 4 Mar 2019, at 14.25, Matias Bjørling <mb@xxxxxxxxxxx> wrote:
> >
> > On 3/4/19 2:19 PM, Javier González wrote:
> >>> On 4 Mar 2019, at 13.22, Hans Holmberg <hans@xxxxxxxxxxxxx> wrote:
> >>>
> >>> On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@xxxxxxxxxxx> wrote:
> >>>>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@xxxxxxxxxxxxx> wrote:
> >>>>>
> >>>>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@xxxxxxxxxxx> wrote:
> >>>>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>> The current lightnvm and pblk implementation does not care
> >>>>>>> about the NVMe max data transfer size, which can be smaller
> >>>>>>> than 64 * 4K = 256K. This patch fixes issues related to that.
> >>>>>
> >>>>> Could you describe *what* issues you are fixing?
> >>>>>
> >>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@xxxxxxxxx>
> >>>>>>> ---
> >>>>>>> drivers/lightnvm/core.c      | 9 +++++++--
> >>>>>>> drivers/nvme/host/lightnvm.c | 1 +
> >>>>>>> include/linux/lightnvm.h     | 1 +
> >>>>>>> 3 files changed, 9 insertions(+), 2 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> >>>>>>> index 5f82036fe322..c01f83b8fbaf 100644
> >>>>>>> --- a/drivers/lightnvm/core.c
> >>>>>>> +++ b/drivers/lightnvm/core.c
> >>>>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> >>>>>>>     struct nvm_target *t;
> >>>>>>>     struct nvm_tgt_dev *tgt_dev;
> >>>>>>>     void *targetdata;
> >>>>>>> +     unsigned int mdts;
> >>>>>>>     int ret;
> >>>>>>>
> >>>>>>>     switch (create->conf.type) {
> >>>>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> >>>>>>>     tdisk->private_data = targetdata;
> >>>>>>>     tqueue->queuedata = targetdata;
> >>>>>>>
> >>>>>>> -     blk_queue_max_hw_sectors(tqueue,
> >>>>>>> -                     (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> >>>>>>> +     mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
> >>>>>>> +     if (dev->geo.mdts) {
> >>>>>>> +             mdts = min_t(u32, dev->geo.mdts,
> >>>>>>> +                             (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> >>>>>>> +     }
> >>>>>>> +     blk_queue_max_hw_sectors(tqueue, mdts);
> >>>>>>>
> >>>>>>>     set_capacity(tdisk, tt->capacity(targetdata));
> >>>>>>>     add_disk(tdisk);
> >>>>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> >>>>>>> index b759c25c89c8..b88a39a3cbd1 100644
> >>>>>>> --- a/drivers/nvme/host/lightnvm.c
> >>>>>>> +++ b/drivers/nvme/host/lightnvm.c
> >>>>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
> >>>>>>>     geo->csecs = 1 << ns->lba_shift;
> >>>>>>>     geo->sos = ns->ms;
> >>>>>>>     geo->ext = ns->ext;
> >>>>>>> +     geo->mdts = ns->ctrl->max_hw_sectors;
> >>>>>>>
> >>>>>>>     dev->q = q;
> >>>>>>>     memcpy(dev->name, disk_name, DISK_NAME_LEN);
> >>>>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
> >>>>>>> index 5d865a5d5cdc..d3b02708e5f0 100644
> >>>>>>> --- a/include/linux/lightnvm.h
> >>>>>>> +++ b/include/linux/lightnvm.h
> >>>>>>> @@ -358,6 +358,7 @@ struct nvm_geo {
> >>>>>>>     u16     csecs;          /* sector size */
> >>>>>>>     u16     sos;            /* out-of-band area size */
> >>>>>>>     bool    ext;            /* metadata in extended data buffer */
> >>>>>>> +     u32     mdts;           /* Max data transfer size*/
> >>>>>>>
> >>>>>>>     /* device write constrains */
> >>>>>>>     u32     ws_min;         /* minimum write size */
> >>>>>>> --
> >>>>>>> 2.17.1
> >>>>>>
> >>>>>> I see where you are going with this and I partially agree, but none of
> >>>>>> the OCSSD specs define a way to define this parameter. Thus, adding this
> >>>>>> behavior taken from NVMe in Linux can break current implementations. Is
> >>>>>> this a real life problem for you? Or this is just for NVMe “correctness”?
> >>>>>>
> >>>>>> Javier
> >>>>>
> >>>>> Hmm. Looking into what the 2.0 spec says about vector reads:
> >>>>>
> >>>>> (figure 28):"The number of Logical Blocks (NLB): This field indicates
> >>>>> the number of logical blocks to be read. This is a 0’s based value.
> >>>>> Maximum of 64 LBAs is supported."
> >>>>>
> >>>>> You got the max limit covered, and the spec does not say anything
> >>>>> about the minimum number of LBAs to support.
> >>>>>
> >>>>> Matias: any thoughts on this?
> >>>>>
> >>>>> Javier: How would this patch break current implementations?
> >>>>
> >>>> Say an OCSSD controller sets mdts to a value under 64 or does not
> >>>> set it at all (maybe garbage). I think you can get to one pretty quickly...
> >>>
> >>> So we can't make use of a perfectly good, standardized parameter
> >>> because some hypothetical non-compliant device out there might not
> >>> provide a sane value?
> >> The OCSSD standard has never used NVMe parameters, so there is no
> >> compliant / non-compliant here. In fact, until we changed OCSSD 2.0 to
> >> get the sector and OOB sizes from the standard identify
> >> command, we used to have them in the geometry.
> >
> > What the hell? Yes it has. The whole OCSSD spec is dependent on the
> > NVMe spec. It is using many commands from the NVMe specification,
> > which are not defined in the OCSSD specification.
> >
>
> First, lower the tone.
>
> Second, no, it has not, and never has, starting with all the write
> constraints, continuing with the vector commands, etc. You cannot choose
> what you want to be compliant with and what you do not. OCSSD uses the
> NVMe protocol but it is self-sufficient with its geometry for all the
> read / write / erase paths - it even depends on different PCIe class
> codes to be identified… To do this in the way the rest of the spec is
> defined, we either add a field to the geometry or explicitly mention
> that MDTS is used, as we do with the sector and metadata sizes.
>
> Third, as a maintainer of this subsystem you should care about devices
> in the field that might break due to such a change (supported by the
> company you work for or not) - even if you can argue whether the change
> is compliant or not.
>
> And Hans, as a representative of a company that has such devices out
> there, you should care too.

If you worry about me doing my job, you need not.
I test. So far I have not found any regressions in this patchset.

Please keep your open source hat on, Javier.

>
> What if we add a quirk in the feature bits for this so that newer
> devices can implement this and older devices can still function?
>
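Just to make that suggestion concrete, a minimal sketch of what such a gate
could look like in nvm_create_tgt() - note that the NVM_FEAT_MDTS flag and
the geo->feat field are purely hypothetical here, neither exists in the spec
or in the code today:

	mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
	/* Only trust MDTS on devices that advertise the (hypothetical)
	 * feature bit; older devices keep the historical 64 LBA limit.
	 */
	if ((dev->geo.feat & NVM_FEAT_MDTS) && dev->geo.mdts)
		mdts = min_t(u32, dev->geo.mdts, mdts);
	blk_queue_max_hw_sectors(tqueue, mdts);
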
> > The MDTS field should be respected in all cases, similarly to how the
> > block layer respects it. Since the lightnvm subsystem is hooking in
> > on the side, this should also be honoured by pblk (or the lightnvm
> > subsystem should fix it up).
> >
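For reference, this is roughly how the NVMe core turns MDTS into a queue
limit today (paraphrased from drivers/nvme/host/core.c around this kernel
version, not copied verbatim):

	/* nvme_init_identify(): MDTS is a power of two in units of the
	 * controller's minimum page size; convert it to 512B sectors.
	 */
	if (id->mdts)
		max_hw_sectors = 1 << (id->mdts + page_shift - 9);
	else
		max_hw_sectors = UINT_MAX;
	ctrl->max_hw_sectors = min_not_zero(ctrl->max_hw_sectors,
					    max_hw_sectors);

	/* nvme_set_queue_limits(): apply the limit to the request queue */
	blk_queue_max_hw_sectors(q, ctrl->max_hw_sectors);

So ns->ctrl->max_hw_sectors, which the patch copies into geo->mdts, is
already expressed in 512B sectors, matching the (csecs >> 9) * NVM_MAX_VLBA
term it is compared against.
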
>
> That said, pblk does not care which value you give; it uses what the
> subsystem tells it - this is not an argument against implementing the
> change.
>
> The only thing we should take care of when implementing this is removing
> the constant defining 64 PPAs and making the allocations dynamic in the
> partial read and GC paths.
>
> Javier
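
To make that last point concrete, something along these lines could replace
the fixed NVM_MAX_VLBA sizing once the queue limit is authoritative - a rough
sketch only; pblk_get_max_vlba() is a made-up helper name, and the real
partial read / GC code would need more surgery than this:

	/* Derive the per-command LBA budget from the queue limit instead
	 * of assuming the fixed 64 PPA maximum.
	 */
	static unsigned int pblk_get_max_vlba(struct pblk *pblk)
	{
		struct nvm_tgt_dev *dev = pblk->dev;
		unsigned int qlim = queue_max_hw_sectors(dev->q) /
					(dev->geo.csecs >> 9);

		return min_t(unsigned int, NVM_MAX_VLBA, qlim);
	}

	/* Callers would then size their LBA lists dynamically, e.g.: */
	lba_list = kmalloc_array(pblk_get_max_vlba(pblk), sizeof(u64),
				 GFP_KERNEL);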



