Re: [PATCH 13/13] lightnvm: Inherit mdts from the parent nvme device

Hans Holmberg <hans@xxxxxxxxxxxxx> · Mon, 4 Mar 2019 14:16:23 +0100

On Mon, Mar 4, 2019 at 2:04 PM Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:
>
>
>
> On 04.03.2019 13:22, Hans Holmberg wrote:
> > On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@xxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@xxxxxxxxxxxxx> wrote:
> >>>
> >>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@xxxxxxxxxxx> wrote:
> >>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:
> >>>>>
> >>>>> Current lightnvm and pblk implementation does not care
> >>>>> about NVMe max data transfer size, which can be smaller
> >>>>> than 64*K=256K. This patch fixes issues related to that.
> >>>
> >>> Could you describe *what* issues you are fixing?
>
> I'm fixing the issue related to controllers which NVMe max data transfer
> size is lower that 256K (for example 128K, which happens for existing
> NVMe controllers and NVMe spec compliant). Such a controllers are not
> able to handle command which contains 64 PPAs, since the the size of
> DMAed buffer will be above the capabilities of such a controller.

Thanks for the explanation, that would be great to have in the commit message.

>
> >>>
> >>>>> Signed-off-by: Igor Konopko <igor.j.konopko@xxxxxxxxx>
> >>>>> ---
> >>>>> drivers/lightnvm/core.c      | 9 +++++++--
> >>>>> drivers/nvme/host/lightnvm.c | 1 +
> >>>>> include/linux/lightnvm.h     | 1 +
> >>>>> 3 files changed, 9 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> >>>>> index 5f82036fe322..c01f83b8fbaf 100644
> >>>>> --- a/drivers/lightnvm/core.c
> >>>>> +++ b/drivers/lightnvm/core.c
> >>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> >>>>>       struct nvm_target *t;
> >>>>>       struct nvm_tgt_dev *tgt_dev;
> >>>>>       void *targetdata;
> >>>>> +     unsigned int mdts;
> >>>>>       int ret;
> >>>>>
> >>>>>       switch (create->conf.type) {
> >>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> >>>>>       tdisk->private_data = targetdata;
> >>>>>       tqueue->queuedata = targetdata;
> >>>>>
> >>>>> -     blk_queue_max_hw_sectors(tqueue,
> >>>>> -                     (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> >>>>> +     mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
> >>>>> +     if (dev->geo.mdts) {
> >>>>> +             mdts = min_t(u32, dev->geo.mdts,
> >>>>> +                             (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> >>>>> +     }
> >>>>> +     blk_queue_max_hw_sectors(tqueue, mdts);
> >>>>>
> >>>>>       set_capacity(tdisk, tt->capacity(targetdata));
> >>>>>       add_disk(tdisk);
> >>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> >>>>> index b759c25c89c8..b88a39a3cbd1 100644
> >>>>> --- a/drivers/nvme/host/lightnvm.c
> >>>>> +++ b/drivers/nvme/host/lightnvm.c
> >>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
> >>>>>       geo->csecs = 1 << ns->lba_shift;
> >>>>>       geo->sos = ns->ms;
> >>>>>       geo->ext = ns->ext;
> >>>>> +     geo->mdts = ns->ctrl->max_hw_sectors;
> >>>>>
> >>>>>       dev->q = q;
> >>>>>       memcpy(dev->name, disk_name, DISK_NAME_LEN);
> >>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
> >>>>> index 5d865a5d5cdc..d3b02708e5f0 100644
> >>>>> --- a/include/linux/lightnvm.h
> >>>>> +++ b/include/linux/lightnvm.h
> >>>>> @@ -358,6 +358,7 @@ struct nvm_geo {
> >>>>>       u16     csecs;          /* sector size */
> >>>>>       u16     sos;            /* out-of-band area size */
> >>>>>       bool    ext;            /* metadata in extended data buffer */
> >>>>> +     u32     mdts;           /* Max data transfer size*/
> >>>>>
> >>>>>       /* device write constrains */
> >>>>>       u32     ws_min;         /* minimum write size */
> >>>>> --
> >>>>> 2.17.1
> >>>>
> >>>> I see where you are going with this and I partially agree, but none of
> >>>> the OCSSD specs define a way to define this parameter. Thus, adding this
> >>>> behavior taken from NVMe in Linux can break current implementations. Is
> >>>> this a real life problem for you? Or this is just for NVMe “correctness”?
> >>>>
> >>>> Javier
> >>>
> >>> Hmm.Looking into the 2.0 spec what it says about vector reads:
> >>>
> >>> (figure 28):"The number of Logical Blocks (NLB): This field indicates
> >>> the number of logical blocks to be read. This is a 0’s based value.
> >>> Maximum of 64 LBAs is supported."
> >>>
> >>> You got the max limit covered, and the spec  does not say anything
> >>> about the minimum number of LBAs to support.
> >>>
> >>> Matias: any thoughts on this?
> >>>
> >>> Javier: How would this patch break current implementations?
> >>
> >> Say an OCSSD controller that sets mdts to a value under 64 or does not
> >> set it at all (maybe garbage). Think you can get to one pretty quickly...
> >
> > So we cant make use of a perfectly good, standardized, parameter
> > because some hypothetical non-compliant device out there might not
> > provide a sane value?
> >
> >>
> >>>
> >>> Igor: how does this patch fix the mdts restriction? There are no
> >>> checks on i.e. the gc read path that ensures that a lower limit than
> >>> NVM_MAX_VLBA is enforced.
> >>
> >> This is the other part where the implementation breaks.
> >
> > No, it just does not fix it.
> >
> > over-and-out,
> > Hans
>
> IO requests issued from both garbage collector and writer thread are
> upper limiter by pblk->max_write_pgs variable. This variable is
> calculated based on queue_max_hw_sectors() already in pblk.
>
> User reads on the other hand are splitted by block layer based on
> queue_max_hw_sectors() too.

Mea culpa, right you are!

>
> Is there any other path which I'm missing, which will still tries to
> issue IOs with higher IO size? Or am I missing sth else in my logic?

I missed the adjustments of max_write_pgs in pblk_core_init. We should be good.

Thanks,
Hans