On Mon, Mar 4, 2019 at 2:04 PM Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:
>
>
>
> On 04.03.2019 13:22, Hans Holmberg wrote:
> > On Mon, Mar 4, 2019 at 12:44 PM Javier González <javier@xxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >>> On 4 Mar 2019, at 12.30, Hans Holmberg <hans@xxxxxxxxxxxxx> wrote:
> >>>
> >>> On Mon, Mar 4, 2019 at 10:05 AM Javier González <javier@xxxxxxxxxxx> wrote:
> >>>>> On 27 Feb 2019, at 18.14, Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:
> >>>>>
> >>>>> Current lightnvm and pblk implementation does not care
> >>>>> about NVMe max data transfer size, which can be smaller
> >>>>> than 64*4K=256K. This patch fixes issues related to that.
> >>>
> >>> Could you describe *what* issues you are fixing?
>
> I'm fixing the issue related to controllers whose NVMe max data transfer
> size is lower than 256K (for example 128K, which happens for existing,
> NVMe spec compliant controllers). Such controllers are not able to
> handle a command which contains 64 PPAs, since the size of the DMAed
> buffer will be above the capabilities of such a controller.

Thanks for the explanation, that would be great to have in the commit
message.

>
> >>>
> >>>>> Signed-off-by: Igor Konopko <igor.j.konopko@xxxxxxxxx>
> >>>>> ---
> >>>>> drivers/lightnvm/core.c | 9 +++++++--
> >>>>> drivers/nvme/host/lightnvm.c | 1 +
> >>>>> include/linux/lightnvm.h | 1 +
> >>>>> 3 files changed, 9 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
> >>>>> index 5f82036fe322..c01f83b8fbaf 100644
> >>>>> --- a/drivers/lightnvm/core.c
> >>>>> +++ b/drivers/lightnvm/core.c
> >>>>> @@ -325,6 +325,7 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> >>>>>  struct nvm_target *t;
> >>>>>  struct nvm_tgt_dev *tgt_dev;
> >>>>>  void *targetdata;
> >>>>> + unsigned int mdts;
> >>>>>  int ret;
> >>>>>
> >>>>>  switch (create->conf.type) {
> >>>>> @@ -412,8 +413,12 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct nvm_ioctl_create *create)
> >>>>>  tdisk->private_data = targetdata;
> >>>>>  tqueue->queuedata = targetdata;
> >>>>>
> >>>>> - blk_queue_max_hw_sectors(tqueue,
> >>>>> -         (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> >>>>> + mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
> >>>>> + if (dev->geo.mdts) {
> >>>>> +         mdts = min_t(u32, dev->geo.mdts,
> >>>>> +                 (dev->geo.csecs >> 9) * NVM_MAX_VLBA);
> >>>>> + }
> >>>>> + blk_queue_max_hw_sectors(tqueue, mdts);
> >>>>>
> >>>>>  set_capacity(tdisk, tt->capacity(targetdata));
> >>>>>  add_disk(tdisk);
> >>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> >>>>> index b759c25c89c8..b88a39a3cbd1 100644
> >>>>> --- a/drivers/nvme/host/lightnvm.c
> >>>>> +++ b/drivers/nvme/host/lightnvm.c
> >>>>> @@ -991,6 +991,7 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
> >>>>>  geo->csecs = 1 << ns->lba_shift;
> >>>>>  geo->sos = ns->ms;
> >>>>>  geo->ext = ns->ext;
> >>>>> + geo->mdts = ns->ctrl->max_hw_sectors;
> >>>>>
> >>>>>  dev->q = q;
> >>>>>  memcpy(dev->name, disk_name, DISK_NAME_LEN);
> >>>>> diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h
> >>>>> index 5d865a5d5cdc..d3b02708e5f0 100644
> >>>>> --- a/include/linux/lightnvm.h
> >>>>> +++ b/include/linux/lightnvm.h
> >>>>> @@ -358,6 +358,7 @@ struct nvm_geo {
> >>>>>  u16 csecs;  /* sector size */
> >>>>>  u16 sos;  /* out-of-band area size */
> >>>>>  bool ext;  /* metadata in extended data buffer */
> >>>>> + u32 mdts;  /* Max data transfer size */
> >>>>>
> >>>>>  /* device write constrains */
> >>>>>  u32 ws_min;  /* minimum write size */
> >>>>> --
> >>>>> 2.17.1
> >>>>
> >>>> I see where you are going with this and I partially agree, but none of
> >>>> the OCSSD specs define a way to define this parameter. Thus, adding this
> >>>> behavior taken from NVMe in Linux can break current implementations. Is
> >>>> this a real life problem for you? Or is this just for NVMe “correctness”?
> >>>>
> >>>> Javier
> >>>
> >>>
> >>> Hmm. Looking into what the 2.0 spec says about vector reads:
> >>>
> >>> (figure 28): "The number of Logical Blocks (NLB): This field indicates
> >>> the number of logical blocks to be read. This is a 0’s based value.
> >>> Maximum of 64 LBAs is supported."
> >>>
> >>> You got the max limit covered, and the spec does not say anything
> >>> about the minimum number of LBAs to support.
> >>>
> >>> Matias: any thoughts on this?
> >>>
> >>> Javier: How would this patch break current implementations?
> >>
> >> Say an OCSSD controller that sets mdts to a value under 64 or does not
> >> set it at all (maybe garbage). Think you can get to one pretty quickly...
> >
> > So we can't make use of a perfectly good, standardized parameter
> > because some hypothetical non-compliant device out there might not
> > provide a sane value?
> >
> >>
> >>>
> >>> Igor: how does this patch fix the mdts restriction? There are no
> >>> checks on e.g. the gc read path that ensure that a lower limit than
> >>> NVM_MAX_VLBA is enforced.
> >>
> >> This is the other part where the implementation breaks.
> >
> > No, it just does not fix it.
> >
> > over-and-out,
> > Hans
>
> IO requests issued from both the garbage collector and the writer thread
> are upper limited by the pblk->max_write_pgs variable. This variable is
> already calculated based on queue_max_hw_sectors() in pblk.
>
> User reads on the other hand are split by the block layer based on
> queue_max_hw_sectors() too.

Mea culpa, right you are!

>
> Is there any other path which I'm missing, which will still try to
> issue IOs with a higher IO size? Or am I missing something else in my
> logic?

I missed the adjustments of max_write_pgs in pblk_core_init. We should
be good.

Thanks,
Hans
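
For readers following the arithmetic above, here is a minimal userspace
sketch of the clamping the patch performs in nvm_create_tgt(). It is not
kernel code: clamp_max_hw_sectors() is a stand-in for the min_t() logic,
and the 4K sector size and 128K MDTS (256 x 512B sectors, matching the
units of ns->ctrl->max_hw_sectors) are illustrative assumptions, not
values from any specific controller.

#include <stdio.h>
#include <stdint.h>

#define NVM_MAX_VLBA 64	/* max LBAs in one vector command (OCSSD 2.0, figure 28) */

/*
 * Start from the 64-LBA limit and clamp it to the controller's max data
 * transfer size when the driver reports one. Both values are expressed
 * in 512-byte sectors.
 */
static uint32_t clamp_max_hw_sectors(uint32_t csecs, uint32_t mdts)
{
	uint32_t max = (csecs >> 9) * NVM_MAX_VLBA;

	if (mdts && mdts < max)
		max = mdts;

	return max;
}

int main(void)
{
	/* Illustrative values: 4K device sectors, 128K MDTS (256 x 512B). */
	uint32_t csecs = 4096, mdts = 256;
	uint32_t max = clamp_max_hw_sectors(csecs, mdts);

	printf("max_hw_sectors = %u (%u bytes, %u LBAs per command)\n",
	       max, max << 9, max / (csecs >> 9));
	/* -> max_hw_sectors = 256 (131072 bytes, 32 LBAs per command) */
	return 0;
}

Without the clamp, such a controller would be handed commands of up to
512 sectors (256K), beyond its 128K limit; with a 256K-capable controller
(mdts = 512, or 0 when unreported) the full 64-LBA limit remains in effect.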