> From: Javier Gonzalez [mailto:javier@xxxxxxxxxxxx] > Sent: Tuesday, June 5, 2018 9:12 AM > To: Dziegielewski, Marcin <marcin.dziegielewski@xxxxxxxxx> > Cc: Matias Bjørling <mb@xxxxxxxxxxx>; Jens Axboe <axboe@xxxxxx>; linux- > block@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Konopko, Igor J > <igor.j.konopko@xxxxxxxxx> > Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits > equals to 0 > > > On 4 Jun 2018, at 19.17, Dziegielewski, Marcin > <marcin.dziegielewski@xxxxxxxxx> wrote: > > > >> From: Javier Gonzalez [mailto:javier@xxxxxxxxxxxx] > >> Sent: Monday, June 4, 2018 1:16 PM > >> To: Dziegielewski, Marcin <marcin.dziegielewski@xxxxxxxxx> > >> Cc: Matias Bjørling <mb@xxxxxxxxxxx>; Jens Axboe <axboe@xxxxxx>; > >> linux- block@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Konopko, > >> Igor J <igor.j.konopko@xxxxxxxxx> > >> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when > >> mw_cunits equals to 0 > >> > >> > >>> On 4 Jun 2018, at 13.11, Dziegielewski, Marcin > >> <marcin.dziegielewski@xxxxxxxxx> wrote: > >>>> -----Original Message----- > >>>> From: Javier Gonzalez [mailto:javier@xxxxxxxxxxxx] > >>>> Sent: Monday, June 4, 2018 12:22 PM > >>>> To: Dziegielewski, Marcin <marcin.dziegielewski@xxxxxxxxx> > >>>> Cc: Matias Bjørling <mb@xxxxxxxxxxx>; Jens Axboe <axboe@xxxxxx>; > >>>> linux- block@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; > >>>> Konopko, Igor J <igor.j.konopko@xxxxxxxxx> > >>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when > >>>> mw_cunits equals to 0 > >>>> > >>>>> On 4 Jun 2018, at 12.09, Dziegielewski, Marcin > >>>> <marcin.dziegielewski@xxxxxxxxx> wrote: > >>>>> Frist of all I want to say sorry for late response - I was on holiday. > >>>>> > >>>>>> From: Javier Gonzalez [mailto:javier@xxxxxxxxxxxx] > >>>>>> Sent: Monday, May 28, 2018 1:03 PM > >>>>>> To: Matias Bjørling <mb@xxxxxxxxxxx> > >>>>>> Cc: Jens Axboe <axboe@xxxxxx>; linux-block@xxxxxxxxxxxxxxx; > >>>>>> linux- kernel@xxxxxxxxxxxxxxx; Dziegielewski, Marcin > >>>>>> <marcin.dziegielewski@xxxxxxxxx>; Konopko, Igor J > >>>>>> <igor.j.konopko@xxxxxxxxx> > >>>>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when > >>>>>> mw_cunits equals to 0 > >>>>>> > >>>>>>> On 28 May 2018, at 10.58, Matias Bjørling <mb@xxxxxxxxxxx> wrote: > >>>>>>> > >>>>>>> From: Marcin Dziegielewski <marcin.dziegielewski@xxxxxxxxx> > >>>>>>> > >>>>>>> Some devices can expose mw_cunits equal to 0, it can cause > >>>>>>> creation of too small write buffer and cause performance to drop > >>>>>>> on write workloads. > >>>>>>> > >>>>>>> To handle that, we use the default value for MLC and beacause it > >>>>>>> covers both 1.2 and 2.0 OC specification, setting up mw_cunits > >>>>>>> in > >>>>>>> nvme_nvm_setup_12 function isn't longer necessary. > >>>>>>> > >>>>>>> Signed-off-by: Marcin Dziegielewski > >>>>>>> <marcin.dziegielewski@xxxxxxxxx> > >>>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@xxxxxxxxx> > >>>>>>> Signed-off-by: Matias Bjørling <mb@xxxxxxxxxxx> > >>>>>>> --- > >>>>>>> drivers/lightnvm/pblk-init.c | 10 +++++++++- > >>>>>>> drivers/nvme/host/lightnvm.c | 1 - > >>>>>>> 2 files changed, 9 insertions(+), 2 deletions(-) > >>>>>>> > >>>>>>> diff --git a/drivers/lightnvm/pblk-init.c > >>>>>>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b > >>>>>>> 100644 > >>>>>>> --- a/drivers/lightnvm/pblk-init.c > >>>>>>> +++ b/drivers/lightnvm/pblk-init.c > >>>>>>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk) > >>>>>>> atomic64_set(&pblk->nr_flush, 0); > >>>>>>> pblk->nr_flush_rst = 0; > >>>>>>> > >>>>>>> - pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns; > >>>>>>> + if (geo->mw_cunits) { > >>>>>>> + pblk->pgs_in_buffer = geo->mw_cunits * geo- > >>> all_luns; > >>>>>>> + } else { > >>>>>>> + pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo- > >>> all_luns; > >>>>>>> + /* > >>>>>>> + * Some devices can expose mw_cunits equal to 0, so > >> let's > >>>>>> use > >>>>>>> + * here default safe value for MLC. > >>>>>>> + */ > >>>>>>> + } > >>>>>>> > >>>>>>> pblk->min_write_pgs = geo->ws_opt * (geo->csecs / > PAGE_SIZE); > >>>>>>> max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff > >>>>>>> --git a/drivers/nvme/host/lightnvm.c > >>>>>>> b/drivers/nvme/host/lightnvm.c index > >>>>>>> 41279da799ed..c747792da915 100644 > >>>>>>> --- a/drivers/nvme/host/lightnvm.c > >>>>>>> +++ b/drivers/nvme/host/lightnvm.c > >>>>>>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct > >>>>>> nvme_nvm_id12 > >>>>>>> *id, > >>>>>>> > >>>>>>> geo->ws_min = sec_per_pg; > >>>>>>> geo->ws_opt = sec_per_pg; > >>>>>>> - geo->mw_cunits = geo->ws_opt << 3; /* default to MLC > >> safe values > >>>>>> */ > >>>>>>> /* Do not impose values for maximum number of open blocks as it > is > >>>>>>> * unspecified in 1.2. Users of 1.2 must be aware of this and > >>>>>>> eventually > >>>>>>> -- > >>>>>>> 2.11.0 > >>>>>> > >>>>>> By doing this, 1.2 future users (beyond pblk), will fail to have > >>>>>> a valid mw_cunits value. It's ok to deal with the 0 case in pblk, > >>>>>> but I believe that we should have the default value for 1.2 either > way. > >>>>> > >>>>> I'm not sure. From my understanding, setting of default value was > >>>>> workaround for pblk case, am I right ?. > >>>> > >>>> The default value covers the MLC case directly at the lightnvm > >>>> layer, as opposed to doing it directly in pblk. Since pblk is the > >>>> only user now, you can argue that all changes in the lightnvm layer > >>>> are to solve pblk issues, but the idea is that the geometry should be > generic. > >>>> > >>>>> In my opinion any user of 1.2 > >>>>> spec should be aware that there is not mw_cunit value. From my > >>>>> point of view, leaving here 0 (and decision what do with it to > >>>>> lightnvm > >>>>> user) is more safer way, but maybe I'm wrong. I believe that it is > >>>>> topic to wider discussion with maintainers. > >>>> > >>>> 1.2 and 2.0 have different geometries, but when we designed the > >>>> common nvm_geo structure, the idea was to abstract both specs and > >>>> allow the upper layers to use the geometry transparently. > >>>> > >>>> Specifically in pblk, I would prefer to keep it in such a way that > >>>> we don't need to media specific policies (e.g., set default values > >>>> for MLC memories), as a general design principle. We already do > >>>> some geometry version checks to avoid dereferencing unnecessary > >>>> pointers on the fast path, which I would eventually like to remove. > >>> > >>> Ok, now I understand your point of view and agree with that, I will > >>> prepare second version of this patch without this change. > >> > >> Sounds good. > >> > >>> Thanks for > >>> the clarification. > >> > >> Sure :) > >> > >>>>>> A more generic way of doing this would be to have a default value > >>>>>> for > >>>>>> 2.0 too, in case mw_cunits is reported as 0. > >>>>> > >>>>> Since 0 is correct value and users can make different decisions > >>>>> based on it, I think we shouldn't overwrite it by default value. > >>>>> Is it make sense? > >>>> > >>>> Here I meant at a pblk level - I should have specified it. At the > >>>> geometry level, we should not change it. > >>>> > >>>> The case I am thinking is if mw_cuints repoints 0, but ws_min > 0. > >>>> In this case, we still need a host side buffer to serve < ws_min > >>>> I/Os, even though the device does not require the buffer to guarantee > reads. > >>> > >>> Oh, ok now we are on the same page. In this patch I was trying to > >>> address such case. Do you have other idea how to do it or here are > >>> you thinking only on value of default variable? > >> > >> If doing this, I guess that something in the line of what you did > >> with increasing the size of the write buffer via a module parameter. > >> For example, checking if the size of the write buffer based on > >> mw_cuints is enough to cover ws_min, which normally would only be an > >> issue when mw_cuints == 0 or when the number of PUs used for the pblk > >> instance is very small and mw_cuints < nr_luns * ws_min. > > > > > > I see here two cases: > > - when mw_cunits > 0 buffer size should have number of entries at > > least max(mw_cunits, ws_min) * nr_luns and here we are taking care of > > both cases mw_cunits > ws_min and mw_cunits < ws_min. > > - when mw_cunit == 0 buffer size should have number of entries at > > least ws_min * nr _luns and we can use the same puseudocode as above. > > > > Agree. > > > Do you see any other case? Could you clarify second case mentioned by > > you or maybe did you mean opposite case? If yes, I believe that above > > pseudo code will handle such case too. > > > > Yes, it is the same case. > > One thing to consider is whether the buffer should at least be ws_opt * > nr_luns for performance reasons. Since the write thread will always try to > send ws_opt, in the case that ws_opt > ws_min, then a buffer size of > ws_min * nr_luns will not make use of the whole parallelism exposed by the > device. > Agree. After I had sent last email I also thought that should be ws_opt instead of ws_min. > Therefore, I would probably go for ws_opt * nr_luns as the default value > when mw_cuints * nr_luns < ws_opt * nr_luns (which covers mw_cuints == > 0), and then keep ws_min * nr_luns as the minimum requirement when > setting the buffer size manually. Sounds good. > > Does this cover your use case? > Yes, I think that cover all cases related to this topic. I will prepare patch in the afternoon. Many thanks for perfect cooperation! Marcin > >>>>>> Javier > >>>>> > >>>>> Thanks, > >>>>> Marcin > >>>> > >>>> Javier > >>> Thanks, > >>> Marcin > > Thanks!, > > Marcin