> On 4 Jun 2018, at 13.11, Dziegielewski, Marcin <marcin.dziegielewski@xxxxxxxxx> wrote:
>
>> -----Original Message-----
>> From: Javier Gonzalez [mailto:javier@xxxxxxxxxxxx]
>> Sent: Monday, June 4, 2018 12:22 PM
>> To: Dziegielewski, Marcin <marcin.dziegielewski@xxxxxxxxx>
>> Cc: Matias Bjørling <mb@xxxxxxxxxxx>; Jens Axboe <axboe@xxxxxx>;
>> linux-block@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Konopko, Igor J
>> <igor.j.konopko@xxxxxxxxx>
>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
>> equals to 0
>>
>>> On 4 Jun 2018, at 12.09, Dziegielewski, Marcin <marcin.dziegielewski@xxxxxxxxx> wrote:
>>> First of all I want to say sorry for late response - I was on holiday.
>>>
>>>> From: Javier Gonzalez [mailto:javier@xxxxxxxxxxxx]
>>>> Sent: Monday, May 28, 2018 1:03 PM
>>>> To: Matias Bjørling <mb@xxxxxxxxxxx>
>>>> Cc: Jens Axboe <axboe@xxxxxx>; linux-block@xxxxxxxxxxxxxxx;
>>>> linux-kernel@xxxxxxxxxxxxxxx; Dziegielewski, Marcin
>>>> <marcin.dziegielewski@xxxxxxxxx>; Konopko, Igor J
>>>> <igor.j.konopko@xxxxxxxxx>
>>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
>>>> mw_cunits equals to 0
>>>>
>>>>> On 28 May 2018, at 10.58, Matias Bjørling <mb@xxxxxxxxxxx> wrote:
>>>>>
>>>>> From: Marcin Dziegielewski <marcin.dziegielewski@xxxxxxxxx>
>>>>>
>>>>> Some devices can expose mw_cunits equal to 0, it can cause creation
>>>>> of too small write buffer and cause performance to drop on write
>>>>> workloads.
>>>>>
>>>>> To handle that, we use the default value for MLC and because it
>>>>> covers both 1.2 and 2.0 OC specification, setting up mw_cunits in
>>>>> nvme_nvm_setup_12 function is no longer necessary.
>>>>>
>>>>> Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@xxxxxxxxx>
>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@xxxxxxxxx>
>>>>> Signed-off-by: Matias Bjørling <mb@xxxxxxxxxxx>
>>>>> ---
>>>>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
>>>>> drivers/nvme/host/lightnvm.c |  1 -
>>>>> 2 files changed, 9 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>>>> index d65d2f972ccf..0f277744266b 100644
>>>>> --- a/drivers/lightnvm/pblk-init.c
>>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
>>>>> 	atomic64_set(&pblk->nr_flush, 0);
>>>>> 	pblk->nr_flush_rst = 0;
>>>>>
>>>>> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>>>> +	if (geo->mw_cunits) {
>>>>> +		pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>>>> +	} else {
>>>>> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
>>>>> +		/*
>>>>> +		 * Some devices can expose mw_cunits equal to 0, so let's use
>>>>> +		 * here default safe value for MLC.
>>>>> +		 */
>>>>> +	}
>>>>>
>>>>> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
>>>>> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns;
>>>>> diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
>>>>> index 41279da799ed..c747792da915 100644
>>>>> --- a/drivers/nvme/host/lightnvm.c
>>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct nvme_nvm_id12 *id,
>>>>>
>>>>> 	geo->ws_min = sec_per_pg;
>>>>> 	geo->ws_opt = sec_per_pg;
>>>>> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC safe values */
>>>>> 	/* Do not impose values for maximum number of open blocks as it is
>>>>> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and eventually
>>>>> --
>>>>> 2.11.0
>>>>
>>>> By doing this, 1.2 future users (beyond pblk), will fail to have a
>>>> valid mw_cunits value.
>>>> It's ok to deal with the 0 case in pblk, but I
>>>> believe that we should have the default value for 1.2 either way.
>>>
>>> I'm not sure. From my understanding, setting of default value was
>>> workaround for pblk case, am I right?
>>
>> The default value covers the MLC case directly at the lightnvm layer, as
>> opposed to doing it directly in pblk. Since pblk is the only user now, you can
>> argue that all changes in the lightnvm layer are to solve pblk issues, but the
>> idea is that the geometry should be generic.
>>
>>> In my opinion any user of 1.2
>>> spec should be aware that there is no mw_cunits value. From my point
>>> of view, leaving 0 here (and leaving the decision about what to do with
>>> it to the lightnvm user) is the safer way, but maybe I'm wrong. I believe
>>> that it is a topic for wider discussion with maintainers.
>>
>> 1.2 and 2.0 have different geometries, but when we designed the common
>> nvm_geo structure, the idea was to abstract both specs and allow the upper
>> layers to use the geometry transparently.
>>
>> Specifically in pblk, I would prefer to keep it in such a way that we don't need
>> media-specific policies (e.g., setting default values for MLC memories), as a
>> general design principle. We already do some geometry version checks to
>> avoid dereferencing unnecessary pointers on the fast path, which I would
>> eventually like to remove.
>
> Ok, now I understand your point of view and agree with that, I will
> prepare second version of this patch without this change.

Sounds good.

> Thanks for
> the clarification.
>

Sure :)

>>>> A more generic way of doing this would be to have a default value for
>>>> 2.0 too, in case mw_cunits is reported as 0.
>>>
>>> Since 0 is correct value and users can make different decisions based
>>> on it, I think we shouldn't overwrite it by default value. Does it make
>>> sense?
>>
>> Here I meant at a pblk level - I should have specified it. At the geometry
>> level, we should not change it.
>>
>> The case I am thinking of is if mw_cunits reports 0, but ws_min > 0. In this case,
>> we still need a host side buffer to serve < ws_min I/Os, even though the
>> device does not require the buffer to guarantee reads.
>
> Oh, ok now we are on the same page. In this patch I was trying to
> address such case. Do you have another idea how to do it, or are you
> thinking here only about the default value?

If doing this, I guess something along the lines of what you did with
increasing the size of the write buffer via a module parameter. For
example, checking if the size of the write buffer based on mw_cunits is
enough to cover ws_min, which normally would only be an issue when
mw_cunits == 0 or when the number of PUs used for the pblk instance is
very small and mw_cunits < nr_luns * ws_min.

>
>>>> Javier
>>>
>>> Thanks,
>>> Marcin
>>
>> Javier
> Thanks,
> Marcin
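
For illustration only, the check suggested in the last reply above could be sketched roughly as follows. This is a standalone user-space sketch, not kernel code: `struct geo_sketch`, `buffer_covers_ws_min()` and `pick_buffer_entries()` are hypothetical names, the struct only mirrors the few geometry fields mentioned in the thread, and the fallback policy (a caller-supplied size, e.g. from a module parameter) is an assumption rather than something the thread settles on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the geometry fields named in the thread.
 * This is NOT the kernel's struct nvm_geo.
 */
struct geo_sketch {
	unsigned int mw_cunits;	/* may legitimately be reported as 0 */
	unsigned int ws_min;	/* minimum write size, in sectors */
	unsigned int all_luns;	/* LUNs (PUs) used by the instance */
};

/*
 * Literal reading of the condition in the mail: the buffer is too small
 * when mw_cunits == 0 or when mw_cunits < nr_luns * ws_min.
 */
static bool buffer_covers_ws_min(const struct geo_sketch *geo)
{
	assert(geo != NULL);
	return geo->mw_cunits != 0 &&
	       geo->mw_cunits >= geo->all_luns * geo->ws_min;
}

/*
 * Assumed policy (not from the thread): keep the device-derived size when
 * it is sufficient, otherwise use a caller-supplied fallback.
 */
static unsigned int pick_buffer_entries(const struct geo_sketch *geo,
					unsigned int fallback_entries)
{
	if (buffer_covers_ws_min(geo))
		return geo->mw_cunits * geo->all_luns;
	return fallback_entries;
}
```

With made-up numbers, a device reporting mw_cunits = 0, ws_min = 4 and 8 LUNs would fail the check and take the fallback, while mw_cunits = 64 with the same ws_min and LUN count would keep the device-derived 64 * 8 entries. The geometry itself is left untouched in both cases, which matches the point made earlier in the thread that the value should only be adjusted at the pblk level.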