Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On 4 Jun 2018, at 13.11, Dziegielewski, Marcin <marcin.dziegielewski@xxxxxxxxx> wrote:
> 
>> -----Original Message-----
>> From: Javier Gonzalez [mailto:javier@xxxxxxxxxxxx]
>> Sent: Monday, June 4, 2018 12:22 PM
>> To: Dziegielewski, Marcin <marcin.dziegielewski@xxxxxxxxx>
>> Cc: Matias Bjørling <mb@xxxxxxxxxxx>; Jens Axboe <axboe@xxxxxx>; linux-
>> block@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Konopko, Igor J
>> <igor.j.konopko@xxxxxxxxx>
>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
>> equals to 0
>> 
>>> On 4 Jun 2018, at 12.09, Dziegielewski, Marcin
>> <marcin.dziegielewski@xxxxxxxxx> wrote:
>>> Frist of all I want to say sorry for late response - I was on holiday.
>>> 
>>>> From: Javier Gonzalez [mailto:javier@xxxxxxxxxxxx]
>>>> Sent: Monday, May 28, 2018 1:03 PM
>>>> To: Matias Bjørling <mb@xxxxxxxxxxx>
>>>> Cc: Jens Axboe <axboe@xxxxxx>; linux-block@xxxxxxxxxxxxxxx; linux-
>>>> kernel@xxxxxxxxxxxxxxx; Dziegielewski, Marcin
>>>> <marcin.dziegielewski@xxxxxxxxx>; Konopko, Igor J
>>>> <igor.j.konopko@xxxxxxxxx>
>>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
>>>> mw_cunits equals to 0
>>>> 
>>>>> On 28 May 2018, at 10.58, Matias Bjørling <mb@xxxxxxxxxxx> wrote:
>>>>> 
>>>>> From: Marcin Dziegielewski <marcin.dziegielewski@xxxxxxxxx>
>>>>> 
>>>>> Some devices can expose mw_cunits equal to 0, it can cause creation
>>>>> of too small write buffer and cause performance to drop on write
>>>>> workloads.
>>>>> 
>>>>> To handle that, we use the default value for MLC and beacause it
>>>>> covers both 1.2 and 2.0 OC specification, setting up mw_cunits in
>>>>> nvme_nvm_setup_12 function isn't longer necessary.
>>>>> 
>>>>> Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@xxxxxxxxx>
>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@xxxxxxxxx>
>>>>> Signed-off-by: Matias Bjørling <mb@xxxxxxxxxxx>
>>>>> ---
>>>>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
>>>>> drivers/nvme/host/lightnvm.c |  1 -
>>>>> 2 files changed, 9 insertions(+), 2 deletions(-)
>>>>> 
>>>>> diff --git a/drivers/lightnvm/pblk-init.c
>>>>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b
>>>>> 100644
>>>>> --- a/drivers/lightnvm/pblk-init.c
>>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
>>>>> 	atomic64_set(&pblk->nr_flush, 0);
>>>>> 	pblk->nr_flush_rst = 0;
>>>>> 
>>>>> -	pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>>>> +	if (geo->mw_cunits) {
>>>>> +		pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>>>> +	} else {
>>>>> +		pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
>>>>> +		/*
>>>>> +		 * Some devices can expose mw_cunits equal to 0, so let's
>>>> use
>>>>> +		 * here default safe value for MLC.
>>>>> +		 */
>>>>> +	}
>>>>> 
>>>>> 	pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
>>>>> 	max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff --git
>>>>> a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index
>>>>> 41279da799ed..c747792da915 100644
>>>>> --- a/drivers/nvme/host/lightnvm.c
>>>>> +++ b/drivers/nvme/host/lightnvm.c
>>>>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
>>>> nvme_nvm_id12
>>>>> *id,
>>>>> 
>>>>> 	geo->ws_min = sec_per_pg;
>>>>> 	geo->ws_opt = sec_per_pg;
>>>>> -	geo->mw_cunits = geo->ws_opt << 3;	/* default to MLC safe values
>>>> */
>>>>> /* Do not impose values for maximum number of open blocks as it is
>>>>> 	 * unspecified in 1.2. Users of 1.2 must be aware of this and
>>>>> eventually
>>>>> --
>>>>> 2.11.0
>>>> 
>>>> By doing this, 1.2 future users (beyond pblk), will fail to have a
>>>> valid mw_cunits value. It's ok to deal with the 0 case in pblk, but I
>>>> believe that we should have the default value for 1.2 either way.
>>> 
>>> I'm not sure. From my understanding, setting of default value was
>>> workaround for pblk case, am I right ?.
>> 
>> The default value covers the MLC case directly at the lightnvm layer, as
>> opposed to doing it directly in pblk. Since pblk is the only user now, you can
>> argue that all changes in the lightnvm layer are to solve pblk issues, but the
>> idea is that the geometry should be generic.
>> 
>>> In my opinion any user of 1.2
>>> spec should be aware that there is not mw_cunit value. From my point
>>> of view, leaving here 0 (and decision what do with it to lightnvm
>>> user) is more safer way, but maybe I'm wrong. I believe that it is
>>> topic to wider discussion with maintainers.
>> 
>> 1.2 and 2.0 have different geometries, but when we designed the common
>> nvm_geo structure, the idea was to abstract both specs and allow the upper
>> layers to use the geometry transparently.
>> 
>> Specifically in pblk, I would prefer to keep it in such a way that we don't need
>> to media specific policies (e.g., set default values for MLC memories), as a
>> general design principle. We already do some geometry version checks to
>> avoid dereferencing unnecessary pointers on the fast path, which I would
>> eventually like to remove.
> 
> Ok, now I understand your point of view and agree with that, I will
> prepare second version of this patch without this change.

Sounds good.

> Thanks for
> the clarification.
> 

Sure :)

>>>> A more generic way of doing this would be to have a default value for
>>>> 2.0 too, in case mw_cunits is reported as 0.
>>> 
>>> Since 0 is correct value and users can make different decisions based
>>> on it, I think we shouldn't overwrite it by default value. Is it make
>>> sense?
>> 
>> Here I meant at a pblk level - I should have specified it. At the geometry
>> level, we should not change it.
>> 
>> The case I am thinking is if mw_cuints repoints 0, but ws_min > 0. In this case,
>> we still need a host side buffer to serve < ws_min I/Os, even though the
>> device does not require the buffer to guarantee reads.
> 
> Oh, ok now we are on the same page. In this patch I was trying to
> address such case. Do you have other idea how to do it or here are you
> thinking only on value of default variable?

If doing this, I guess that something in the line of what you did with
increasing the size of the write buffer via a module parameter. For
example, checking if the size of the write buffer based on mw_cuints is
enough to cover ws_min, which normally would only be an issue when
mw_cuints == 0 or when the number of PUs used for the pblk instance is
very small and mw_cuints < nr_luns * ws_min.

> 
>>>> Javier
>>> 
>>> Thanks,
>>> Marcin
>> 
>> Javier
> Thanks,
> Marcin




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux