> On 6 Sep 2017, at 17.20, Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 09/06/2017 09:13 AM, Jens Axboe wrote:
>> On 09/06/2017 09:12 AM, Javier González wrote:
>>>> On 6 Sep 2017, at 17.09, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>
>>>> On 09/06/2017 09:08 AM, Johannes Thumshirn wrote:
>>>>> On Wed, Sep 06, 2017 at 05:01:01PM +0200, Javier González wrote:
>>>>>> Check for failed mempool allocations and act accordingly.
>>>>>
>>>>> Are you sure it is needed? Quoting from mempool_alloc()'s documentation:
>>>>> "[...] Note that due to preallocation, this function *never* fails when called
>>>>> from process contexts. (it might fail if called from an IRQ context.) [...]"
>>>>
>>>> It's not needed, mempool_alloc() will never fail if __GFP_WAIT is set in the
>>>> mask. The use case here is GFP_KERNEL, which does include __GFP_WAIT.
>>>
>>> Thanks for the clarification. Do you just drop the patch, or do you want
>>> me to re-send the series?
>>
>> No need to resend. I'll pick up the others in a day or two, once people
>> have had some time to go over them.
>
> I took a quick look at your mempool usage, and I'm not sure it's
> correct. For a mempool to work, you have to ensure that you provide a
> forward progress guarantee. With that guarantee, you know that if you do
> end up sleeping on allocation, you already have items inflight that will
> be freed when that operation completes. In other words, all allocations
> must have a defined and finite life time, as any allocation can
> potentially sleep/block for that life time. You can't allocate something
> and hold on to it forever; then you are violating the terms of the
> agreement that makes a mempool work.

I understood the part about guaranteeing the number of in-flight items to
keep the mempool active without waiting, but I must admit that I assumed
the mempool would resize under pressure and that the penalty would be
increased latency, not the mempool giving up and causing a deadlock.

> The first one that caught my eye is pblk->page_pool. You have this loop:
>
> for (i = 0; i < nr_pages; i++) {
>         page = mempool_alloc(pblk->page_pool, flags);
>         if (!page)
>                 goto err;
>
>         ret = bio_add_pc_page(q, bio, page, PBLK_EXPOSED_PAGE_SIZE, 0);
>         if (ret != PBLK_EXPOSED_PAGE_SIZE) {
>                 pr_err("pblk: could not add page to bio\n");
>                 mempool_free(page, pblk->page_pool);
>                 goto err;
>         }
> }
>
> which looks suspect. This mempool is created with a reserve pool of
> PAGE_POOL_SIZE (16) members. Do we know if the bio has 16 pages or less?
> If not, then this is broken and can deadlock forever.

I can see that in this case 16 elements are not enough. On the read path
we can guarantee that a read will be <= 64 sectors (4KB pages), so this
is definitely wrong. I'll fix it tomorrow.

While we are at it, I have for some time wondered what the right way is
to balance mempool sizes so that we are good citizens and still perform
well. I don't see a lot of code using mempool_resize() to tune min_nr
based on load. For example, in our write path the numbers are easy to
calculate, but on the read path I completely over-dimensioned the
mempool to avoid having to wait on the completion path. Any good rule of
thumb here?

> You have a lot of mempool usage in the code, would probably not hurt to
> audit all of them.

Yes. I will take a look and add comments explaining the sizes.

Thanks Jens,
Javier
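
P.S. To make sure I understand the never-fails point: if mempool_alloc()
cannot return NULL in process context with GFP_KERNEL, the check can
simply go away. A minimal sketch of the loop without it (untested, just
to confirm my reading):

    for (i = 0; i < nr_pages; i++) {
            /*
             * GFP_KERNEL allows direct reclaim, so mempool_alloc()
             * sleeps until an element is returned to the pool rather
             * than failing; no NULL check is needed here.
             */
            page = mempool_alloc(pblk->page_pool, GFP_KERNEL);

            ret = bio_add_pc_page(q, bio, page, PBLK_EXPOSED_PAGE_SIZE, 0);
            if (ret != PBLK_EXPOSED_PAGE_SIZE) {
                    pr_err("pblk: could not add page to bio\n");
                    mempool_free(page, pblk->page_pool);
                    goto err;
            }
    }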
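
For the sizing fix itself, what I have in mind is to make the reserve
cover the worst-case request, so a single request can never need more
elements than the pool guarantees. A sketch, where PBLK_MAX_REQ_PAGES is
a made-up name for the 64-sector maximum read mentioned above:

    /* Made-up constant: max 4KB pages one read request can need. */
    #define PBLK_MAX_REQ_PAGES 64

    /*
     * Reserve enough 0-order pages for one worst-case request. Note
     * that this bounds a single request; if several requests can hold
     * partial allocations while sleeping for more pages, they would
     * still need to be serialized or bounded separately.
     */
    pblk->page_pool = mempool_create_page_pool(PBLK_MAX_REQ_PAGES, 0);
    if (!pblk->page_pool)
            return -ENOMEM;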
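
And on tuning min_nr based on load, the only interface I see is
mempool_resize(). Something like this hypothetical adjustment, where
under_heavy_load() is a predicate I made up for illustration:

    /*
     * Hypothetical tuning from process context: grow the reserve under
     * sustained read pressure, shrink it when idle. This only changes
     * the preallocated minimum; it does not change the forward
     * progress analysis above.
     */
    if (under_heavy_load(pblk))
            mempool_resize(pblk->page_pool, 4 * PBLK_MAX_REQ_PAGES);
    else
            mempool_resize(pblk->page_pool, PBLK_MAX_REQ_PAGES);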