Re: [PATCH 3/5] lightnvm: Flexible DMA pool entry size

> On 9 Oct 2018, at 21.56, Matias Bjørling <mb@xxxxxxxxxxx> wrote:
> 
> On 10/09/2018 02:10 PM, Hans Holmberg wrote:
>> On Tue, Oct 9, 2018 at 12:03 PM Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:
>>> On 09.10.2018 11:16, Hans Holmberg wrote:
>>>> On Fri, Oct 5, 2018 at 3:38 PM Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:
>>>>> Currently the whole lightnvm and pblk stack uses a single DMA pool,
>>>>> for which the entry size is always equal to PAGE_SIZE. The PPA list
>>>>> always needs 8b * 64, so only 56b * 64 is left for the OOB meta.
>>>>> Since NVMe OOB meta can be bigger, such as 128b, this solution is
>>>>> not robust.
>>>>> 
>>>>> This patch adds the possibility to support OOB meta above 56b by
>>>>> creating a separate DMA pool for pblk with an entry size that is
>>>>> big enough to store both the PPA list and such OOB metadata.
>>>>> 
>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@xxxxxxxxx>
>>>>> ---
>>>>>   drivers/lightnvm/core.c          | 33 +++++++++++++++++++++++---------
>>>>>   drivers/lightnvm/pblk-core.c     | 19 +++++++++---------
>>>>>   drivers/lightnvm/pblk-init.c     | 11 +++++++++++
>>>>>   drivers/lightnvm/pblk-read.c     |  3 ++-
>>>>>   drivers/lightnvm/pblk-recovery.c |  9 +++++----
>>>>>   drivers/lightnvm/pblk.h          | 11 ++++++++++-
>>>>>   drivers/nvme/host/lightnvm.c     |  6 ++++--
>>>>>   include/linux/lightnvm.h         |  8 +++++---
>>>>>   8 files changed, 71 insertions(+), 29 deletions(-)
>>>>> 
>>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>>>>> index efb976a863d2..48db7a096257 100644
>>>>> --- a/drivers/lightnvm/core.c
>>>>> +++ b/drivers/lightnvm/core.c
>>>>> @@ -641,20 +641,33 @@ void nvm_unregister_tgt_type(struct nvm_tgt_type *tt)
>>>>>   }
>>>>>   EXPORT_SYMBOL(nvm_unregister_tgt_type);
>>>>> 
>>>>> -void *nvm_dev_dma_alloc(struct nvm_dev *dev, gfp_t mem_flags,
>>>>> -                                                       dma_addr_t *dma_handler)
>>>>> +void *nvm_dev_dma_alloc(struct nvm_dev *dev, void *pool,
>>>>> +                               gfp_t mem_flags, dma_addr_t *dma_handler)
>>>>>   {
>>>>> -       return dev->ops->dev_dma_alloc(dev, dev->dma_pool, mem_flags,
>>>>> -                                                               dma_handler);
>>>>> +       return dev->ops->dev_dma_alloc(dev, pool ?: dev->dma_pool,
>>>>> +                                               mem_flags, dma_handler);
>>>>>   }
>>>>>   EXPORT_SYMBOL(nvm_dev_dma_alloc);
>>>>> 
>>>>> -void nvm_dev_dma_free(struct nvm_dev *dev, void *addr, dma_addr_t dma_handler)
>>>>> +void nvm_dev_dma_free(struct nvm_dev *dev, void *pool,
>>>>> +                               void *addr, dma_addr_t dma_handler)
>>>>>   {
>>>>> -       dev->ops->dev_dma_free(dev->dma_pool, addr, dma_handler);
>>>>> +       dev->ops->dev_dma_free(pool ?: dev->dma_pool, addr, dma_handler);
>>>>>   }
>>>>>   EXPORT_SYMBOL(nvm_dev_dma_free);
>>>>> 
>>>>> +void *nvm_dev_dma_create(struct nvm_dev *dev, int size, char *name)
>>>>> +{
>>>>> +       return dev->ops->create_dma_pool(dev, name, size);
>>>>> +}
>>>>> +EXPORT_SYMBOL(nvm_dev_dma_create);
>>>>> +
>>>>> +void nvm_dev_dma_destroy(struct nvm_dev *dev, void *pool)
>>>>> +{
>>>>> +       dev->ops->destroy_dma_pool(pool);
>>>>> +}
>>>>> +EXPORT_SYMBOL(nvm_dev_dma_destroy);
>>>>> +
>>>>>   static struct nvm_dev *nvm_find_nvm_dev(const char *name)
>>>>>   {
>>>>>          struct nvm_dev *dev;
>>>>> @@ -682,7 +695,8 @@ static int nvm_set_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd,
>>>>>          }
>>>>> 
>>>>>          rqd->nr_ppas = nr_ppas;
>>>>> -       rqd->ppa_list = nvm_dev_dma_alloc(dev, GFP_KERNEL, &rqd->dma_ppa_list);
>>>>> +       rqd->ppa_list = nvm_dev_dma_alloc(dev, NULL, GFP_KERNEL,
>>>>> +                                               &rqd->dma_ppa_list);
>>>>>          if (!rqd->ppa_list) {
>>>>>                  pr_err("nvm: failed to allocate dma memory\n");
>>>>>                  return -ENOMEM;
>>>>> @@ -708,7 +722,8 @@ static void nvm_free_rqd_ppalist(struct nvm_tgt_dev *tgt_dev,
>>>>>          if (!rqd->ppa_list)
>>>>>                  return;
>>>>> 
>>>>> -       nvm_dev_dma_free(tgt_dev->parent, rqd->ppa_list, rqd->dma_ppa_list);
>>>>> +       nvm_dev_dma_free(tgt_dev->parent, NULL, rqd->ppa_list,
>>>>> +                               rqd->dma_ppa_list);
>>>>>   }
>>>>> 
>>>>>   static int nvm_set_flags(struct nvm_geo *geo, struct nvm_rq *rqd)
>>>>> @@ -1145,7 +1160,7 @@ int nvm_register(struct nvm_dev *dev)
>>>>>          if (!dev->q || !dev->ops)
>>>>>                  return -EINVAL;
>>>>> 
>>>>> -       dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
>>>>> +       dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist", PAGE_SIZE);
>>>>>          if (!dev->dma_pool) {
>>>>>                  pr_err("nvm: could not create dma pool\n");
>>>>>                  return -ENOMEM;
>>>> 
>>>> Why hack the nvm_dev_ interfaces when you are not using the dev pool anyway?
>>>> Wouldn't it be more straightforward to use dma_pool_* instead?
>>> 
>>> In order to call dma_pool_create() I need the NVMe device structure,
>>> which to my understanding is not public, so that is why I decided to
>>> reuse the plumbing already available in the nvm_dev_* interfaces.
>> Hmm, yes, I see now.
>>> If there is some easy way to call dma_pool_create() from the pblk
>>> module that I'm missing - let me know. I can rewrite this part if
>>> there is a better way to do it.
>> Create and destroy need to go through dev->ops, but once you have
>> allocated the pool, there is no need to go through dev->ops for
>> allocate/free as far as I can see.
>> Matias: can't we just replace .dev_dma_alloc / .dev_dma_free with
>> calls to dma_pool_alloc/free?
> 
> Better to improve the initial dma allocation instead. I don't want targets to play around with their own DMA pools.
> 
> Something like the below (only compile tested :)):

Good idea. One comment below, though.

> 
> diff --git i/drivers/lightnvm/core.c w/drivers/lightnvm/core.c
> index efb976a863d2..f87696aa6b57 100644
> --- i/drivers/lightnvm/core.c
> +++ w/drivers/lightnvm/core.c
> @@ -1141,11 +1141,17 @@ EXPORT_SYMBOL(nvm_alloc_dev);
> int nvm_register(struct nvm_dev *dev)
> {
>        int ret;
> +       int meta_sz;
> 
>        if (!dev->q || !dev->ops)
>                return -EINVAL;
> 
> -       dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
> +       /* Allow room for both metadata and ppa list in the dma pool */
> +       meta_sz = (dev->geo.sos * NVM_MAX_VLBA) +
> +                                       (sizeof(uint64_t) * NVM_MAX_VLBA);
> +       meta_sz = PAGE_SIZE * (meta_sz >> PAGE_SHIFT);

Can we just send the required min meta_sz and then let the dma
allocation do the alignment? The driver knows whether it can fit the
allocation without wasting unnecessary space.
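
Something along these lines, as a rough (untested) sketch (the function
shape follows the quoted diff; the dword alignment below is just my
assumption, not part of the patch):

static void *nvme_nvm_create_dma_pool(struct nvm_dev *nvmdev, char *name,
				      int size)
{
	struct nvme_ns *ns = nvmdev->q->queuedata;

	/* Pass the caller's minimum entry size straight through and let
	 * the dma API do the rounding; 8B (dword) alignment is only an
	 * assumption for this sketch. */
	return dma_pool_create(name, ns->ctrl->dev, size, 8, 0);
}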

Also, by sharing the same pool for the ppa_list allocation and the
metadata, we risk having unnecessarily large DMA buffers for the
ppa_list. Better to have 2 pools, each with its own size. In fact, the
only reason we made the original allocation PAGE_SIZE was to fit the
ppa_list and metadata in the same allocation; now that we are relaxing
this, we can have tighter allocations for the ppa_list, as there is a
hard limit of 64 lbas (512B).
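
For reference, a very rough (untested) sketch of the 2-pool setup; the
meta_pool field and the nvm_create_dma_pools() helper are made up here
for illustration only:

#define NVM_PPA_LIST_SZ	(sizeof(u64) * NVM_MAX_VLBA)	/* 64 lbas -> 512B */

static int nvm_create_dma_pools(struct nvm_dev *dev)
{
	int meta_sz = dev->geo.sos * NVM_MAX_VLBA;

	/* tight, fixed-size pool for the ppa_list */
	dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist",
						  NVM_PPA_LIST_SZ);
	if (!dev->dma_pool)
		return -ENOMEM;

	/* second pool sized from the OOB metadata the device reports */
	dev->meta_pool = dev->ops->create_dma_pool(dev, "oobmeta", meta_sz);
	if (!dev->meta_pool) {
		dev->ops->destroy_dma_pool(dev->dma_pool);
		return -ENOMEM;
	}

	return 0;
}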

Javier
