> On 9 Oct 2018, at 21.56, Matias Bjørling <mb@xxxxxxxxxxx> wrote:
>
> On 10/09/2018 02:10 PM, Hans Holmberg wrote:
>> On Tue, Oct 9, 2018 at 12:03 PM Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:
>>> On 09.10.2018 11:16, Hans Holmberg wrote:
>>>> On Fri, Oct 5, 2018 at 3:38 PM Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:
>>>>> Currently the whole of lightnvm and pblk uses a single DMA pool,
>>>>> for which the entry size is always equal to PAGE_SIZE. The PPA
>>>>> list always needs 8B * 64, so there is only 56B * 64 of space
>>>>> left for OOB meta. Since NVMe OOB meta can be bigger, such as
>>>>> 128B, this solution is not robust.
>>>>>
>>>>> This patch adds the possibility of supporting OOB meta above 56B
>>>>> by creating a separate DMA pool for pblk with an entry size big
>>>>> enough to store both the PPA list and such OOB metadata.
>>>>>
>>>>> Signed-off-by: Igor Konopko <igor.j.konopko@xxxxxxxxx>
>>>>> ---
>>>>>  drivers/lightnvm/core.c          | 33 +++++++++++++++++++++++---------
>>>>>  drivers/lightnvm/pblk-core.c     | 19 +++++++++---------
>>>>>  drivers/lightnvm/pblk-init.c     | 11 +++++++++++
>>>>>  drivers/lightnvm/pblk-read.c     |  3 ++-
>>>>>  drivers/lightnvm/pblk-recovery.c |  9 +++++----
>>>>>  drivers/lightnvm/pblk.h          | 11 ++++++++++-
>>>>>  drivers/nvme/host/lightnvm.c     |  6 ++++--
>>>>>  include/linux/lightnvm.h         |  8 +++++---
>>>>>  8 files changed, 71 insertions(+), 29 deletions(-)
>>>>>
>>>>> diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
>>>>> index efb976a863d2..48db7a096257 100644
>>>>> --- a/drivers/lightnvm/core.c
>>>>> +++ b/drivers/lightnvm/core.c
>>>>> @@ -641,20 +641,33 @@ void nvm_unregister_tgt_type(struct nvm_tgt_type *tt)
>>>>>  }
>>>>>  EXPORT_SYMBOL(nvm_unregister_tgt_type);
>>>>>
>>>>> -void *nvm_dev_dma_alloc(struct nvm_dev *dev, gfp_t mem_flags,
>>>>> -                        dma_addr_t *dma_handler)
>>>>> +void *nvm_dev_dma_alloc(struct nvm_dev *dev, void *pool,
>>>>> +                        gfp_t mem_flags, dma_addr_t *dma_handler)
>>>>>  {
>>>>> -        return dev->ops->dev_dma_alloc(dev, dev->dma_pool, mem_flags,
>>>>> -                                       dma_handler);
>>>>> +        return dev->ops->dev_dma_alloc(dev, pool ?: dev->dma_pool,
>>>>> +                                       mem_flags, dma_handler);
>>>>>  }
>>>>>  EXPORT_SYMBOL(nvm_dev_dma_alloc);
>>>>>
>>>>> -void nvm_dev_dma_free(struct nvm_dev *dev, void *addr, dma_addr_t dma_handler)
>>>>> +void nvm_dev_dma_free(struct nvm_dev *dev, void *pool,
>>>>> +                      void *addr, dma_addr_t dma_handler)
>>>>>  {
>>>>> -        dev->ops->dev_dma_free(dev->dma_pool, addr, dma_handler);
>>>>> +        dev->ops->dev_dma_free(pool ?: dev->dma_pool, addr, dma_handler);
>>>>>  }
>>>>>  EXPORT_SYMBOL(nvm_dev_dma_free);
>>>>>
>>>>> +void *nvm_dev_dma_create(struct nvm_dev *dev, int size, char *name)
>>>>> +{
>>>>> +        return dev->ops->create_dma_pool(dev, name, size);
>>>>> +}
>>>>> +EXPORT_SYMBOL(nvm_dev_dma_create);
>>>>> +
>>>>> +void nvm_dev_dma_destroy(struct nvm_dev *dev, void *pool)
>>>>> +{
>>>>> +        dev->ops->destroy_dma_pool(pool);
>>>>> +}
>>>>> +EXPORT_SYMBOL(nvm_dev_dma_destroy);
>>>>> +
>>>>>  static struct nvm_dev *nvm_find_nvm_dev(const char *name)
>>>>>  {
>>>>>          struct nvm_dev *dev;
>>>>> @@ -682,7 +695,8 @@ static int nvm_set_rqd_ppalist(struct nvm_tgt_dev *tgt_dev, struct nvm_rq *rqd,
>>>>>          }
>>>>>
>>>>>          rqd->nr_ppas = nr_ppas;
>>>>> -        rqd->ppa_list = nvm_dev_dma_alloc(dev, GFP_KERNEL, &rqd->dma_ppa_list);
>>>>> +        rqd->ppa_list = nvm_dev_dma_alloc(dev, NULL, GFP_KERNEL,
>>>>> +                                          &rqd->dma_ppa_list);
>>>>>          if (!rqd->ppa_list) {
>>>>>                  pr_err("nvm: failed to allocate dma memory\n");
>>>>>                  return -ENOMEM;
>>>>> @@ -708,7 +722,8 @@ static void nvm_free_rqd_ppalist(struct nvm_tgt_dev *tgt_dev,
>>>>>          if (!rqd->ppa_list)
>>>>>                  return;
>>>>>
>>>>> -        nvm_dev_dma_free(tgt_dev->parent, rqd->ppa_list, rqd->dma_ppa_list);
>>>>> +        nvm_dev_dma_free(tgt_dev->parent, NULL, rqd->ppa_list,
>>>>> +                         rqd->dma_ppa_list);
>>>>>  }
>>>>>
>>>>>  static int nvm_set_flags(struct nvm_geo *geo, struct nvm_rq *rqd)
>>>>> @@ -1145,7 +1160,7 @@ int nvm_register(struct nvm_dev *dev)
>>>>>          if (!dev->q || !dev->ops)
>>>>>                  return -EINVAL;
>>>>>
>>>>> -        dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
>>>>> +        dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist", PAGE_SIZE);
>>>>>          if (!dev->dma_pool) {
>>>>>                  pr_err("nvm: could not create dma pool\n");
>>>>>                  return -ENOMEM;
>>>>
>>>> Why hack the nvm_dev_* interfaces when you are not using the dev pool anyway?
>>>> Wouldn't it be more straightforward to use dma_pool_* instead?
>>>
>>> In order to call dma_pool_create() I need the NVMe device structure, which
>>> in my understanding is not public, so this is why I decided to reuse the
>>> plumbing that was available in the nvm_dev_* interfaces.
>>
>> Hmm, yes, I see now.
>>
>>> If there is some easy way to call dma_pool_create() from the pblk module
>>> and I'm missing it - let me know. I can rewrite this part if there is
>>> some better way to do so.
>>
>> Create and destroy need to go through dev->ops, but once you have
>> allocated the pool, there is no need to go through dev->ops
>> for allocate/free, as far as I can see.
>>
>> Matias: can't we just replace .dev_dma_alloc / .dev_dma_free with calls
>> to dma_pool_alloc/free?
>
> Better to make the initial DMA allocation better. I don't want targets to
> play around with their own DMA pools.
>
> Something like the below (only compile tested :)):

Good idea. One comment below, though.
>
> diff --git i/drivers/lightnvm/core.c w/drivers/lightnvm/core.c
> index efb976a863d2..f87696aa6b57 100644
> --- i/drivers/lightnvm/core.c
> +++ w/drivers/lightnvm/core.c
> @@ -1141,11 +1141,17 @@ EXPORT_SYMBOL(nvm_alloc_dev);
>  int nvm_register(struct nvm_dev *dev)
>  {
>          int ret;
> +        int meta_sz;
>
>          if (!dev->q || !dev->ops)
>                  return -EINVAL;
>
> -        dev->dma_pool = dev->ops->create_dma_pool(dev, "ppalist");
> +        /* Allow room for both metadata and ppa list in the dma pool */
> +        meta_sz = (dev->geo.sos * NVM_MAX_VLBA) +
> +                  (sizeof(uint64_t) * NVM_MAX_VLBA);
> +        meta_sz = PAGE_SIZE * (meta_sz >> PAGE_SHIFT);

Can we just send the required minimum meta_sz and then let the DMA allocation do the alignment? The driver knows whether it can fit the allocation without wasting unnecessary space.

Also, by sharing the ppa_list allocation and the metadata on the same pool, we risk having unnecessarily large DMA buffers for the ppa_list. Better to have two pools, each with its own size. In fact, the only reason we made the original allocation PAGE_SIZE was to fit the ppa_list and the metadata in the same allocation; now that we are relaxing this, we can have tighter allocations for the ppa_list, as there is a hard limit of 64 lbas (512B).

Javier