> On 11 Sep 2018, at 12.22, Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:
>
>
>
> On 11.09.2018 11:14, Javier Gonzalez wrote:
>>> On 10 Sep 2018, at 12.21, Matias Bjørling <mb@xxxxxxxxxxx> wrote:
>>>
>>> On 08/29/2018 12:09 PM, Javier González wrote:
>>>> pblk uses 8 bytes in the metadata region exposed by the device
>>>> through the out-of-band area to store the lba mapped to the given
>>>> physical sector. This is used for recovery purposes. Given that the
>>>> first generation OCSSD devices exposed 16 bytes, pblk used a hard-coded
>>>> structure for this purpose.
>>>> This patch relaxes the 16-byte assumption and uses the metadata size
>>>> reported by the device to lay out metadata appropriately for the vector
>>>> commands. This adds support for arbitrary metadata sizes, as long as
>>>> these are larger than 8 bytes. Note that this patch does not address the
>>>> case in which the device does not expose an out-of-band area; pblk
>>>> creation will fail in that case.
>>>> Signed-off-by: Javier González <javier@xxxxxxxxxxxx>
>>>> ---
>>>> drivers/lightnvm/pblk-core.c     | 56 ++++++++++++++++++++++++++++++----------
>>>> drivers/lightnvm/pblk-init.c     | 14 ++++++++++
>>>> drivers/lightnvm/pblk-map.c      | 19 +++++++++-----
>>>> drivers/lightnvm/pblk-read.c     | 55 +++++++++++++++++++++++++--------------
>>>> drivers/lightnvm/pblk-recovery.c | 34 +++++++++++++++++-------
>>>> drivers/lightnvm/pblk.h          | 18 ++++++++++---
>>>> 6 files changed, 143 insertions(+), 53 deletions(-)
>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c
>>>> index a311cc29afd8..d52e0047ae9d 100644
>>>> --- a/drivers/lightnvm/pblk-core.c
>>>> +++ b/drivers/lightnvm/pblk-core.c
>>>> @@ -250,8 +250,20 @@ int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags,
>>>> 	if (!is_vector)
>>>> 		return 0;
>>>> -	rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size;
>>>> -	rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size;
>>>> +	if (pblk->dma_shared) {
>>>> +		rqd->ppa_list = rqd->meta_list + pblk->dma_meta_size;
>>>> +		rqd->dma_ppa_list = rqd->dma_meta_list + pblk->dma_meta_size;
>>>> +
>>>> +		return 0;
>>>> +	}
>>>> +
>>>> +	rqd->ppa_list = nvm_dev_dma_alloc(dev->parent, mem_flags,
>>>> +					  &rqd->dma_ppa_list);
>>>> +	if (!rqd->ppa_list) {
>>>> +		nvm_dev_dma_free(dev->parent, rqd->meta_list,
>>>> +				 rqd->dma_meta_list);
>>>> +		return -ENOMEM;
>>>> +	}
>>>> 	return 0;
>>>> }
>>>> @@ -262,7 +274,11 @@ void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd)
>>>> 	if (rqd->meta_list)
>>>> 		nvm_dev_dma_free(dev->parent, rqd->meta_list,
>>>> -			rqd->dma_meta_list);
>>>> +				 rqd->dma_meta_list);
>>>> +
>>>> +	if (!pblk->dma_shared && rqd->ppa_list)
>>>> +		nvm_dev_dma_free(dev->parent, rqd->ppa_list,
>>>> +				 rqd->dma_ppa_list);
>>>> }
>>>> /* Caller must guarantee that the request is a valid type */
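
(Not part of the patch: a minimal standalone sketch of the buffer
pairing the two hunks above keep symmetric, with the kernel DMA
allocators swapped for malloc/free and names simplified. When the
buffers are shared, the PPA list is just an offset into the metadata
allocation; when they are split, setup and teardown must allocate and
free it separately.)

#include <stdlib.h>

struct rq_bufs {
	void *meta_list;
	void *ppa_list;
	int dma_shared;		/* mirrors pblk->dma_shared */
};

static int rq_bufs_setup(struct rq_bufs *b, size_t meta_size, size_t ppa_size)
{
	b->meta_list = malloc(b->dma_shared ? meta_size + ppa_size : meta_size);
	if (!b->meta_list)
		return -1;

	if (b->dma_shared) {
		/* PPA list lives right behind the metadata region */
		b->ppa_list = (char *)b->meta_list + meta_size;
		return 0;
	}

	/* split case: separate allocation, freed on the error path too */
	b->ppa_list = malloc(ppa_size);
	if (!b->ppa_list) {
		free(b->meta_list);
		return -1;
	}
	return 0;
}

static void rq_bufs_clear(struct rq_bufs *b)
{
	free(b->meta_list);
	if (!b->dma_shared)
		free(b->ppa_list);	/* only exists when it was split */
}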
>>>> @@ -796,10 +812,12 @@ static int pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line,
>>>> 	rqd.is_seq = 1;
>>>> 	for (i = 0; i < lm->smeta_sec; i++, paddr++) {
>>>> -		struct pblk_sec_meta *meta_list = rqd.meta_list;
>>>> +		struct pblk_sec_meta *meta;
>>>> 		rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>>>> -		meta_list[i].lba = lba_list[paddr] = addr_empty;
>>>> +
>>>> +		meta = sec_meta_index(pblk, rqd.meta_list, i);
>>>> +		meta->lba = lba_list[paddr] = addr_empty;
>>>> 	}
>>>> 	ret = pblk_submit_io_sync_sem(pblk, &rqd);
>>>> @@ -845,8 +863,17 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>> 	if (!meta_list)
>>>> 		return -ENOMEM;
>>>> -	ppa_list = meta_list + pblk_dma_meta_size;
>>>> -	dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>>>> +	if (pblk->dma_shared) {
>>>> +		ppa_list = meta_list + pblk->dma_meta_size;
>>>> +		dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>>>> +	} else {
>>>> +		ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>>>> +					     &dma_ppa_list);
>>>> +		if (!ppa_list) {
>>>> +			ret = -ENOMEM;
>>>> +			goto free_meta_list;
>>>> +		}
>>>> +	}
>>>> next_rq:
>>>> 	memset(&rqd, 0, sizeof(struct nvm_rq));
>>>> @@ -858,7 +885,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>> 					l_mg->emeta_alloc_type, GFP_KERNEL);
>>>> 	if (IS_ERR(bio)) {
>>>> 		ret = PTR_ERR(bio);
>>>> -		goto free_rqd_dma;
>>>> +		goto free_ppa_list;
>>>> 	}
>>>> 	bio->bi_iter.bi_sector = 0; /* internal bio */
>>>> @@ -884,7 +911,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>> 		if (pblk_boundary_paddr_checks(pblk, paddr)) {
>>>> 			bio_put(bio);
>>>> 			ret = -EINTR;
>>>> -			goto free_rqd_dma;
>>>> +			goto free_ppa_list;
>>>> 		}
>>>> 		ppa = addr_to_gen_ppa(pblk, paddr, line_id);
>>>> @@ -894,7 +921,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>> 		if (pblk_boundary_paddr_checks(pblk, paddr + min)) {
>>>> 			bio_put(bio);
>>>> 			ret = -EINTR;
>>>> -			goto free_rqd_dma;
>>>> +			goto free_ppa_list;
>>>> 		}
>>>> 		for (j = 0; j < min; j++, i++, paddr++)
>>>> @@ -905,7 +932,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>> 	if (ret) {
>>>> 		pblk_err(pblk, "emeta I/O submission failed: %d\n", ret);
>>>> 		bio_put(bio);
>>>> -		goto free_rqd_dma;
>>>> +		goto free_ppa_list;
>>>> 	}
>>>> 	atomic_dec(&pblk->inflight_io);
>>>> @@ -918,8 +945,11 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line,
>>>> 	if (left_ppas)
>>>> 		goto next_rq;
>>>> -free_rqd_dma:
>>>> -	nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list);
>>>> +free_ppa_list:
>>>> +	if (!pblk->dma_shared)
>>>> +		nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>>>> +free_meta_list:
>>>> +	nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>>> 	return ret;
>>>> }
>>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>>> index a99854439224..57972156c318 100644
>>>> --- a/drivers/lightnvm/pblk-init.c
>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>> @@ -354,6 +354,20 @@ static int pblk_core_init(struct pblk *pblk)
>>>> 	struct nvm_geo *geo = &dev->geo;
>>>> 	int ret, max_write_ppas;
>>>> +	if (sizeof(struct pblk_sec_meta) > geo->sos) {
>>>> +		pblk_err(pblk, "OOB area too small. Min %lu bytes (%d)\n",
>>>> +			 (unsigned long)sizeof(struct pblk_sec_meta), geo->sos);
>>>> +		return -EINTR;
>>>> +	}
>>>> +
>>>> +	pblk->dma_ppa_size = (sizeof(u64) * NVM_MAX_VLBA);
>>>> +	pblk->dma_meta_size = geo->sos * NVM_MAX_VLBA;
>>>> +
>>>> +	if (pblk->dma_ppa_size + pblk->dma_meta_size > PAGE_SIZE)
>>>> +		pblk->dma_shared = false;
>>>> +	else
>>>> +		pblk->dma_shared = true;
>>>> +
>>>> 	atomic64_set(&pblk->user_wa, 0);
>>>> 	atomic64_set(&pblk->pad_wa, 0);
>>>> 	atomic64_set(&pblk->gc_wa, 0);
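
(For intuition, again not part of the patch: the check above reduces to
"do NVM_MAX_VLBA out-of-band entries plus the 512-byte PPA list still
fit in one page?". A standalone sketch, assuming NVM_MAX_VLBA = 64 and
PAGE_SIZE = 4096; note that sos = 56 is the largest value that still
shares, which is where the 56-byte figure discussed further down comes
from.)

#include <stdbool.h>
#include <stdio.h>

#define NVM_MAX_VLBA	64
#define PAGE_SIZE	4096

static bool dma_shared(int sos)		/* sos = OOB bytes per sector */
{
	int dma_ppa_size = sizeof(unsigned long long) * NVM_MAX_VLBA; /* 512 */
	int dma_meta_size = sos * NVM_MAX_VLBA;

	return dma_ppa_size + dma_meta_size <= PAGE_SIZE;
}

int main(void)
{
	printf("sos=16: %d\n", dma_shared(16));	/* 1024 + 512 fits: shared */
	printf("sos=56: %d\n", dma_shared(56));	/* 3584 + 512 = 4096: shared */
	printf("sos=64: %d\n", dma_shared(64));	/* 4096 + 512 > 4096: split */
	return 0;
}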
>>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>>>> index dc0efb852475..55fca16d18e4 100644
>>>> --- a/drivers/lightnvm/pblk-map.c
>>>> +++ b/drivers/lightnvm/pblk-map.c
>>>> @@ -25,6 +25,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>> 			      unsigned int valid_secs)
>>>> {
>>>> 	struct pblk_line *line = pblk_line_get_data(pblk);
>>>> +	struct pblk_sec_meta *meta;
>>>> 	struct pblk_emeta *emeta;
>>>> 	struct pblk_w_ctx *w_ctx;
>>>> 	__le64 *lba_list;
>>>> @@ -56,6 +57,8 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>> 		/* ppa to be sent to the device */
>>>> 		ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id);
>>>> +		meta = sec_meta_index(pblk, meta_list, i);
>>>> +
>>>> 		/* Write context for target bio completion on write buffer. Note
>>>> 		 * that the write buffer is protected by the sync backpointer,
>>>> 		 * and a single writer thread have access to each specific entry
>>>> @@ -67,14 +70,14 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>>> 			kref_get(&line->ref);
>>>> 			w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i);
>>>> 			w_ctx->ppa = ppa_list[i];
>>>> -			meta_list[i].lba = cpu_to_le64(w_ctx->lba);
>>>> +			meta->lba = cpu_to_le64(w_ctx->lba);
>>>> 			lba_list[paddr] = cpu_to_le64(w_ctx->lba);
>>>> 			if (lba_list[paddr] != addr_empty)
>>>> 				line->nr_valid_lbas++;
>>>> 			else
>>>> 				atomic64_inc(&pblk->pad_wa);
>>>> 		} else {
>>>> -			lba_list[paddr] = meta_list[i].lba = addr_empty;
>>>> +			lba_list[paddr] = meta->lba = addr_empty;
>>>> 			__pblk_map_invalidate(pblk, line, paddr);
>>>> 		}
>>>> 	}
>>>> @@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>>>> 		 unsigned long *lun_bitmap, unsigned int valid_secs,
>>>> 		 unsigned int off)
>>>> {
>>>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>> +	struct pblk_sec_meta *meta_list;
>>>> 	struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
>>>> 	unsigned int map_secs;
>>>> 	int min = pblk->min_write_pgs;
>>>> @@ -95,8 +98,10 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry,
>>>> 	for (i = off; i < rqd->nr_ppas; i += min) {
>>>> 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>>>> +		meta_list = sec_meta_index(pblk, rqd->meta_list, i);
>>>> +
>>>> 		if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
>>>> -					lun_bitmap, &meta_list[i], map_secs)) {
>>>> +					lun_bitmap, meta_list, map_secs)) {
>>>> 			bio_put(rqd->bio);
>>>> 			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>>>> 			pblk_pipeline_stop(pblk);
>>>> @@ -112,8 +117,8 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> 	struct nvm_tgt_dev *dev = pblk->dev;
>>>> 	struct nvm_geo *geo = &dev->geo;
>>>> 	struct pblk_line_meta *lm = &pblk->lm;
>>>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>> 	struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd);
>>>> +	struct pblk_sec_meta *meta_list;
>>>> 	struct pblk_line *e_line, *d_line;
>>>> 	unsigned int map_secs;
>>>> 	int min = pblk->min_write_pgs;
>>>> @@ -121,8 +126,10 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> 	for (i = 0; i < rqd->nr_ppas; i += min) {
>>>> 		map_secs = (i + min > valid_secs) ? (valid_secs % min) : min;
>>>> +		meta_list = sec_meta_index(pblk, rqd->meta_list, i);
>>>> +
>>>> 		if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i],
>>>> -					lun_bitmap, &meta_list[i], map_secs)) {
>>>> +					lun_bitmap, meta_list, map_secs)) {
>>>> 			bio_put(rqd->bio);
>>>> 			pblk_free_rqd(pblk, rqd, PBLK_WRITE);
>>>> 			pblk_pipeline_stop(pblk);
>>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
>>>> index 57d3155ef9a5..12b690e2abd9 100644
>>>> --- a/drivers/lightnvm/pblk-read.c
>>>> +++ b/drivers/lightnvm/pblk-read.c
>>>> @@ -42,7 +42,6 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> 				 struct bio *bio, sector_t blba,
>>>> 				 unsigned long *read_bitmap)
>>>> {
>>>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>> 	struct ppa_addr ppas[NVM_MAX_VLBA];
>>>> 	int nr_secs = rqd->nr_ppas;
>>>> 	bool advanced_bio = false;
>>>> @@ -51,13 +50,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> 	pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs);
>>>> 	for (i = 0; i < nr_secs; i++) {
>>>> +		struct pblk_sec_meta *meta;
>>>> 		struct ppa_addr p = ppas[i];
>>>> 		sector_t lba = blba + i;
>>>> +		meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>> retry:
>>>> 		if (pblk_ppa_empty(p)) {
>>>> 			WARN_ON(test_and_set_bit(i, read_bitmap));
>>>> -			meta_list[i].lba = cpu_to_le64(ADDR_EMPTY);
>>>> +
>>>> +			meta->lba = cpu_to_le64(ADDR_EMPTY);
>>>> 			if (unlikely(!advanced_bio)) {
>>>> 				bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE);
>>>> @@ -77,7 +79,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> 				goto retry;
>>>> 			}
>>>> 			WARN_ON(test_and_set_bit(i, read_bitmap));
>>>> -			meta_list[i].lba = cpu_to_le64(lba);
>>>> +			meta->lba = cpu_to_le64(lba);
>>>> 			advanced_bio = true;
>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>> 			atomic_long_inc(&pblk->cache_reads);
>>>> @@ -104,12 +106,15 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> 				sector_t blba)
>>>> {
>>>> -	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>>> 	int nr_lbas = rqd->nr_ppas;
>>>> 	int i;
>>>> 	for (i = 0; i < nr_lbas; i++) {
>>>> -		u64 lba = le64_to_cpu(meta_lba_list[i].lba);
>>>> +		struct pblk_sec_meta *meta;
>>>> +		u64 lba;
>>>> +
>>>> +		meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>> +		lba = le64_to_cpu(meta->lba);
>>>> 		if (lba == ADDR_EMPTY)
>>>> 			continue;
>>>> @@ -133,17 +138,18 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd,
>>>> static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd,
>>>> 				 u64 *lba_list, int nr_lbas)
>>>> {
>>>> -	struct pblk_sec_meta *meta_lba_list = rqd->meta_list;
>>>> 	int i, j;
>>>> 	for (i = 0, j = 0; i < nr_lbas; i++) {
>>>> +		struct pblk_sec_meta *meta;
>>>> 		u64 lba = lba_list[i];
>>>> 		u64 meta_lba;
>>>> 		if (lba == ADDR_EMPTY)
>>>> 			continue;
>>>> -		meta_lba = le64_to_cpu(meta_lba_list[j].lba);
>>>> +		meta = sec_meta_index(pblk, rqd->meta_list, j);
>>>> +		meta_lba = le64_to_cpu(meta->lba);
>>>> 		if (lba != meta_lba) {
>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>> @@ -218,7 +224,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>> 	struct bio *new_bio = rqd->bio;
>>>> 	struct bio *bio = pr_ctx->orig_bio;
>>>> 	struct bio_vec src_bv, dst_bv;
>>>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>> +	struct pblk_sec_meta *meta;
>>>> 	int bio_init_idx = pr_ctx->bio_init_idx;
>>>> 	unsigned long *read_bitmap = pr_ctx->bitmap;
>>>> 	int nr_secs = pr_ctx->orig_nr_secs;
>>>> @@ -237,12 +243,13 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>> 	}
>>>> 	/* Re-use allocated memory for intermediate lbas */
>>>> -	lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>>>> -	lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk_dma_ppa_size);
>>>> +	lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>>>> +	lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk->dma_ppa_size);
>>>> 	for (i = 0; i < nr_secs; i++) {
>>>> -		lba_list_media[i] = meta_list[i].lba;
>>>> -		meta_list[i].lba = lba_list_mem[i];
>>>> +		meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>> +		lba_list_media[i] = meta->lba;
>>>> +		meta->lba = lba_list_mem[i];
>>>> 	}
>>>> 	/* Fill the holes in the original bio */
>>>> @@ -254,7 +261,8 @@ static void pblk_end_partial_read(struct nvm_rq *rqd)
>>>> 		line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]);
>>>> 		kref_put(&line->ref, pblk_line_put);
>>>> -		meta_list[hole].lba = lba_list_media[i];
>>>> +		meta = sec_meta_index(pblk, rqd->meta_list, hole);
>>>> +		meta->lba = lba_list_media[i];
>>>> 		src_bv = new_bio->bi_io_vec[i++];
>>>> 		dst_bv = bio->bi_io_vec[bio_init_idx + hole];
>>>> @@ -290,8 +298,8 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>> 				    unsigned long *read_bitmap,
>>>> 				    int nr_holes)
>>>> {
>>>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>> 	struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
>>>> +	struct pblk_sec_meta *meta;
>>>> 	struct pblk_pr_ctx *pr_ctx;
>>>> 	struct bio *new_bio, *bio = r_ctx->private;
>>>> 	__le64 *lba_list_mem;
>>>> @@ -299,7 +307,7 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>> 	int i;
>>>> 	/* Re-use allocated memory for intermediate lbas */
>>>> -	lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size);
>>>> +	lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size);
>>>> 	new_bio = bio_alloc(GFP_KERNEL, nr_holes);
>>>> @@ -315,8 +323,10 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd,
>>>> 	if (!pr_ctx)
>>>> 		goto fail_free_pages;
>>>> -	for (i = 0; i < nr_secs; i++)
>>>> -		lba_list_mem[i] = meta_list[i].lba;
>>>> +	for (i = 0; i < nr_secs; i++) {
>>>> +		meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>> +		lba_list_mem[i] = meta->lba;
>>>> +	}
>>>> 	new_bio->bi_iter.bi_sector = 0; /* internal bio */
>>>> 	bio_set_op_attrs(new_bio, REQ_OP_READ, 0);
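
(A sketch of the scratch layout the two "Re-use allocated memory" sites
above rely on, not new code: both lba arrays are carved out of the
slack behind the PPA list, so filling the holes needs no extra
allocation. In the shared case everything sits in the single request
allocation; in the split case the two scratch arrays follow the
separate PPA-list allocation. dma_ppa_size is 512 bytes with the values
set up in pblk_core_init().)

#include <stdint.h>

/*
 *   ppa_list      lba_list_mem     lba_list_media
 *   |<-- 512B -->|<----- 512B --->|<----- 512B ---->|
 */
static uint64_t *lba_list_mem(void *ppa_list, int dma_ppa_size)
{
	return (uint64_t *)((char *)ppa_list + dma_ppa_size);
}

static uint64_t *lba_list_media(void *ppa_list, int dma_ppa_size)
{
	return (uint64_t *)((char *)ppa_list + 2 * dma_ppa_size);
}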
>>>> @@ -382,7 +392,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd,
>>>> static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>> 			 sector_t lba, unsigned long *read_bitmap)
>>>> {
>>>> -	struct pblk_sec_meta *meta_list = rqd->meta_list;
>>>> +	struct pblk_sec_meta *meta;
>>>> 	struct ppa_addr ppa;
>>>> 	pblk_lookup_l2p_seq(pblk, &ppa, lba, 1);
>>>> @@ -394,7 +404,10 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>> retry:
>>>> 	if (pblk_ppa_empty(ppa)) {
>>>> 		WARN_ON(test_and_set_bit(0, read_bitmap));
>>>> -		meta_list[0].lba = cpu_to_le64(ADDR_EMPTY);
>>>> +
>>>> +		meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>>> +		meta->lba = cpu_to_le64(ADDR_EMPTY);
>>>> +
>>>> 		return;
>>>> 	}
>>>> @@ -408,7 +421,9 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio,
>>>> 	}
>>>> 	WARN_ON(test_and_set_bit(0, read_bitmap));
>>>> -	meta_list[0].lba = cpu_to_le64(lba);
>>>> +
>>>> +	meta = sec_meta_index(pblk, rqd->meta_list, 0);
>>>> +	meta->lba = cpu_to_le64(lba);
>>>> #ifdef CONFIG_NVM_PBLK_DEBUG
>>>> 	atomic_long_inc(&pblk->cache_reads);
>>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>>>> index 8114013c37b8..1ce92562603d 100644
>>>> --- a/drivers/lightnvm/pblk-recovery.c
>>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>>> @@ -157,7 +157,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>> {
>>>> 	struct nvm_tgt_dev *dev = pblk->dev;
>>>> 	struct nvm_geo *geo = &dev->geo;
>>>> -	struct pblk_sec_meta *meta_list;
>>>> +	struct pblk_sec_meta *meta;
>>>> 	struct pblk_pad_rq *pad_rq;
>>>> 	struct nvm_rq *rqd;
>>>> 	struct bio *bio;
>>>> @@ -218,8 +218,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>> 	rqd->end_io = pblk_end_io_recov;
>>>> 	rqd->private = pad_rq;
>>>> -	meta_list = rqd->meta_list;
>>>> -
>>>> 	for (i = 0; i < rqd->nr_ppas; ) {
>>>> 		struct ppa_addr ppa;
>>>> 		int pos;
>>>> @@ -241,8 +239,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line,
>>>> 			dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id);
>>>> 			pblk_map_invalidate(pblk, dev_ppa);
>>>> -			lba_list[w_ptr] = meta_list[i].lba = addr_empty;
>>>> 			rqd->ppa_list[i] = dev_ppa;
>>>> +
>>>> +			meta = sec_meta_index(pblk, rqd->meta_list, i);
>>>> +			lba_list[w_ptr] = meta->lba = addr_empty;
>>>> 		}
>>>> 	}
>>>> @@ -327,7 +327,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>> 	struct nvm_tgt_dev *dev = pblk->dev;
>>>> 	struct nvm_geo *geo = &dev->geo;
>>>> 	struct ppa_addr *ppa_list;
>>>> -	struct pblk_sec_meta *meta_list;
>>>> +	struct pblk_sec_meta *meta_list, *meta;
>>>> 	struct nvm_rq *rqd;
>>>> 	struct bio *bio;
>>>> 	void *data;
>>>> @@ -425,7 +425,10 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>> 	}
>>>> 	for (i = 0; i < rqd->nr_ppas; i++) {
>>>> -		u64 lba = le64_to_cpu(meta_list[i].lba);
>>>> +		u64 lba;
>>>> +
>>>> +		meta = sec_meta_index(pblk, meta_list, i);
>>>> +		lba = le64_to_cpu(meta->lba);
>>>> 		lba_list[paddr++] = cpu_to_le64(lba);
>>>> @@ -464,13 +467,22 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>>> 	if (!meta_list)
>>>> 		return -ENOMEM;
>>>> -	ppa_list = (void *)(meta_list) + pblk_dma_meta_size;
>>>> -	dma_ppa_list = dma_meta_list + pblk_dma_meta_size;
>>>> +	if (pblk->dma_shared) {
>>>> +		ppa_list = (void *)(meta_list) + pblk->dma_meta_size;
>>>> +		dma_ppa_list = dma_meta_list + pblk->dma_meta_size;
>>>> +	} else {
>>>> +		ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL,
>>>> +					     &dma_ppa_list);
>>>> +		if (!ppa_list) {
>>>> +			ret = -ENOMEM;
>>>> +			goto free_meta_list;
>>>> +		}
>>>> +	}
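
(A standalone restatement of the sec_meta_index() helper added above,
not new kernel code: plain array indexing would advance by
sizeof(struct pblk_sec_meta), i.e. 8 bytes, while the OOB entries are
really geo->sos bytes apart, so the helper strides by the
device-reported size and pblk only ever touches the first 8 bytes (the
lba) of each entry.)

#include <stdint.h>

struct pblk_sec_meta {
	uint64_t lba;		/* little-endian on the device */
};

/* entry i starts at byte i * sos, not i * sizeof(struct pblk_sec_meta) */
static inline struct pblk_sec_meta *sec_meta_at(void *meta, int sos, int i)
{
	return (struct pblk_sec_meta *)((char *)meta + (size_t)i * sos);
}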
>>>> 	data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL);
>>>> 	if (!data) {
>>>> 		ret = -ENOMEM;
>>>> -		goto free_meta_list;
>>>> +		goto free_ppa_list;
>>>> 	}
>>>> 	rqd = mempool_alloc(&pblk->r_rq_pool, GFP_KERNEL);
>>>> @@ -495,9 +507,11 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line)
>>>> out:
>>>> 	mempool_free(rqd, &pblk->r_rq_pool);
>>>> 	kfree(data);
>>>> +free_ppa_list:
>>>> +	if (!pblk->dma_shared)
>>>> +		nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list);
>>>> free_meta_list:
>>>> 	nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list);
>>>> -
>>>> 	return ret;
>>>> }
>>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
>>>> index 22cc9bfbbb10..4526fee206d9 100644
>>>> --- a/drivers/lightnvm/pblk.h
>>>> +++ b/drivers/lightnvm/pblk.h
>>>> @@ -86,7 +86,6 @@ enum {
>>>> };
>>>> struct pblk_sec_meta {
>>>> -	u64 reserved;
>>>> 	__le64 lba;
>>>> };
>>>> @@ -103,9 +102,6 @@ enum {
>>>> 	PBLK_RL_LOW = 4
>>>> };
>>>> -#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * NVM_MAX_VLBA)
>>>> -#define pblk_dma_ppa_size (sizeof(u64) * NVM_MAX_VLBA)
>>>> -
>>>> /* write buffer completion context */
>>>> struct pblk_c_ctx {
>>>> 	struct list_head list;	/* Head for out-of-order completion */
>>>> @@ -637,6 +633,10 @@ struct pblk {
>>>> 	int sec_per_write;
>>>> +	int dma_meta_size;
>>>> +	int dma_ppa_size;
>>>> +	bool dma_shared;
>>>> +
>>>> 	unsigned char instance_uuid[16];
>>>> 	/* Persistent write amplification counters, 4kb sector I/Os */
>>>> @@ -985,6 +985,16 @@ static inline void *emeta_to_vsc(struct pblk *pblk, struct line_emeta *emeta)
>>>> 	return (emeta_to_lbas(pblk, emeta) + pblk->lm.emeta_len[2]);
>>>> }
>>>> +static inline struct pblk_sec_meta *sec_meta_index(struct pblk *pblk,
>>>> +						   struct pblk_sec_meta *meta,
>>>> +						   int index)
>>>> +{
>>>> +	struct nvm_tgt_dev *dev = pblk->dev;
>>>> +	struct nvm_geo *geo = &dev->geo;
>>>> +
>>>> +	return ((void *)meta + index * geo->sos);
>>>> +}
>>>> +
>>>> static inline int pblk_line_vsc(struct pblk_line *line)
>>>> {
>>>> 	return le32_to_cpu(*line->vsc);
>>>
>>> It will be helpful to split this patch into two:
>>>
>>> - One that does the 8b conversion
>>> - One that makes the change to merge metadata and ppa list data buffers
>>
>> pblk has always shared the dma buffer for the ppa list and the metadata
>> buffer. This patch adds the possibility to not merge them if the
>> metadata size does not fit. I can split it into two patches, but it
>> seems to me like a natural extension when relaxing the 16-byte metadata
>> size.
>>
>>> - How about making it a simplification to only allow up to 32b
>>> metadata buffers, then one doesn't have to think about it crossing
>>> multiple pages.
>>
>> You mean max. 56 bytes of metadata per 4KB right? That is what is left
>> on a 4KB page after taking out the 512B needed by the ppa list. It's ok
>> by me, but I'd like to hear Igor's opinion, as this was Intel's use case
>> to start with.
>
> So I think that if we want to do this properly, we should support
> 'all' the metadata sizes, including >64K - which is not the case after
> Javier's changes. In the past there were NVMe drives available with 128
> bytes of metadata per 4K sector, so it could be similar in the future
> for OC drives too. I have a patch somewhere which should work for any
> size of metadata, with a slightly different approach. I can send it (I
> just need a moment to rebase) and you can decide which approach you
> would like to take.
>
>>> - You can also make a structure for it, use the first 3.5KB for meta,
>>> and the next 512B for PPAs.
>>
>> Looks like a huge structure for just managing a couple of pointers,
>> don't you think?
>>
>> Javier

Matias,

Can you pick up this patch as-is for this window and then Igor can
extend it to an arbitrary size in the future?

We have a use case for 64B OOB / 4KB, so 56B / 4KB is not enough in this
case. Since pblk currently hard-codes the metadata buffer size, the code
breaks under this configuration. If you have comments that still respect
this requirement, I can apply them and resubmit.

Thanks,
Javier