> On 18 Sep 2018, at 10.31, Matias Bjørling <mb@xxxxxxxxxxx> wrote:
>
> On 09/18/2018 10:09 AM, Javier Gonzalez wrote:
>>> On 11 Sep 2018, at 12.22, Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:
>>>
>>> On 11.09.2018 11:14, Javier Gonzalez wrote:
>>>>> On 10 Sep 2018, at 12.21, Matias Bjørling <mb@xxxxxxxxxxx> wrote:
>>>>>
>>>>> On 08/29/2018 12:09 PM, Javier González wrote:
>>>>>> pblk uses 8 bytes in the metadata region exposed by the device
>>>>>> through the out-of-band area to store the lba mapped to the given
>>>>>> physical sector. This is used for recovery purposes. Given that the
>>>>>> first generation OCSSD devices exposed 16 bytes, pblk used a hard-coded
>>>>>> structure for this purpose.
>>>>>>
>>>>>> This patch relaxes the 16-byte assumption and uses the metadata size
>>>>>> reported by the device to lay out metadata appropriately for the vector
>>>>>> commands. This adds support for arbitrary metadata sizes, as long as
>>>>>> these are larger than 8 bytes. Note that this patch does not address the
>>>>>> case in which the device does not expose an out-of-band area; pblk
>>>>>> creation will fail in this case.
>>>>>> Signed-off-by: Javier González <javier@xxxxxxxxxxxx> >>>>>> --- >>>>>> drivers/lightnvm/pblk-core.c | 56 ++++++++++++++++++++++++++++++---------- >>>>>> drivers/lightnvm/pblk-init.c | 14 ++++++++++ >>>>>> drivers/lightnvm/pblk-map.c | 19 +++++++++----- >>>>>> drivers/lightnvm/pblk-read.c | 55 +++++++++++++++++++++++++-------------- >>>>>> drivers/lightnvm/pblk-recovery.c | 34 +++++++++++++++++------- >>>>>> drivers/lightnvm/pblk.h | 18 ++++++++++--- >>>>>> 6 files changed, 143 insertions(+), 53 deletions(-) >>>>>> diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c >>>>>> index a311cc29afd8..d52e0047ae9d 100644 >>>>>> --- a/drivers/lightnvm/pblk-core.c >>>>>> +++ b/drivers/lightnvm/pblk-core.c >>>>>> @@ -250,8 +250,20 @@ int pblk_setup_rqd(struct pblk *pblk, struct nvm_rq *rqd, gfp_t mem_flags, >>>>>> if (!is_vector) >>>>>> return 0; >>>>>> - rqd->ppa_list = rqd->meta_list + pblk_dma_meta_size; >>>>>> - rqd->dma_ppa_list = rqd->dma_meta_list + pblk_dma_meta_size; >>>>>> + if (pblk->dma_shared) { >>>>>> + rqd->ppa_list = rqd->meta_list + pblk->dma_meta_size; >>>>>> + rqd->dma_ppa_list = rqd->dma_meta_list + pblk->dma_meta_size; >>>>>> + >>>>>> + return 0; >>>>>> + } >>>>>> + >>>>>> + rqd->ppa_list = nvm_dev_dma_alloc(dev->parent, mem_flags, >>>>>> + &rqd->dma_ppa_list); >>>>>> + if (!rqd->ppa_list) { >>>>>> + nvm_dev_dma_free(dev->parent, rqd->meta_list, >>>>>> + rqd->dma_meta_list); >>>>>> + return -ENOMEM; >>>>>> + } >>>>>> return 0; >>>>>> } >>>>>> @@ -262,7 +274,11 @@ void pblk_clear_rqd(struct pblk *pblk, struct nvm_rq *rqd) >>>>>> if (rqd->meta_list) >>>>>> nvm_dev_dma_free(dev->parent, rqd->meta_list, >>>>>> - rqd->dma_meta_list); >>>>>> + rqd->dma_meta_list); >>>>>> + >>>>>> + if (!pblk->dma_shared && rqd->ppa_list) >>>>>> + nvm_dev_dma_free(dev->parent, rqd->ppa_list, >>>>>> + rqd->dma_ppa_list); >>>>>> } >>>>>> /* Caller must guarantee that the request is a valid type */ >>>>>> @@ -796,10 +812,12 @@ static int 
pblk_line_smeta_write(struct pblk *pblk, struct pblk_line *line, >>>>>> rqd.is_seq = 1; >>>>>> for (i = 0; i < lm->smeta_sec; i++, paddr++) { >>>>>> - struct pblk_sec_meta *meta_list = rqd.meta_list; >>>>>> + struct pblk_sec_meta *meta; >>>>>> rqd.ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id); >>>>>> - meta_list[i].lba = lba_list[paddr] = addr_empty; >>>>>> + >>>>>> + meta = sec_meta_index(pblk, rqd.meta_list, i); >>>>>> + meta->lba = lba_list[paddr] = addr_empty; >>>>>> } >>>>>> ret = pblk_submit_io_sync_sem(pblk, &rqd); >>>>>> @@ -845,8 +863,17 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line, >>>>>> if (!meta_list) >>>>>> return -ENOMEM; >>>>>> - ppa_list = meta_list + pblk_dma_meta_size; >>>>>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size; >>>>>> + if (pblk->dma_shared) { >>>>>> + ppa_list = meta_list + pblk->dma_meta_size; >>>>>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size; >>>>>> + } else { >>>>>> + ppa_list = nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, >>>>>> + &dma_ppa_list); >>>>>> + if (!ppa_list) { >>>>>> + ret = -ENOMEM; >>>>>> + goto free_meta_list; >>>>>> + } >>>>>> + } >>>>>> next_rq: >>>>>> memset(&rqd, 0, sizeof(struct nvm_rq)); >>>>>> @@ -858,7 +885,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line, >>>>>> l_mg->emeta_alloc_type, GFP_KERNEL); >>>>>> if (IS_ERR(bio)) { >>>>>> ret = PTR_ERR(bio); >>>>>> - goto free_rqd_dma; >>>>>> + goto free_ppa_list; >>>>>> } >>>>>> bio->bi_iter.bi_sector = 0; /* internal bio */ >>>>>> @@ -884,7 +911,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line, >>>>>> if (pblk_boundary_paddr_checks(pblk, paddr)) { >>>>>> bio_put(bio); >>>>>> ret = -EINTR; >>>>>> - goto free_rqd_dma; >>>>>> + goto free_ppa_list; >>>>>> } >>>>>> ppa = addr_to_gen_ppa(pblk, paddr, line_id); >>>>>> @@ -894,7 +921,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line, >>>>>> if (pblk_boundary_paddr_checks(pblk, paddr + min)) { >>>>>> 
bio_put(bio); >>>>>> ret = -EINTR; >>>>>> - goto free_rqd_dma; >>>>>> + goto free_ppa_list; >>>>>> } >>>>>> for (j = 0; j < min; j++, i++, paddr++) >>>>>> @@ -905,7 +932,7 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line, >>>>>> if (ret) { >>>>>> pblk_err(pblk, "emeta I/O submission failed: %d\n", ret); >>>>>> bio_put(bio); >>>>>> - goto free_rqd_dma; >>>>>> + goto free_ppa_list; >>>>>> } >>>>>> atomic_dec(&pblk->inflight_io); >>>>>> @@ -918,8 +945,11 @@ int pblk_line_emeta_read(struct pblk *pblk, struct pblk_line *line, >>>>>> if (left_ppas) >>>>>> goto next_rq; >>>>>> -free_rqd_dma: >>>>>> - nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list); >>>>>> +free_ppa_list: >>>>>> + if (!pblk->dma_shared) >>>>>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list); >>>>>> +free_meta_list: >>>>>> + nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list); >>>>>> return ret; >>>>>> } >>>>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c >>>>>> index a99854439224..57972156c318 100644 >>>>>> --- a/drivers/lightnvm/pblk-init.c >>>>>> +++ b/drivers/lightnvm/pblk-init.c >>>>>> @@ -354,6 +354,20 @@ static int pblk_core_init(struct pblk *pblk) >>>>>> struct nvm_geo *geo = &dev->geo; >>>>>> int ret, max_write_ppas; >>>>>> + if (sizeof(struct pblk_sec_meta) > geo->sos) { >>>>>> + pblk_err(pblk, "OOB area too small. 
Min %lu bytes (%d)\n", >>>>>> + (unsigned long)sizeof(struct pblk_sec_meta), geo->sos); >>>>>> + return -EINTR; >>>>>> + } >>>>>> + >>>>>> + pblk->dma_ppa_size = (sizeof(u64) * NVM_MAX_VLBA); >>>>>> + pblk->dma_meta_size = geo->sos * NVM_MAX_VLBA; >>>>>> + >>>>>> + if (pblk->dma_ppa_size + pblk->dma_meta_size > PAGE_SIZE) >>>>>> + pblk->dma_shared = false; >>>>>> + else >>>>>> + pblk->dma_shared = true; >>>>>> + >>>>>> atomic64_set(&pblk->user_wa, 0); >>>>>> atomic64_set(&pblk->pad_wa, 0); >>>>>> atomic64_set(&pblk->gc_wa, 0); >>>>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c >>>>>> index dc0efb852475..55fca16d18e4 100644 >>>>>> --- a/drivers/lightnvm/pblk-map.c >>>>>> +++ b/drivers/lightnvm/pblk-map.c >>>>>> @@ -25,6 +25,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, >>>>>> unsigned int valid_secs) >>>>>> { >>>>>> struct pblk_line *line = pblk_line_get_data(pblk); >>>>>> + struct pblk_sec_meta *meta; >>>>>> struct pblk_emeta *emeta; >>>>>> struct pblk_w_ctx *w_ctx; >>>>>> __le64 *lba_list; >>>>>> @@ -56,6 +57,8 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, >>>>>> /* ppa to be sent to the device */ >>>>>> ppa_list[i] = addr_to_gen_ppa(pblk, paddr, line->id); >>>>>> + meta = sec_meta_index(pblk, meta_list, i); >>>>>> + >>>>>> /* Write context for target bio completion on write buffer. 
Note >>>>>> * that the write buffer is protected by the sync backpointer, >>>>>> * and a single writer thread have access to each specific entry >>>>>> @@ -67,14 +70,14 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry, >>>>>> kref_get(&line->ref); >>>>>> w_ctx = pblk_rb_w_ctx(&pblk->rwb, sentry + i); >>>>>> w_ctx->ppa = ppa_list[i]; >>>>>> - meta_list[i].lba = cpu_to_le64(w_ctx->lba); >>>>>> + meta->lba = cpu_to_le64(w_ctx->lba); >>>>>> lba_list[paddr] = cpu_to_le64(w_ctx->lba); >>>>>> if (lba_list[paddr] != addr_empty) >>>>>> line->nr_valid_lbas++; >>>>>> else >>>>>> atomic64_inc(&pblk->pad_wa); >>>>>> } else { >>>>>> - lba_list[paddr] = meta_list[i].lba = addr_empty; >>>>>> + lba_list[paddr] = meta->lba = addr_empty; >>>>>> __pblk_map_invalidate(pblk, line, paddr); >>>>>> } >>>>>> } >>>>>> @@ -87,7 +90,7 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, >>>>>> unsigned long *lun_bitmap, unsigned int valid_secs, >>>>>> unsigned int off) >>>>>> { >>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list; >>>>>> + struct pblk_sec_meta *meta_list; >>>>>> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd); >>>>>> unsigned int map_secs; >>>>>> int min = pblk->min_write_pgs; >>>>>> @@ -95,8 +98,10 @@ void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, >>>>>> for (i = off; i < rqd->nr_ppas; i += min) { >>>>>> map_secs = (i + min > valid_secs) ? 
(valid_secs % min) : min; >>>>>> + meta_list = sec_meta_index(pblk, rqd->meta_list, i); >>>>>> + >>>>>> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i], >>>>>> - lun_bitmap, &meta_list[i], map_secs)) { >>>>>> + lun_bitmap, meta_list, map_secs)) { >>>>>> bio_put(rqd->bio); >>>>>> pblk_free_rqd(pblk, rqd, PBLK_WRITE); >>>>>> pblk_pipeline_stop(pblk); >>>>>> @@ -112,8 +117,8 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd, >>>>>> struct nvm_tgt_dev *dev = pblk->dev; >>>>>> struct nvm_geo *geo = &dev->geo; >>>>>> struct pblk_line_meta *lm = &pblk->lm; >>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list; >>>>>> struct ppa_addr *ppa_list = nvm_rq_to_ppa_list(rqd); >>>>>> + struct pblk_sec_meta *meta_list; >>>>>> struct pblk_line *e_line, *d_line; >>>>>> unsigned int map_secs; >>>>>> int min = pblk->min_write_pgs; >>>>>> @@ -121,8 +126,10 @@ void pblk_map_erase_rq(struct pblk *pblk, struct nvm_rq *rqd, >>>>>> for (i = 0; i < rqd->nr_ppas; i += min) { >>>>>> map_secs = (i + min > valid_secs) ? 
(valid_secs % min) : min; >>>>>> + meta_list = sec_meta_index(pblk, rqd->meta_list, i); >>>>>> + >>>>>> if (pblk_map_page_data(pblk, sentry + i, &ppa_list[i], >>>>>> - lun_bitmap, &meta_list[i], map_secs)) { >>>>>> + lun_bitmap, meta_list, map_secs)) { >>>>>> bio_put(rqd->bio); >>>>>> pblk_free_rqd(pblk, rqd, PBLK_WRITE); >>>>>> pblk_pipeline_stop(pblk); >>>>>> diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c >>>>>> index 57d3155ef9a5..12b690e2abd9 100644 >>>>>> --- a/drivers/lightnvm/pblk-read.c >>>>>> +++ b/drivers/lightnvm/pblk-read.c >>>>>> @@ -42,7 +42,6 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd, >>>>>> struct bio *bio, sector_t blba, >>>>>> unsigned long *read_bitmap) >>>>>> { >>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list; >>>>>> struct ppa_addr ppas[NVM_MAX_VLBA]; >>>>>> int nr_secs = rqd->nr_ppas; >>>>>> bool advanced_bio = false; >>>>>> @@ -51,13 +50,16 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd, >>>>>> pblk_lookup_l2p_seq(pblk, ppas, blba, nr_secs); >>>>>> for (i = 0; i < nr_secs; i++) { >>>>>> + struct pblk_sec_meta *meta; >>>>>> struct ppa_addr p = ppas[i]; >>>>>> sector_t lba = blba + i; >>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i); >>>>>> retry: >>>>>> if (pblk_ppa_empty(p)) { >>>>>> WARN_ON(test_and_set_bit(i, read_bitmap)); >>>>>> - meta_list[i].lba = cpu_to_le64(ADDR_EMPTY); >>>>>> + >>>>>> + meta->lba = cpu_to_le64(ADDR_EMPTY); >>>>>> if (unlikely(!advanced_bio)) { >>>>>> bio_advance(bio, (i) * PBLK_EXPOSED_PAGE_SIZE); >>>>>> @@ -77,7 +79,7 @@ static void pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd, >>>>>> goto retry; >>>>>> } >>>>>> WARN_ON(test_and_set_bit(i, read_bitmap)); >>>>>> - meta_list[i].lba = cpu_to_le64(lba); >>>>>> + meta->lba = cpu_to_le64(lba); >>>>>> advanced_bio = true; >>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG >>>>>> atomic_long_inc(&pblk->cache_reads); >>>>>> @@ -104,12 +106,15 @@ static void 
pblk_read_ppalist_rq(struct pblk *pblk, struct nvm_rq *rqd, >>>>>> static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd, >>>>>> sector_t blba) >>>>>> { >>>>>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list; >>>>>> int nr_lbas = rqd->nr_ppas; >>>>>> int i; >>>>>> for (i = 0; i < nr_lbas; i++) { >>>>>> - u64 lba = le64_to_cpu(meta_lba_list[i].lba); >>>>>> + struct pblk_sec_meta *meta; >>>>>> + u64 lba; >>>>>> + >>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i); >>>>>> + lba = le64_to_cpu(meta->lba); >>>>>> if (lba == ADDR_EMPTY) >>>>>> continue; >>>>>> @@ -133,17 +138,18 @@ static void pblk_read_check_seq(struct pblk *pblk, struct nvm_rq *rqd, >>>>>> static void pblk_read_check_rand(struct pblk *pblk, struct nvm_rq *rqd, >>>>>> u64 *lba_list, int nr_lbas) >>>>>> { >>>>>> - struct pblk_sec_meta *meta_lba_list = rqd->meta_list; >>>>>> int i, j; >>>>>> for (i = 0, j = 0; i < nr_lbas; i++) { >>>>>> + struct pblk_sec_meta *meta; >>>>>> u64 lba = lba_list[i]; >>>>>> u64 meta_lba; >>>>>> if (lba == ADDR_EMPTY) >>>>>> continue; >>>>>> - meta_lba = le64_to_cpu(meta_lba_list[j].lba); >>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, j); >>>>>> + meta_lba = le64_to_cpu(meta->lba); >>>>>> if (lba != meta_lba) { >>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG >>>>>> @@ -218,7 +224,7 @@ static void pblk_end_partial_read(struct nvm_rq *rqd) >>>>>> struct bio *new_bio = rqd->bio; >>>>>> struct bio *bio = pr_ctx->orig_bio; >>>>>> struct bio_vec src_bv, dst_bv; >>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list; >>>>>> + struct pblk_sec_meta *meta; >>>>>> int bio_init_idx = pr_ctx->bio_init_idx; >>>>>> unsigned long *read_bitmap = pr_ctx->bitmap; >>>>>> int nr_secs = pr_ctx->orig_nr_secs; >>>>>> @@ -237,12 +243,13 @@ static void pblk_end_partial_read(struct nvm_rq *rqd) >>>>>> } >>>>>> /* Re-use allocated memory for intermediate lbas */ >>>>>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size); >>>>>> - lba_list_media = (((void *)rqd->ppa_list) 
+ 2 * pblk_dma_ppa_size); >>>>>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size); >>>>>> + lba_list_media = (((void *)rqd->ppa_list) + 2 * pblk->dma_ppa_size); >>>>>> for (i = 0; i < nr_secs; i++) { >>>>>> - lba_list_media[i] = meta_list[i].lba; >>>>>> - meta_list[i].lba = lba_list_mem[i]; >>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i); >>>>>> + lba_list_media[i] = meta->lba; >>>>>> + meta->lba = lba_list_mem[i]; >>>>>> } >>>>>> /* Fill the holes in the original bio */ >>>>>> @@ -254,7 +261,8 @@ static void pblk_end_partial_read(struct nvm_rq *rqd) >>>>>> line = pblk_ppa_to_line(pblk, rqd->ppa_list[i]); >>>>>> kref_put(&line->ref, pblk_line_put); >>>>>> - meta_list[hole].lba = lba_list_media[i]; >>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, hole); >>>>>> + meta->lba = lba_list_media[i]; >>>>>> src_bv = new_bio->bi_io_vec[i++]; >>>>>> dst_bv = bio->bi_io_vec[bio_init_idx + hole]; >>>>>> @@ -290,8 +298,8 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd, >>>>>> unsigned long *read_bitmap, >>>>>> int nr_holes) >>>>>> { >>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list; >>>>>> struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd); >>>>>> + struct pblk_sec_meta *meta; >>>>>> struct pblk_pr_ctx *pr_ctx; >>>>>> struct bio *new_bio, *bio = r_ctx->private; >>>>>> __le64 *lba_list_mem; >>>>>> @@ -299,7 +307,7 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd, >>>>>> int i; >>>>>> /* Re-use allocated memory for intermediate lbas */ >>>>>> - lba_list_mem = (((void *)rqd->ppa_list) + pblk_dma_ppa_size); >>>>>> + lba_list_mem = (((void *)rqd->ppa_list) + pblk->dma_ppa_size); >>>>>> new_bio = bio_alloc(GFP_KERNEL, nr_holes); >>>>>> @@ -315,8 +323,10 @@ static int pblk_setup_partial_read(struct pblk *pblk, struct nvm_rq *rqd, >>>>>> if (!pr_ctx) >>>>>> goto fail_free_pages; >>>>>> - for (i = 0; i < nr_secs; i++) >>>>>> - lba_list_mem[i] = meta_list[i].lba; >>>>>> + for (i = 0; i < nr_secs; i++) { 
>>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i); >>>>>> + lba_list_mem[i] = meta->lba; >>>>>> + } >>>>>> new_bio->bi_iter.bi_sector = 0; /* internal bio */ >>>>>> bio_set_op_attrs(new_bio, REQ_OP_READ, 0); >>>>>> @@ -382,7 +392,7 @@ static int pblk_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd, >>>>>> static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio, >>>>>> sector_t lba, unsigned long *read_bitmap) >>>>>> { >>>>>> - struct pblk_sec_meta *meta_list = rqd->meta_list; >>>>>> + struct pblk_sec_meta *meta; >>>>>> struct ppa_addr ppa; >>>>>> pblk_lookup_l2p_seq(pblk, &ppa, lba, 1); >>>>>> @@ -394,7 +404,10 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio, >>>>>> retry: >>>>>> if (pblk_ppa_empty(ppa)) { >>>>>> WARN_ON(test_and_set_bit(0, read_bitmap)); >>>>>> - meta_list[0].lba = cpu_to_le64(ADDR_EMPTY); >>>>>> + >>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0); >>>>>> + meta->lba = cpu_to_le64(ADDR_EMPTY); >>>>>> + >>>>>> return; >>>>>> } >>>>>> @@ -408,7 +421,9 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq *rqd, struct bio *bio, >>>>>> } >>>>>> WARN_ON(test_and_set_bit(0, read_bitmap)); >>>>>> - meta_list[0].lba = cpu_to_le64(lba); >>>>>> + >>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, 0); >>>>>> + meta->lba = cpu_to_le64(lba); >>>>>> #ifdef CONFIG_NVM_PBLK_DEBUG >>>>>> atomic_long_inc(&pblk->cache_reads); >>>>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c >>>>>> index 8114013c37b8..1ce92562603d 100644 >>>>>> --- a/drivers/lightnvm/pblk-recovery.c >>>>>> +++ b/drivers/lightnvm/pblk-recovery.c >>>>>> @@ -157,7 +157,7 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line, >>>>>> { >>>>>> struct nvm_tgt_dev *dev = pblk->dev; >>>>>> struct nvm_geo *geo = &dev->geo; >>>>>> - struct pblk_sec_meta *meta_list; >>>>>> + struct pblk_sec_meta *meta; >>>>>> struct pblk_pad_rq *pad_rq; >>>>>> struct nvm_rq *rqd; 
>>>>>> struct bio *bio; >>>>>> @@ -218,8 +218,6 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line, >>>>>> rqd->end_io = pblk_end_io_recov; >>>>>> rqd->private = pad_rq; >>>>>> - meta_list = rqd->meta_list; >>>>>> - >>>>>> for (i = 0; i < rqd->nr_ppas; ) { >>>>>> struct ppa_addr ppa; >>>>>> int pos; >>>>>> @@ -241,8 +239,10 @@ static int pblk_recov_pad_line(struct pblk *pblk, struct pblk_line *line, >>>>>> dev_ppa = addr_to_gen_ppa(pblk, w_ptr, line->id); >>>>>> pblk_map_invalidate(pblk, dev_ppa); >>>>>> - lba_list[w_ptr] = meta_list[i].lba = addr_empty; >>>>>> rqd->ppa_list[i] = dev_ppa; >>>>>> + >>>>>> + meta = sec_meta_index(pblk, rqd->meta_list, i); >>>>>> + lba_list[w_ptr] = meta->lba = addr_empty; >>>>>> } >>>>>> } >>>>>> @@ -327,7 +327,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line, >>>>>> struct nvm_tgt_dev *dev = pblk->dev; >>>>>> struct nvm_geo *geo = &dev->geo; >>>>>> struct ppa_addr *ppa_list; >>>>>> - struct pblk_sec_meta *meta_list; >>>>>> + struct pblk_sec_meta *meta_list, *meta; >>>>>> struct nvm_rq *rqd; >>>>>> struct bio *bio; >>>>>> void *data; >>>>>> @@ -425,7 +425,10 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line, >>>>>> } >>>>>> for (i = 0; i < rqd->nr_ppas; i++) { >>>>>> - u64 lba = le64_to_cpu(meta_list[i].lba); >>>>>> + u64 lba; >>>>>> + >>>>>> + meta = sec_meta_index(pblk, meta_list, i); >>>>>> + lba = le64_to_cpu(meta->lba); >>>>>> lba_list[paddr++] = cpu_to_le64(lba); >>>>>> @@ -464,13 +467,22 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line) >>>>>> if (!meta_list) >>>>>> return -ENOMEM; >>>>>> - ppa_list = (void *)(meta_list) + pblk_dma_meta_size; >>>>>> - dma_ppa_list = dma_meta_list + pblk_dma_meta_size; >>>>>> + if (pblk->dma_shared) { >>>>>> + ppa_list = (void *)(meta_list) + pblk->dma_meta_size; >>>>>> + dma_ppa_list = dma_meta_list + pblk->dma_meta_size; >>>>>> + } else { >>>>>> + ppa_list = 
nvm_dev_dma_alloc(dev->parent, GFP_KERNEL, >>>>>> + &dma_ppa_list); >>>>>> + if (!ppa_list) { >>>>>> + ret = -ENOMEM; >>>>>> + goto free_meta_list; >>>>>> + } >>>>>> + } >>>>>> data = kcalloc(pblk->max_write_pgs, geo->csecs, GFP_KERNEL); >>>>>> if (!data) { >>>>>> ret = -ENOMEM; >>>>>> - goto free_meta_list; >>>>>> + goto free_ppa_list; >>>>>> } >>>>>> rqd = mempool_alloc(&pblk->r_rq_pool, GFP_KERNEL); >>>>>> @@ -495,9 +507,11 @@ static int pblk_recov_l2p_from_oob(struct pblk *pblk, struct pblk_line *line) >>>>>> out: >>>>>> mempool_free(rqd, &pblk->r_rq_pool); >>>>>> kfree(data); >>>>>> +free_ppa_list: >>>>>> + if (!pblk->dma_shared) >>>>>> + nvm_dev_dma_free(dev->parent, ppa_list, dma_ppa_list); >>>>>> free_meta_list: >>>>>> nvm_dev_dma_free(dev->parent, meta_list, dma_meta_list); >>>>>> - >>>>>> return ret; >>>>>> } >>>>>> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h >>>>>> index 22cc9bfbbb10..4526fee206d9 100644 >>>>>> --- a/drivers/lightnvm/pblk.h >>>>>> +++ b/drivers/lightnvm/pblk.h >>>>>> @@ -86,7 +86,6 @@ enum { >>>>>> }; >>>>>> struct pblk_sec_meta { >>>>>> - u64 reserved; >>>>>> __le64 lba; >>>>>> }; >>>>>> @@ -103,9 +102,6 @@ enum { >>>>>> PBLK_RL_LOW = 4 >>>>>> }; >>>>>> -#define pblk_dma_meta_size (sizeof(struct pblk_sec_meta) * NVM_MAX_VLBA) >>>>>> -#define pblk_dma_ppa_size (sizeof(u64) * NVM_MAX_VLBA) >>>>>> - >>>>>> /* write buffer completion context */ >>>>>> struct pblk_c_ctx { >>>>>> struct list_head list; /* Head for out-of-order completion */ >>>>>> @@ -637,6 +633,10 @@ struct pblk { >>>>>> int sec_per_write; >>>>>> + int dma_meta_size; >>>>>> + int dma_ppa_size; >>>>>> + bool dma_shared; >>>>>> + >>>>>> unsigned char instance_uuid[16]; >>>>>> /* Persistent write amplification counters, 4kb sector I/Os */ >>>>>> @@ -985,6 +985,16 @@ static inline void *emeta_to_vsc(struct pblk *pblk, struct line_emeta *emeta) >>>>>> return (emeta_to_lbas(pblk, emeta) + pblk->lm.emeta_len[2]); >>>>>> } >>>>>> +static inline struct 
pblk_sec_meta *sec_meta_index(struct pblk *pblk,
>>>>>> +						struct pblk_sec_meta *meta,
>>>>>> +						int index)
>>>>>> +{
>>>>>> +	struct nvm_tgt_dev *dev = pblk->dev;
>>>>>> +	struct nvm_geo *geo = &dev->geo;
>>>>>> +
>>>>>> +	return ((void *)meta + index * geo->sos);
>>>>>> +}
>>>>>> +
>>>>>>  static inline int pblk_line_vsc(struct pblk_line *line)
>>>>>>  {
>>>>>>  	return le32_to_cpu(*line->vsc);
>>>>>
>>>>> It will be helpful to split this patch into two:
>>>>>
>>>>> - One that does the 8b conversion
>>>>> - One that makes the change to merge metadata and ppa list data buffers
>>>>
>>>> pblk has always shared the dma buffer for the ppa list and the metadata
>>>> buffer. This patch adds the possibility to not merge them if the metadata
>>>> size does not fit. I can separate it into two patches, but it seems to me
>>>> like a natural extension when relaxing the 16-byte metadata size.
>>>>
>>>>> - How about making it a simplification to only allow up to 32b
>>>>>   metadata buffers, then one doesn't have to think about it crossing
>>>>>   multiple pages.
>>>>
>>>> You mean max. 56 bytes of metadata per 4KB, right? That is what is left
>>>> on a 4KB page after taking out the 512B needed by the ppa list. It's ok
>>>> by me, but I'd like to hear Igor's opinion, as this was Intel's use case
>>>> to start with.
>>>
>>> So I think that if we want to do this properly, we should support
>>> 'all' the metadata sizes, including >64K - which is not the case after
>>> Javier's changes. In the past there were NVMe drives available with 128
>>> bytes of metadata per 4K sector, so it could be similar in the future
>>> for OC drives too. I have a patch somewhere which should work for any
>>> size of metadata with a slightly different approach. I can send it (just
>>> need a moment for rebase) and you can decide which approach you would
>>> like to take.
>>>
>>>>> - You can also make a structure for it, use the first 3.5KB for meta,
>>>>>   and the next 512B for PPAs.
>>>>
>>>> Looks like a huge structure for just managing a couple of pointers,
>>>> don't you think?
>>>>
>>>> Javier
>>
>> Matias,
>>
>> Can you pick up this patch as is for this window and then Igor can extend
>> it to an arbitrary size in the future? We have a use case for 64B OOB /
>> 4KB, so 56B / 4KB is not enough in this case. Since pblk currently fixes
>> the metadata buffer, the code breaks under this configuration.
>>
>> If you have any comments that can respect this requirement, I can apply
>> them and resubmit.
>
> Igor, you said you only needed a moment. It would be better to fix this
> up the right way, and still make it for this window.

If we can make 4.20 then it is fine by me.

Thanks,
Javier