On 04/07/17 11:50, Javier González wrote: > Documentation/lightnvm/pblk.txt | 21 + > drivers/lightnvm/Kconfig | 19 + > drivers/lightnvm/Makefile | 5 + > drivers/lightnvm/pblk-cache.c | 112 +++ > drivers/lightnvm/pblk-core.c | 1641 ++++++++++++++++++++++++++++++++++++++ > drivers/lightnvm/pblk-gc.c | 542 +++++++++++++ > drivers/lightnvm/pblk-init.c | 942 ++++++++++++++++++++++ > drivers/lightnvm/pblk-map.c | 135 ++++ > drivers/lightnvm/pblk-rb.c | 859 ++++++++++++++++++++ > drivers/lightnvm/pblk-read.c | 513 ++++++++++++ > drivers/lightnvm/pblk-recovery.c | 1007 +++++++++++++++++++++++ > drivers/lightnvm/pblk-rl.c | 182 +++++ > drivers/lightnvm/pblk-sysfs.c | 500 ++++++++++++ > drivers/lightnvm/pblk-write.c | 408 ++++++++++ > drivers/lightnvm/pblk.h | 1127 ++++++++++++++++++++++++++ > include/linux/lightnvm.h | 57 +- > pblk-sysfs.c | 581 ++++++++++++++ This patch introduces two slightly different versions of pblk-sysfs.c - one at the top level and one in drivers/lightnvm. Please remove the file at the top level. > +config NVM_PBLK_L2P_OPT > + bool "PBLK with optimized L2P table for devices up to 8TB" > + depends on NVM_PBLK > + ---help--- > + Physical device addresses are typical 64-bit integers. Since we store > + the logical to physical (L2P) table on the host, this takes 1/500 of > + host memory (e.g., 2GB per 1TB of storage media). On drives under 8TB, > + it is possible to reduce this to 1/1000 (e.g., 1GB per 1TB). This > + option allows to do this optimization on the host L2P table. Why is NVM_PBLK_L2P_OPT a compile-time option instead of a run-time option? Since this define does not affect the definition of the ppa_addr I don't see why this has to be a compile-time option. For e.g. Linux distributors the only choice would be to disable NVM_PBLK_L2P_OPT. I think it would be unfortunate if no Linux distribution ever would be able to benefit from this optimization. > +#ifdef CONFIG_NVM_DEBUG > + atomic_add(nr_entries, &pblk->inflight_writes); > + atomic_add(nr_entries, &pblk->req_writes); > +#endif Has it been considered to use the "static key" feature such that consistency checks can be enabled at run-time instead of having to rebuild the kernel to enable CONFIG_NVM_DEBUG? > +#ifdef CONFIG_NVM_DEBUG > + BUG_ON(nr_rec_entries != valid_entries); > + atomic_add(valid_entries, &pblk->inflight_writes); > + atomic_add(valid_entries, &pblk->recov_gc_writes); > +#endif Are you aware that Linus is strongly opposed against using BUG_ON()? > +#ifdef CONFIG_NVM_DEBUG > + lockdep_assert_held(&l_mg->free_lock); > +#endif Why is lockdep_assert_held() surrounded with #ifdef CONFIG_NVM_DEBUG / #endif? Are you aware that lockdep_assert_held() do not generate any code with CONFIG_PROVE_LOCKING=n? > +static const struct block_device_operations pblk_fops = { > + .owner = THIS_MODULE, > +}; Is this data structure useful? If so, where is pblk_fops used? > +static void pblk_l2p_free(struct pblk *pblk) > +{ > + vfree(pblk->trans_map); > +} > + > +static int pblk_l2p_init(struct pblk *pblk) > +{ > + sector_t i; > + > + pblk->trans_map = vmalloc(sizeof(pblk_addr) * pblk->rl.nr_secs); > + if (!pblk->trans_map) > + return -ENOMEM; > + > + for (i = 0; i < pblk->rl.nr_secs; i++) > + pblk_ppa_set_empty(&pblk->trans_map[i]); > + > + return 0; > +} Has it been considered to add support for keeping a subset of the L2P translation table in memory instead of keeping it in memory in its entirety? > + sprintf(cache_name, "pblk_line_m_%s", pblk->disk->disk_name); Please use snprintf() or kasprintf() instead of printf(). That makes it easier for humans to verify that no buffer overflow is triggered. > +/* physical block device target */ > +static struct nvm_tgt_type tt_pblk = { > + .name = "pblk", > + .version = {1, 0, 0}, Are version numbers useful inside the kernel tree? > +void pblk_map_rq(struct pblk *pblk, struct nvm_rq *rqd, unsigned int sentry, > + unsigned long *lun_bitmap, unsigned int valid_secs, > + unsigned int off) > +{ > + struct pblk_sec_meta *meta_list = rqd->meta_list; > + unsigned int map_secs; > + int min = pblk->min_write_pgs; > + int i; > + > + for (i = off; i < rqd->nr_ppas; i += min) { > + map_secs = (i + min > valid_secs) ? (valid_secs % min) : min; > + pblk_map_page_data(pblk, sentry + i, &rqd->ppa_list[i], > + lun_bitmap, &meta_list[i], map_secs); > + } > +} Has it been considered to implement the above code such that no modulo (%) computation is needed, which is a relatively expensive operation? I think for this loop that can be done easily. > +static DECLARE_RWSEM(pblk_rb_lock); > + > +void pblk_rb_data_free(struct pblk_rb *rb) > +{ > + struct pblk_rb_pages *p, *t; > + > + down_write(&pblk_rb_lock); > + list_for_each_entry_safe(p, t, &rb->pages, list) { > + free_pages((unsigned long)page_address(p->pages), p->order); > + list_del(&p->list); > + kfree(p); > + } > + up_write(&pblk_rb_lock); > +} Global locks like pblk_rb_lock are a performance bottleneck on multi-socket systems. Why is that lock global? Bart.