Hi Jens,

Could you please consider taking the v12 series for the Linux v5.15 merge
window? In this version, the full pointers in the on-media data structures
are replaced by per-namespace offsets, and all previous review comments are
addressed. There have been no further comments for 4 hours, and this series
has survived my smoke test for 24+ hours. As EXPERIMENTAL code, its current
status is fine IMHO.

Thanks in advance.

Coly Li

On 8/12/21 1:02 AM, Coly Li wrote:
> This is the v12 effort for supporting NVDIMM for the bcache journal (some
> earlier versions may not have been posted with version numbers).
>
> The major change in this version is that the full pointers in the on-media
> data structures are replaced by per-namespace offsets. Now a pointer
> address is calculated as namespace base mapping address + per-namespace
> offset. The code logic is the same as in the previous version; all changes
> are only related to the base+offset style pointer replacement.
>
> The nvm-pages allocator is a buddy-like allocator, which allocates
> power-of-2 numbers of pages from the NVDIMM namespace. The user space tool
> 'bcache' has a newly added '-M' option to format an NVDIMM namespace and
> register it via the sysfs interface as a bcache meta device. The nvm-pages
> kernel code does a DAX mapping to map the whole namespace into the system's
> memory address range, and allocates pages to requesters as a typical buddy
> allocator does. The major difference is that the nvm-pages allocator
> records the pages allocated to each requester in an allocation list, which
> is stored on the NVDIMM too. The allocation list of each requester is
> tracked by a pre-defined UUID; all the pages tracked in all allocation
> lists are treated as allocated (busy) pages and won't be initialized into
> the buddy system after the system reboots.
>
> The bcache journal code may request a block of power-of-2 sized pages
> from the nvm-pages allocator; normally it is a contiguous page range of
> 256MB or 512MB.
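For reviewers less familiar with the design, here is a minimal userspace
sketch of the two ideas above, under stated assumptions: a caller-supplied
buffer stands in for the DAX-mapped namespace, and every name (struct
nvm_ns, nvm_off_to_ptr(), buddy_alloc(), buddy_free(), buddy_init()) is
hypothetical, not the nvm-pages code; the series' real entry points are
bch_nvmpg_alloc_pages() and bch_nvmpg_free_pages(), whose signatures are not
reproduced here. Only offsets would be persisted, and a pointer is rebuilt
as base + offset after each mapping. buddy_init() mimics the reboot rule by
never handing pages marked busy to the buddy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096UL
#define DEMO_MAX_ORDER 4                      /* hypothetical: a 16-page namespace */
#define DEMO_NR_PAGES  (1UL << DEMO_MAX_ORDER)

/* Hypothetical stand-in for one DAX-mapped NVDIMM namespace. */
struct nvm_ns {
	char *base;                               /* mapping address, may differ per boot */
	unsigned char free_order[DEMO_NR_PAGES];  /* order+1 if a free block starts here */
};

/* Only offsets go on media; pointers are rebuilt as base + offset. */
static void *nvm_off_to_ptr(struct nvm_ns *ns, uint64_t off)
{
	assert(off < DEMO_NR_PAGES * DEMO_PAGE_SIZE);
	return ns->base + off;
}

static uint64_t nvm_ptr_to_off(struct nvm_ns *ns, const void *ptr)
{
	return (uint64_t)((const char *)ptr - ns->base);
}

/* Free 2^order pages starting at page index pg, merging with free buddies. */
static void buddy_free(struct nvm_ns *ns, unsigned long pg, unsigned int order)
{
	while (order < DEMO_MAX_ORDER) {
		unsigned long buddy = pg ^ (1UL << order);

		if (ns->free_order[buddy] != order + 1)
			break;                            /* buddy busy or split: stop */
		ns->free_order[buddy] = 0;            /* absorb the buddy */
		pg &= ~(1UL << order);                /* merged block starts at lower half */
		order++;
	}
	ns->free_order[pg] = (unsigned char)(order + 1);
}

/* Allocate 2^order contiguous pages; return the first page index or -1. */
static long buddy_alloc(struct nvm_ns *ns, unsigned int order)
{
	for (unsigned int o = order; o <= DEMO_MAX_ORDER; o++) {
		for (unsigned long pg = 0; pg < DEMO_NR_PAGES; pg += 1UL << o) {
			if (ns->free_order[pg] != o + 1)
				continue;
			ns->free_order[pg] = 0;
			while (o > order) {               /* split; upper halves stay free */
				o--;
				ns->free_order[pg + (1UL << o)] = (unsigned char)(o + 1);
			}
			return (long)pg;
		}
	}
	return -1;
}

/*
 * Rebuild the free lists at "boot": pages marked busy (in the real design,
 * pages recorded in the on-NVDIMM allocation lists) are skipped; every other
 * page is freed at order 0 and coalesces naturally.
 */
static void buddy_init(struct nvm_ns *ns, const unsigned char *busy)
{
	memset(ns->free_order, 0, sizeof(ns->free_order));
	for (unsigned long pg = 0; pg < DEMO_NR_PAGES; pg++)
		if (!busy[pg])
			buddy_free(ns, pg, 0);
}
```

Rebuilding the free lists by freeing each non-busy page at order 0 is
simply the easiest way to get correct coalescing; it is O(pages) and chosen
for clarity, not speed.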
> During metadata journaling, the in-memory jsets are copied into the
> calculated NVDIMM page locations by the kernel memcpy routine. So the
> journaling I/Os no longer go to the block device (e.g. SSD); journal jsets
> are written to and read from the NVDIMM.
>
> Intel developers Jianpeng Ma and Qiaowei Ren composed the initial code of
> nvm-pages; the related patches are:
> - bcache: initialize the nvm pages allocator
> - bcache: initialization of the buddy
> - bcache: bch_nvmpg_alloc_pages() of the buddy
> - bcache: bch_nvmpg_free_pages() of the buddy allocator
> - bcache: get recs list head for allocated pages by specific uuid
> All the code depends on the Linux libnvdimm and dax drivers; the bcache
> nvm-pages allocator can be treated as a user of these two drivers.
>
> I modified the bcache code to recognize the nvm meta device feature,
> initialize the journal on NVDIMM, and do journal I/Os on NVDIMM in the
> following patches:
> - bcache: add initial data structures for nvm pages
> - bcache: use bucket index to set GC_MARK_METADATA for journal buckets
>   in bch_btree_gc_finish()
> - bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
> - bcache: initialize bcache journal for NVDIMM meta device
> - bcache: support storing bcache journal into NVDIMM meta device
> - bcache: read jset from NVDIMM pages for journal replay
> - bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
>   meta device
>
> In this series, all issues raised in previous code reviews are fixed, as
> are all known issues found during testing. The code has survived 24+ hours
> of smoke and I/O pressure testing across many reboots, and it works as
> expected.
>
> All the code is EXPERIMENTAL; it won't be enabled by default until we
> feel the NVDIMM support is complete and stable.
>
> Although some experts have helped to review the code logic, we would
> appreciate it if more people could help to review the code.
> It is quite common that bcache patches don't have enough code reviewers,
> but this time I really do need more reviews or comments on this series.
>
> Thanks in advance.
>
> Coly Li
> ---
>
> Coly Li (7):
>       bcache: add initial data structures for nvm pages
>       bcache: use bucket index to set GC_MARK_METADATA for journal buckets
>         in bch_btree_gc_finish()
>       bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
>       bcache: initialize bcache journal for NVDIMM meta device
>       bcache: support storing bcache journal into NVDIMM meta device
>       bcache: read jset from NVDIMM pages for journal replay
>       bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
>         meta device
>
> Jianpeng Ma (5):
>       bcache: initialize the nvm pages allocator
>       bcache: initialization of the buddy
>       bcache: bch_nvmpg_alloc_pages() of the buddy
>       bcache: bch_nvmpg_free_pages() of the buddy allocator
>       bcache: get recs list head for allocated pages by specific uuid
>
>  drivers/md/bcache/Kconfig       |  10 +
>  drivers/md/bcache/Makefile      |   1 +
>  drivers/md/bcache/btree.c       |   6 +-
>  drivers/md/bcache/features.h    |   9 +
>  drivers/md/bcache/journal.c     | 325 +++++++++--
>  drivers/md/bcache/journal.h     |   2 +-
>  drivers/md/bcache/nvm-pages.c   | 931 ++++++++++++++++++++++++++++++++
>  drivers/md/bcache/nvm-pages.h   | 127 +++++
>  drivers/md/bcache/super.c       |  53 +-
>  include/uapi/linux/bcache-nvm.h | 253 +++++++++
>  10 files changed, 1649 insertions(+), 68 deletions(-)
>  create mode 100644 drivers/md/bcache/nvm-pages.c
>  create mode 100644 drivers/md/bcache/nvm-pages.h
>  create mode 100644 include/uapi/linux/bcache-nvm.h
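The journaling path described in the cover letter (in-memory jsets copied
into the journal's NVDIMM page range by memcpy, then read back for replay)
can be sketched in userspace as follows. struct demo_jset, struct
demo_journal, demo_journal_write() and demo_journal_replay() are
hypothetical and far simpler than bcache's struct jset: there are no
checksums, the area is assumed to start zeroed, and durability is ignored
(real pmem stores also need flushing, e.g. via the kernel's
memcpy_flushcache()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_JSET_MAGIC 0x6a736574U      /* hypothetical magic value */

/* Hypothetical, heavily simplified header; NOT bcache's struct jset. */
struct demo_jset {
	uint32_t magic;
	uint32_t nr_bytes;                   /* payload bytes after the header */
	uint64_t seq;                        /* journal sequence number */
};

/* Hypothetical view of the journal's NVDIMM page range (base + offset). */
struct demo_journal {
	char *area;
	size_t size;
	size_t write_off;                    /* next free byte in the area */
	uint64_t next_seq;
};

/*
 * Append one jset with plain memcpy() into the mapped pages: no bio and no
 * block device. Cache flush/fence for durability is not modelled here.
 */
static int demo_journal_write(struct demo_journal *j, const void *keys,
			      uint32_t len)
{
	struct demo_jset h = { DEMO_JSET_MAGIC, len, j->next_seq };

	assert(j->area != NULL);
	if (j->write_off + sizeof(h) + len > j->size)
		return -1;                       /* journal area full */
	memcpy(j->area + j->write_off, &h, sizeof(h));
	memcpy(j->area + j->write_off + sizeof(h), keys, len);
	j->write_off += sizeof(h) + len;
	j->next_seq++;
	return 0;
}

/*
 * Replay: walk jsets from the start of the area, stopping at the first
 * header with a bad magic, an unexpected seq, or a truncated payload.
 * Returns the number of valid jsets found.
 */
static size_t demo_journal_replay(const char *area, size_t size,
				  uint64_t first_seq)
{
	size_t off = 0, n = 0;
	struct demo_jset h;

	while (off + sizeof(h) <= size) {
		memcpy(&h, area + off, sizeof(h));
		if (h.magic != DEMO_JSET_MAGIC || h.seq != first_seq + n)
			break;
		if (h.nr_bytes > size - off - sizeof(h))
			break;
		off += sizeof(h) + h.nr_bytes;
		n++;
	}
	return n;
}
```

Copying headers through memcpy() into a local struct keeps the sketch free
of unaligned accesses, since jsets here are packed back to back.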
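As a sketch of why the NVDIMM flag belongs in the incompat feature set:
under the usual incompat-feature convention, a kernel that does not
understand a set bit refuses the device instead of guessing, so a kernel
without NVDIMM support would not try to use a journal it cannot find. All
names and bit values below are hypothetical; the real definitions are
presumably the features.h change listed in the diffstat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, not the ones defined by this series. */
#define DEMO_FEATURE_INCOMPAT_OTHER        (1ULL << 0)
#define DEMO_FEATURE_INCOMPAT_NVDIMM_META  (1ULL << 1)

_Static_assert((DEMO_FEATURE_INCOMPAT_OTHER &
		DEMO_FEATURE_INCOMPAT_NVDIMM_META) == 0,
	       "incompat feature bits must not overlap");

/* Hypothetical superblock with an incompat feature mask. */
struct demo_sb {
	uint64_t feature_incompat;
};

/* Incompat bits this (hypothetical) kernel build knows how to handle. */
static uint64_t demo_supported_incompat(bool nvdimm_support)
{
	uint64_t supp = DEMO_FEATURE_INCOMPAT_OTHER;

	if (nvdimm_support)          /* e.g. the EXPERIMENTAL option is enabled */
		supp |= DEMO_FEATURE_INCOMPAT_NVDIMM_META;
	return supp;
}

/* Any set bit outside the supported mask means: refuse the device. */
static bool demo_can_register(const struct demo_sb *sb, bool nvdimm_support)
{
	return (sb->feature_incompat &
		~demo_supported_incompat(nvdimm_support)) == 0;
}
```

With this rule, the same superblock is accepted by a build that knows the
NVDIMM bit and rejected by one that does not, rather than being silently
misinterpreted.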