Re: [PATCH v12 00/12] bcache: support NVDIMM for journaling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jens,

Could you please consider take the v12 series for Linux v5.15 merge window?

In this version the full pointer in the on-media data structures are
modified to per-namespace offset, and all previous review comments are
fixed.
There is no more comments for 4 hours, and this series survives in my
smoking test for 24+ hours, as an EXPERIMENTAL code the current status
is fine IMHO.

Thanks in advance.

Coly Li


On 8/12/21 1:02 AM, Coly Li wrote:
> This is the v12 effort for supporting NVDIMM for bcache journal (some
> versions may not posted with version numbers).
>
> The major change of this version is the full pointer of on-media data
> structure is replaced by per-namespace offset. Now a pointer address is
> calculated by namespace base mapping address + per-namespace offset.
> The code logic is same as previous version, all changes are only related
> to the base+offset style pointer replacement.
>
> The nvm-pages allocator is a buddy-like allocator, which allocates size
> in power-of-2 pages from the NVDIMM namespace. User space tool 'bcache'
> has a new added '-M' option to format a NVDIMM namespace and register it
> via sysfs interface as a bcache meta device. The nvm-pages kernel code
> does a DAX mapping to map the whole namespace into system's memory
> address range, and allocating the pages to requestion like typical buddy
> allocator does. The major difference is nvm-pages allocator maintains
> the pages allocated to each requester by an allocation list which stored
> on NVDIMM too. Allocation list of different requester is tracked by a
> pre-defined UUID, all the pages tracked in all allocation lists are
> treated as allocated busy pages and won't be initialized into buddy
> system after the system reboot.
>
> The bcache journal code may request a block of power-of-2 size pages
> from the nvm-pages allocator, normally it is a range of 256MB or 512MB
> continuous pages range. During meta data journaling, the in-memory jsets
> go into the calculated nvdimm pages location by kernel memcpy routine.
> So the journaling I/Os won't go into block device (e.g. SSD) anymore, 
> the write and read for journal jsets happen on NVDIMM.
>
> Intel developers Jianpeng Ma and Qiaowei Ren compose the initial code of
> nvm-pages, the related patches are,
> - bcache: initialize the nvm pages allocator
> - bcache: initialization of the buddy
> - bcache: bch_nvm_alloc_pages() of the buddy
> - bcache: bch_nvm_free_pages() of the buddy
> - bcache: get recs list head for allocated pages by specific uuid
> All the code depends on Linux libnvdimm and dax drivers, the bcache nvm-
> pages allocator can be treated as user of these two drivers.
>
> I modify the bcache code to recognize the nvm meta device feature,
> initialize journal on NVDIMM, and do journal I/Os on NVDIMM in the
> following patches,
> - bcache: add initial data structures for nvm pages
> - bcache: use bucket index to set GC_MARK_METADATA for journal buckets
>   in bch_btree_gc_finish()
> - bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
> - bcache: initialize bcache journal for NVDIMM meta device
> - bcache: support storing bcache journal into NVDIMM meta device
> - bcache: read jset from NVDIMM pages for journal replay
> - bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
>   meta device
>
> In this series, all previously addressed issue via code reviews are all
> fixed. And all known issue during testing are fixed. The code survives
> from 24+ hours smoking and I/O pressure testing among many reboots, it
> works well as expected.
>
> All the code is EXPERIMENTAL, they won't be enabled by default until we
> feel the NVDIMM support is completed and stable.
>
> Although there are some experts helped to review the code logic, but we
> do appreciate if more people may help to review the code. It is quite
> common that bcache patches don't have enough code reviewer, but this
> time I do need help for more review or comments on this series.
>
> Thanks in advance.
>
> Coly Li
> ---
>
> Coly Li (7):
>   bcache: add initial data structures for nvm pages
>   bcache: use bucket index to set GC_MARK_METADATA for journal buckets
>     in bch_btree_gc_finish()
>   bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set
>   bcache: initialize bcache journal for NVDIMM meta device
>   bcache: support storing bcache journal into NVDIMM meta device
>   bcache: read jset from NVDIMM pages for journal replay
>   bcache: add sysfs interface register_nvdimm_meta to register NVDIMM
>     meta device
>
> Jianpeng Ma (5):
>   bcache: initialize the nvm pages allocator
>   bcache: initialization of the buddy
>   bcache: bch_nvmpg_alloc_pages() of the buddy
>   bcache: bch_nvmpg_free_pages() of the buddy allocator
>   bcache: get recs list head for allocated pages by specific uuid
>
>  drivers/md/bcache/Kconfig       |  10 +
>  drivers/md/bcache/Makefile      |   1 +
>  drivers/md/bcache/btree.c       |   6 +-
>  drivers/md/bcache/features.h    |   9 +
>  drivers/md/bcache/journal.c     | 325 +++++++++--
>  drivers/md/bcache/journal.h     |   2 +-
>  drivers/md/bcache/nvm-pages.c   | 931 ++++++++++++++++++++++++++++++++
>  drivers/md/bcache/nvm-pages.h   | 127 +++++
>  drivers/md/bcache/super.c       |  53 +-
>  include/uapi/linux/bcache-nvm.h | 253 +++++++++
>  10 files changed, 1649 insertions(+), 68 deletions(-)
>  create mode 100644 drivers/md/bcache/nvm-pages.c
>  create mode 100644 drivers/md/bcache/nvm-pages.h
>  create mode 100644 include/uapi/linux/bcache-nvm.h
>




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux