On Fri, Jan 28, 2022 at 1:32 PM Jane Chu <jane.chu@xxxxxxxxxx> wrote:
>
> dax_recovery_write() dax op is only required for DAX device that
> export DAXDEV_RECOVERY indicating its capability to recover from
> poisons.
>
> DM may be nested, if part of the base dax devices forming a DM
> device support dax recovery, the DM device is marked with such
> capability.
>
> Signed-off-by: Jane Chu <jane.chu@xxxxxxxxxx>
[..]
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 2fc776653c6e..1b3d6ebf3e49 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -30,6 +30,9 @@ struct dax_operations {
>  			sector_t, sector_t);
>  	/* zero_page_range: required operation. Zero page range */
>  	int (*zero_page_range)(struct dax_device *, pgoff_t, size_t);
> +	/* recovery_write: optional operation. */
> +	size_t (*recovery_write)(struct dax_device *, pgoff_t, void *, size_t,
> +				struct iov_iter *);

The removal of the ->copy_{to,from}_iter() operations set the precedent
that dax ops should not be needed when the operation can be carried out
generically. The only need to call back to the pmem driver is so that
it can call nvdimm_clear_poison(). nvdimm_clear_poison() in turn only
needs the 'struct device' hosting the pmem and the physical address to
be cleared. The physical address is already returned by
dax_direct_access(). The device is something that could be added to
dax_device, and the pgmap could host the callback that pmem fills in.
Something like:

diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 58eda16f5c53..36486ba4753a 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -694,6 +694,7 @@ static int __nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap)
 		.end = nsio->res.end - end_trunc,
 	};
 	pgmap->nr_range = 1;
+	pgmap->owner = &nd_pfn->dev;
 	if (nd_pfn->mode == PFN_MODE_RAM) {
 		if (offset < reserve)
 			return -EINVAL;
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 58d95242a836..95e1b6326f88 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -481,6 +481,7 @@ static int pmem_attach_disk(struct device *dev,
 	}
 	set_dax_nocache(dax_dev);
 	set_dax_nomc(dax_dev);
+	set_dax_pgmap(dax_dev, &pmem->pgmap);
 	if (is_nvdimm_sync(nd_region))
 		set_dax_synchronous(dax_dev);
 	rc = dax_add_host(dax_dev, disk);
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 1fafcc38acba..8cb59b5df38b 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -81,6 +81,11 @@ struct dev_pagemap_ops {

 #define PGMAP_ALTMAP_VALID	(1 << 0)

+struct dev_pagemap_operations {
+	size_t (*recovery_write)(struct dev_pagemap *pgmap, void *, size_t,
+				 struct iov_iter *);
+};
+
 /**
  * struct dev_pagemap - metadata for ZONE_DEVICE mappings
  * @altmap: pre-allocated/reserved memory for vmemmap allocations
@@ -111,12 +116,15 @@ struct dev_pagemap {
 	const struct dev_pagemap_ops *ops;
 	void *owner;
 	int nr_range;
+	struct dev_pagemap_operations ops;
 	union {
 		struct range range;
 		struct range ranges[0];
 	};
 };

...then DM does not need to be involved in the recovery path, fs/dax.c
just does dax_direct_access(..., DAX_RECOVERY, ...) and then looks up
the pgmap to generically coordinate the recovery_write(). The pmem
driver would be responsible for setting pgmap->recovery_write() to a
function that calls nvdimm_clear_poison().

This architecture works for anything that can be described by a pgmap
and supports error clearing; it need not be limited to the pmem block
driver.
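
To make the proposed flow concrete, here is a rough userspace mock of the
dispatch, not kernel code. Every mock_* name is hypothetical, the callback
signature is simplified (a plain buffer and offset instead of an iov_iter),
and mock_clear_poison() merely stands in for the role nvdimm_clear_poison()
would play. It only illustrates the shape of the idea: a generic writer
consults the pgmap for a recovery callback and needs no driver-specific
knowledge beyond that.

```c
/* Userspace mock only: every mock_* name is hypothetical, not kernel API. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_MEDIA_SIZE 64

struct mock_pgmap;

/* Stand-in for the per-pgmap callback table sketched in the diff. */
struct mock_pgmap_operations {
	size_t (*recovery_write)(struct mock_pgmap *pgmap, size_t off,
				 const void *src, size_t len);
};

/* Stand-in for struct dev_pagemap plus the memory it describes. */
struct mock_pgmap {
	struct mock_pgmap_operations rops;
	unsigned char media[MOCK_MEDIA_SIZE];
	unsigned char poisoned[MOCK_MEDIA_SIZE];	/* 1 = byte is poisoned */
};

static void mock_pgmap_init(struct mock_pgmap *pgmap,
		size_t (*recovery_write)(struct mock_pgmap *, size_t,
					 const void *, size_t))
{
	memset(pgmap, 0, sizeof(*pgmap));
	pgmap->rops.recovery_write = recovery_write;
}

static int mock_range_has_poison(const struct mock_pgmap *pgmap,
				 size_t off, size_t len)
{
	for (size_t i = off; i < off + len; i++)
		if (pgmap->poisoned[i])
			return 1;
	return 0;
}

/* Plays the role that nvdimm_clear_poison() has for real pmem. */
static void mock_clear_poison(struct mock_pgmap *pgmap, size_t off, size_t len)
{
	memset(&pgmap->poisoned[off], 0, len);
}

/* What a pmem-like driver would install: clear poison, then copy. */
static size_t mock_pmem_recovery_write(struct mock_pgmap *pgmap, size_t off,
				       const void *src, size_t len)
{
	assert(off <= MOCK_MEDIA_SIZE && len <= MOCK_MEDIA_SIZE - off);
	mock_clear_poison(pgmap, off, len);
	memcpy(&pgmap->media[off], src, len);
	return len;
}

/*
 * Generic caller: knows nothing about the driver beyond the pgmap
 * callback. Returns the number of bytes written, 0 on failure.
 */
static size_t mock_generic_write(struct mock_pgmap *pgmap, size_t off,
				 const void *src, size_t len)
{
	if (off > MOCK_MEDIA_SIZE || len > MOCK_MEDIA_SIZE - off)
		return 0;
	if (!mock_range_has_poison(pgmap, off, len)) {
		memcpy(&pgmap->media[off], src, len);
		return len;
	}
	if (!pgmap->rops.recovery_write)
		return 0;	/* poisoned, and no way to recover */
	return pgmap->rops.recovery_write(pgmap, off, src, len);
}
```

In this mock, a pgmap initialized with mock_pmem_recovery_write accepts a
write over a poisoned range and ends up with the poison cleared, while a
pgmap initialized with a NULL callback rejects the same write. A stacking
layer such as DM never appears in the path.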