On 3/22/2022 2:01 AM, Christoph Hellwig wrote: > On Sat, Mar 19, 2022 at 12:28:31AM -0600, Jane Chu wrote: >> Introduce DAX_RECOVERY flag to dax_direct_access(). The flag is >> not set by default in dax_direct_access() such that the helper >> does not translate a pmem range to kernel virtual address if the >> range contains uncorrectable errors. When the flag is set, >> the helper ignores the UEs and return kernel virtual adderss so >> that the caller may get on with data recovery via write. > > This DAX_RECOVERY doesn't actually seem to be used anywhere here or > in the subsequent patches. Did I miss something? dax_iomap_iter() uses the flag in the same patch, + if ((map_len == -EIO) && (iov_iter_rw(iter) == WRITE)) { + flags |= DAX_RECOVERY; + map_len = dax_direct_access(dax_dev, pgoff, nrpg, + flags, &kaddr, NULL); > >> Also introduce a new dev_pagemap_ops .recovery_write function. >> The function is applicable to FSDAX device only. The device >> page backend driver provides .recovery_write function if the >> device has underlying mechanism to clear the uncorrectable >> errors on the fly. > > Why is this not in struct dax_operations? Per Dan's comments to the v5 series, adding .recovery_write to dax_operations causes a number of trivial dm targets changes. Dan suggested that adding .recovery_write to pagemap_ops could cut short the logistics of figuring out whether the driver backing up a page is indeed capable of clearing poison. Please see https://lkml.org/lkml/2022/2/4/31 > >> >> +size_t dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff, >> + void *addr, size_t bytes, struct iov_iter *iter) >> +{ >> + struct dev_pagemap *pgmap = dax_dev->pgmap; >> + >> + if (!pgmap || !pgmap->ops->recovery_write) >> + return -EIO; >> + return pgmap->ops->recovery_write(pgmap, pgoff, addr, bytes, >> + (void *)iter); > > No need to cast a type pointer to a void pointer. But more importantly > losing the type information here and passing it as void seems very > wrong. include/linux/memremap.h doesn't know struct iov_iter which is defined in include/linux/uio.h, would you prefer to adding include/linux/uio.h to include/linux/memremap.h ? > >> +static size_t pmem_recovery_write(struct dev_pagemap *pgmap, pgoff_t pgoff, >> + void *addr, size_t bytes, void *iter) >> +{ >> + struct pmem_device *pmem = pgmap->owner; >> + >> + dev_warn(pmem->bb.dev, "%s: not yet implemented\n", __func__); >> + >> + /* XXX more later */ >> + return 0; >> +} > > This shuld not be added here - the core code can cope with a NULL > method just fine. Okay, will remove the XXX line. > >> + recov = 0; >> + flags = 0; >> + nrpg = PHYS_PFN(size); > > Please spell out the words. The recovery flag can also be > a bool to make the code more readable. Sure. > >> + map_len = dax_direct_access(dax_dev, pgoff, nrpg, flags, >> + &kaddr, NULL); >> + if ((map_len == -EIO) && (iov_iter_rw(iter) == WRITE)) { > > No need for the inner braces. Okay. > >> + flags |= DAX_RECOVERY; >> + map_len = dax_direct_access(dax_dev, pgoff, nrpg, >> + flags, &kaddr, NULL); > > And noneed for the flags variable at all really. Okay. > >> xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr, >> map_len, iter); >> else >> @@ -1271,6 +1286,11 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi, >> length -= xfer; >> done += xfer; >> >> + if (recov && (xfer == (ssize_t) -EIO)) { >> + pr_warn("dax_recovery_write failed\n"); >> + ret = -EIO; >> + break; > > And no, we can't just use an unsigned variable to communicate a > negative error code. Okay, will have dax_recovery_write return 0 in all error cases. thanks! -jane