Resending, not sure the first one went through.

On 4/6/2022 10:32 AM, Jane Chu wrote:
> On 4/5/2022 10:19 PM, Christoph Hellwig wrote:
>> On Tue, Apr 05, 2022 at 01:47:45PM -0600, Jane Chu wrote:
>>> Introduce DAX_RECOVERY flag to dax_direct_access(). The flag is
>>> not set by default in dax_direct_access() such that the helper
>>> does not translate a pmem range to a kernel virtual address if the
>>> range contains uncorrectable errors. When the flag is set,
>>> the helper ignores the UEs and returns the kernel virtual address
>>> so that the caller may get on with data recovery via write.
>>>
>>> Also introduce a new dev_pagemap_ops .recovery_write function.
>>> The function is applicable to FSDAX devices only. The device
>>> page backend driver provides a .recovery_write function if the
>>> device has an underlying mechanism to clear the uncorrectable
>>> errors on the fly.
>>
>> I know Dan suggested it, but I still think dev_pagemap_ops is the very
>> wrong choice here. It is about VM callbacks to ZONE_DEVICE owners
>> independent of what pagemap type they are. .recovery_write on the
>> other hand is completely specific to the DAX write path and has no
>> MM interactions at all.
>
> Yes, I believe Dan was motivated by avoiding the dm dance as a result
> of adding .recovery_write to dax_operations.
>
> I understand your point that .recovery_write is device specific and
> thus not something appropriate for device-agnostic ops.
>
> I can see 2 options so far -
>
> 1) add .recovery_write to dax_operations and do the dm dance to hunt
> down to the base device that actually provides the recovery action
>
> 2) an ugly but expedient approach based on the observation that
> dax_direct_access() has already gone through the dm dance and thus
> could scoop up the .recovery_write function pointer if the
> DAX_RECOVERY flag is set. That is, bundle the action flag with the
> action, and should there be a need for more device-specific actions,
> just add another action with an associated flag.
>
> I'm thinking about something like this
>
> long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
>                        long nr_pages, struct daxdev_specific *action,
>                        int flags, void **kaddr, pfn_t *pfn)
>
> where
>
> struct daxdev_specific {
>         int flags;      /* DAX_RECOVERY, etc */
>         size_t (*recovery_write)(pfn_t pfn, pgoff_t pgoff, void *addr,
>                                  size_t bytes, void *iter);
> };
>
> __pmem_direct_access() provides the .recovery_write function pointer;
> dax_iomap_iter() ends up directly invoking the function in pmem.c,
> which finds the pgmap from the pfn_t, and the (struct pmem *) from
> pgmap->owner.
>
> In this way, we get rid of the dax_recovery_write() interface as well
> as the dm dance.
>
> What do you think?
>
> Dan, could you also chime in?
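To make option 2) above concrete before the patch hunks below: here is a
rough, untested caller-side sketch of how dax_iomap_iter() could consume
the proposed interface. Only struct daxdev_specific, DAX_RECOVERY and
the recovery_write signature come from the proposal; everything else
(including the function name dax_iomap_iter_sketch) is made up for
illustration and is not actual kernel code.

static size_t dax_iomap_iter_sketch(struct dax_device *dax_dev,
		pgoff_t pgoff, size_t size, void *iter)
{
	/* driver fills in .recovery_write during dax_direct_access() */
	struct daxdev_specific action = { .flags = DAX_RECOVERY };
	void *kaddr;
	pfn_t pfn;
	long map_len;

	map_len = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size),
				    &action, DAX_RECOVERY, &kaddr, &pfn);
	if (map_len < 0)
		return 0;

	/*
	 * Invoke the device-provided callback directly: no dm stacking
	 * walk is needed here because dax_direct_access() already
	 * resolved down to the base device.
	 */
	if (action.recovery_write)
		return action.recovery_write(pfn, pgoff, kaddr, size, iter);
	return 0;
}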
>
>>
>>>   /* see "strong" declaration in tools/testing/nvdimm/pmem-dax.c */
>>>   __weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
>>> -		long nr_pages, void **kaddr, pfn_t *pfn)
>>> +		long nr_pages, int flags, void **kaddr, pfn_t *pfn)
>>>   {
>>>   	resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
>>> +	sector_t sector = PFN_PHYS(pgoff) >> SECTOR_SHIFT;
>>> +	unsigned int num = PFN_PHYS(nr_pages) >> SECTOR_SHIFT;
>>> +	struct badblocks *bb = &pmem->bb;
>>> +	sector_t first_bad;
>>> +	int num_bad;
>>> +	bool bad_in_range;
>>> +	long actual_nr;
>>> +
>>> +	if (!bb->count)
>>> +		bad_in_range = false;
>>> +	else
>>> +		bad_in_range = !!badblocks_check(bb, sector, num,
>>> +				&first_bad, &num_bad);
>>>
>>> -	if (unlikely(is_bad_pmem(&pmem->bb, PFN_PHYS(pgoff) / 512,
>>> -			PFN_PHYS(nr_pages))))
>>> +	if (bad_in_range && !(flags & DAX_RECOVERY))
>>>   		return -EIO;
>>
>> The use of bad_in_range here seems a little convoluted. See the
>> attached patch on how I would structure the function to avoid the
>> variable and have the recovery code in a self-contained chunk.
>
> Much better, will use your version, thanks!
>
>>
>>> -	map_len = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size),
>>> -			&kaddr, NULL);
>>> +	nrpg = PHYS_PFN(size);
>>> +	map_len = dax_direct_access(dax_dev, pgoff, nrpg, 0, &kaddr, NULL);
>>
>> Overly long line here.
>
> Okay, will run the checkpatch.pl test again.
>
> thanks!
> -jane
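For anyone following along without Christoph's attachment: his actual
patch is not reproduced here. Below is only my own untested guess at
what a self-contained recovery chunk along the lines he describes might
look like, derived from the diff hunks above; the actual_nr computation
in particular is an assumption, not his code.

__weak long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
		long nr_pages, int flags, void **kaddr, pfn_t *pfn)
{
	resource_size_t offset = PFN_PHYS(pgoff) + pmem->data_offset;
	sector_t sector = PFN_PHYS(pgoff) >> SECTOR_SHIFT;
	unsigned int num = PFN_PHYS(nr_pages) >> SECTOR_SHIFT;
	struct badblocks *bb = &pmem->bb;
	sector_t first_bad;
	int num_bad;

	if (kaddr)
		*kaddr = pmem->virt_addr + offset;
	if (pfn)
		*pfn = phys_to_pfn_t(pmem->phys_addr + offset,
				     pmem->pfn_flags);

	/* all recovery handling lives in this one block */
	if (bb->count &&
	    badblocks_check(bb, sector, num, &first_bad, &num_bad)) {
		long actual_nr;

		if (!(flags & DAX_RECOVERY))
			return -EIO;
		/*
		 * Truncate the mapping at the page containing the first
		 * bad sector (at least one page), so the caller recovers
		 * one stride at a time.
		 */
		actual_nr = PHYS_PFN(PAGE_ALIGN(
				(first_bad - sector) << SECTOR_SHIFT));
		return actual_nr ? actual_nr : 1;
	}
	return nr_pages;
}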