On Wed, Mar 24, 2021 at 10:39 AM Christoph Hellwig <hch@xxxxxx> wrote: > > On Wed, Mar 24, 2021 at 09:37:01AM -0700, Dan Williams wrote: > > > Eww. As I said I think the right way is that the file system (or > > > other consumer) can register a set of callbacks for opening the device. > > > > How does that solve the problem of the driver being notified of all > > pfn failure events? > > Ok, I probably just showed I need to spend more time looking at > your proposal vs the actual code.. > > Don't we have a proper way how one of the nvdimm layers own a > spefific memory range and call directly into that instead of through > a notifier? So that could be a new dev_pagemap operation as Ruan has here. I was thinking that other agents would be interested in non-dev_pagemap managed ranges, but we could leave that for later and just make the current pgmap->memory_failure() callback proposal range based. > > > Today pmem only finds out about the ones that are > > notified via native x86 machine check error handling via a notifier > > (yes "firmware-first" error handling fails to do the right thing for > > the pmem driver), > > Did any kind of firmware-first error handling ever get anything > right? I wish people would have learned that by now. Part of me wants to say if you use firmware-first you get to keep the pieces, but it's not always the end user choice as far as I understand. > > or the ones that are eventually reported via address > > range scrub, but only for the nvdimms that implement range scrubbing. > > memory_failure() seems a reasonable catch all point to route pfn > > failure events, in an arch independent way, to interested drivers. > > Yeah. > > > I'm fine swapping out dax_device blocking_notiier chains for your > > proposal, but that does not address all the proposed reworks in my > > list which are: > > > > - delete "drivers/acpi/nfit/mce.c" > > > > - teach memory_failure() to be able to communicate range failure > > > > - enable memory_failure() to defer to a filesystem that can say > > "critical metadata is impacted, no point in trying to do file-by-file > > isolation, bring the whole fs down". > > This all sounds sensible. Ok, Ruan, I think this means rework your dev_pagemap_ops callback to be range based. Add a holder concept for dax_devices and then layer that on Christoph's eventual dax_device callback mechanism that a dax_device holder can register.