On Wed, Mar 24, 2021 at 09:37:01AM -0700, Dan Williams wrote: > > Eww. As I said I think the right way is that the file system (or > > other consumer) can register a set of callbacks for opening the device. > > How does that solve the problem of the driver being notified of all > pfn failure events? Ok, I probably just showed I need to spend more time looking at your proposal vs the actual code.. Don't we have a proper way how one of the nvdimm layers own a spefific memory range and call directly into that instead of through a notifier? > Today pmem only finds out about the ones that are > notified via native x86 machine check error handling via a notifier > (yes "firmware-first" error handling fails to do the right thing for > the pmem driver), Did any kind of firmware-first error handling ever get anything right? I wish people would have learned that by now. > or the ones that are eventually reported via address > range scrub, but only for the nvdimms that implement range scrubbing. > memory_failure() seems a reasonable catch all point to route pfn > failure events, in an arch independent way, to interested drivers. Yeah. > I'm fine swapping out dax_device blocking_notiier chains for your > proposal, but that does not address all the proposed reworks in my > list which are: > > - delete "drivers/acpi/nfit/mce.c" > > - teach memory_failure() to be able to communicate range failure > > - enable memory_failure() to defer to a filesystem that can say > "critical metadata is impacted, no point in trying to do file-by-file > isolation, bring the whole fs down". This all sounds sensible.