On Mon, Jun 4, 2018 at 5:40 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> On Sat 02-06-18 22:22:43, Dan Williams wrote:
>> Changes since v1 [1]:
>> * Rework the locking to not use lock_page() instead use a combination of
>> rcu_read_lock(), xa_lock_irq(&mapping->pages), and igrab() to validate
>> that dax pages are still associated with the given mapping, and to
>> prevent the address_space from being freed while memory_failure() is
>> busy. (Jan)
>>
>> * Fix use of MF_COUNT_INCREASED in madvise_inject_error() to account for
>> the case where the injected error is a dax mapping and the pinned
>> reference needs to be dropped. (Naoya)
>>
>> * Clarify with a comment that VM_FAULT_NOPAGE may not always indicate a
>> mapping of the storage capacity, it could also indicate the zero page.
>> (Jan)
>>
>> [1]: https://lists.01.org/pipermail/linux-nvdimm/2018-May/015932.html
>>
>> ---
>>
>> As it stands, memory_failure() gets thoroughly confused by dev_pagemap
>> backed mappings. The recovery code has specific enabling for several
>> possible page states and needs new enabling to handle poison in dax
>> mappings.
>>
>> In order to support reliable reverse mapping of user space addresses:
>>
>> 1/ Add new locking in the memory_failure() rmap path to prevent races
>> that would typically be handled by the page lock.
>>
>> 2/ Since dev_pagemap pages are hidden from the page allocator and the
>> "compound page" accounting machinery, add a mechanism to determine the
>> size of the mapping that encompasses a given poisoned pfn.
>>
>> 3/ Given pmem errors can be repaired, change the speculatively accessed
>> poison protection, mce_unmap_kpfn(), to be reversible and otherwise
>> allow ongoing access from the kernel.
>
> This doesn't really describe the problem you are trying to solve and why
> do you believe that HWPoison is the best way to handle it.
> As things stand HWPoison is rather ad-hoc and I am not sure adding more
> to it is really great without some deep reconsidering how the whole
> thing is done right now IMHO. Are you actually trying to solve some real
> world problem or you merely want to make soft offlining work properly?

I'm trying to solve the following real-world problem, which occurs when
real poison is consumed through a dax mapping:

  mce: Uncorrected hardware memory error in user-access at af34214200
  {1}[Hardware Error]: It has been corrected by h/w and requires no further action
  mce: [Hardware Error]: Machine check events logged
  {1}[Hardware Error]: event severity: corrected
  Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users
  [..]
  Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed
  mce: Memory error not recovered

...i.e. currently all poison consumed through dax mappings is needlessly
system fatal.
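For illustration only, here is a minimal userspace C sketch of the address
arithmetic behind the log above and behind item 2/ of the cover letter,
assuming x86-64 page-table sizes (4 KiB PTE, 2 MiB PMD, 1 GiB PUD). The pfn
that memory_failure() reports (0xaf34214) is the physical address from the
mce line (af34214200) shifted right by the 4 KiB page shift. Once the level
that maps a poisoned user address is known, the range to unmap is that
address rounded down to the mapping size. The enum and helper names are
hypothetical, not the kernel's, and how the level is discovered (e.g. a
page-table walk) is outside this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86-64 base page shift (4 KiB pages). */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical: the page-table level that maps a given user address. */
enum sketch_map_level { SKETCH_PTE, SKETCH_PMD, SKETCH_PUD };

/* Convert a physical address (as printed by mce) to a page frame number. */
static uint64_t phys_to_pfn(uint64_t phys)
{
    return phys >> SKETCH_PAGE_SHIFT;
}

/*
 * Size shift of a mapping at the given level on x86-64:
 * PTE -> 4 KiB, PMD -> 2 MiB, PUD -> 1 GiB.
 */
static unsigned int mapping_shift(enum sketch_map_level level)
{
    switch (level) {
    case SKETCH_PUD:
        return 30;
    case SKETCH_PMD:
        return 21;
    case SKETCH_PTE:
    default:
        return SKETCH_PAGE_SHIFT;
    }
}

/*
 * Start of the naturally aligned mapping that contains addr. With a huge
 * (PMD or PUD) dax mapping, the whole aligned range, not just one 4 KiB
 * page, covers the poisoned pfn in that process.
 */
static uint64_t mapping_start(uint64_t addr, enum sketch_map_level level)
{
    uint64_t size = UINT64_C(1) << mapping_shift(level);

    return addr & ~(size - 1);
}
```

For example, a user address of 0x7f0000234567 mapped by a PMD yields a
2 MiB range starting at 0x7f0000200000, whereas the same address mapped by
a PTE yields only the 4 KiB page at 0x7f0000234000.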