On Fri, Jun 08, 2018 at 04:51:19PM -0700, Dan Williams wrote:
> mce: Uncorrected hardware memory error in user-access at af34214200
> {1}[Hardware Error]: It has been corrected by h/w and requires no further action
> mce: [Hardware Error]: Machine check events logged
> {1}[Hardware Error]: event severity: corrected
> Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users
> [..]
> Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed
> mce: Memory error not recovered
>
> In contrast to typical memory, dev_pagemap pages may be dax mapped. With
> dax there is no possibility to map in another page dynamically since dax
> establishes 1:1 physical address to file offset associations. Also,
> dev_pagemap pages associated with NVDIMM / persistent memory devices can
> internally remap/repair addresses with poison. While memory_failure()
> assumes that it can discard typical poisoned pages and keep them
> unmapped indefinitely, dev_pagemap pages may be returned to service
> after the error is cleared.
>
> Teach memory_failure() to detect and handle MEMORY_DEVICE_HOST
> dev_pagemap pages that have poison consumed by userspace. Mark the
> memory as UC instead of unmapping it completely to allow ongoing access
> via the device driver (nd_pmem). Later, nd_pmem will grow support for
> marking the page back to WB when the error is cleared.
>
> Cc: Jan Kara <jack@xxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Jérôme Glisse <jglisse@xxxxxxxxxx>
> Cc: Matthew Wilcox <mawilcox@xxxxxxxxxxxxx>
> Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
> Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> ---

<>

> +static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> +                struct dev_pagemap *pgmap)
> +{
> +        const bool unmap_success = true;
> +        unsigned long size;
> +        struct page *page;
> +        LIST_HEAD(tokill);
> +        int rc = -EBUSY;
> +        loff_t start;
> +
> +        /*
> +         * Prevent the inode from being freed while we are interrogating
> +         * the address_space, typically this would be handled by
> +         * lock_page(), but dax pages do not use the page lock.
> +         */
> +        page = dax_lock_page(pfn);
> +        if (!page)
> +                goto out;
> +
> +        if (hwpoison_filter(page)) {
> +                rc = 0;
> +                goto unlock;
> +        }
> +
> +        switch (pgmap->type) {
> +        case MEMORY_DEVICE_PRIVATE:
> +        case MEMORY_DEVICE_PUBLIC:
> +                /*
> +                 * TODO: Handle HMM pages which may need coordination
> +                 * with device-side memory.
> +                 */
> +                goto unlock;
> +        default:
> +                break;
> +        }
> +
> +        /*
> +         * If the page is not mapped in userspace then report it as
> +         * unhandled.
> +         */
> +        size = dax_mapping_size(page);
> +        if (!size) {
> +                pr_err("Memory failure: %#lx: failed to unmap page\n", pfn);
> +                goto unlock;
> +        }
> +
> +        SetPageHWPoison(page);
> +
> +        /*
> +         * Unlike System-RAM there is no possibility to swap in a
> +         * different physical page at a given virtual address, so all
> +         * userspace consumption of ZONE_DEVICE memory necessitates
> +         * SIGBUS (i.e. MF_MUST_KILL)
> +         */
> +        flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
> +        collect_procs(page, &tokill, flags & MF_ACTION_REQUIRED);

You know "flags & MF_ACTION_REQUIRED" will always be true, so you can
just pass in MF_ACTION_REQUIRED or even just "true".
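
For illustration, an untested sketch of what I mean (everything else as
in your hunk):

        flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
        /* MF_ACTION_REQUIRED was just OR'd in above, so no need to test it */
        collect_procs(page, &tokill, true);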
> +
> +        start = (page->index << PAGE_SHIFT) & ~(size - 1);
> +        unmap_mapping_range(page->mapping, start, start + size, 0);
> +
> +        kill_procs(&tokill, flags & MF_MUST_KILL, !unmap_success, ilog2(size),

You know "flags & MF_MUST_KILL" will always be true, so you can just
pass in MF_MUST_KILL or even just "true".  Also, you can get rid of the
constant "unmap_success" if you want and just pass in false as the 3rd
argument.
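
Again just an untested sketch; the trailing arguments (trimmed from the
quote above) would stay as in your patch:

        /* flags already has MF_MUST_KILL set, and unmap_success is always true */
        kill_procs(&tokill, true, false, ilog2(size), ...);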