On Tue 05-06-18 07:33:17, Dan Williams wrote:
> On Tue, Jun 5, 2018 at 7:11 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > On Mon 04-06-18 07:31:25, Dan Williams wrote:
> > [...]
> >> I'm trying to solve this real world problem when real poison is
> >> consumed through a dax mapping:
> >>
> >>     mce: Uncorrected hardware memory error in user-access at af34214200
> >>     {1}[Hardware Error]: It has been corrected by h/w and requires no further action
> >>     mce: [Hardware Error]: Machine check events logged
> >>     {1}[Hardware Error]: event severity: corrected
> >>     Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users
> >>     [..]
> >>     Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed
> >>     mce: Memory error not recovered
> >>
> >> ...i.e. currently all poison consumed through dax mappings is
> >> needlessly system fatal.
> >
> > Thanks. That should be a part of the changelog.
>
> ...added for v3:
> https://lists.01.org/pipermail/linux-nvdimm/2018-June/016153.html
>
> > It would be great to
> > describe why this cannot be simply handled by hwpoison code without any
> > ZONE_DEVICE specific hacks? The error is recoverable so why does
> > hwpoison code even care?
> >
>
> Up until we started testing hardware poison recovery for persistent
> memory I assumed that the kernel did not need any new enabling to get
> basic support for recovering userspace consumed poison.
>
> However, the recovery code has a dedicated path for many different
> page states (see: action_page_types). Without any changes it
> incorrectly assumes that a dax mapped page is a page cache page
> undergoing dma, or some other pinned operation. It also assumes that
> the page must be offlined which is not correct / possible for dax
> mapped pages. There is a possibility to repair poison to dax mapped
> persistent memory pages, and the pages can't otherwise be offlined
> because they 1:1 correspond with a physical storage block, i.e.
> offlining pmem would be equivalent to punching a hole in the physical
> address space.
>
> There's also the entanglement of device-dax which guarantees a given
> mapping size (4K, 2M, 1G). This requires determining the size of the
> mapping encompassing a given pfn to know how much to unmap. Since dax
> mapped pfns don't come from the page allocator we need to read the
> page size from the page tables, not compound_order(page).

OK, but my question still stands: do we really want to do more on top of
the existing code and add even more special casing, or is it time to
rethink the whole hwpoison design?
-- 
Michal Hocko
SUSE Labs