On Mon 04-06-18 07:31:25, Dan Williams wrote:
[...]
> I'm trying to solve this real world problem when real poison is
> consumed through a dax mapping:
>
>     mce: Uncorrected hardware memory error in user-access at af34214200
>     {1}[Hardware Error]: It has been corrected by h/w and requires
>     no further action
>     mce: [Hardware Error]: Machine check events logged
>     {1}[Hardware Error]: event severity: corrected
>     Memory failure: 0xaf34214: reserved kernel page still
>     referenced by 1 users
>     [..]
>     Memory failure: 0xaf34214: recovery action for reserved kernel
>     page: Failed
>     mce: Memory error not recovered
>
> ...i.e. currently all poison consumed through dax mappings is
> needlessly system fatal.

Thanks. That should be a part of the changelog. It would be great to
describe why this cannot be simply handled by hwpoison code without any
ZONE_DEVICE specific hacks? The error is recoverable so why does
hwpoison code even care?

--
Michal Hocko
SUSE Labs