On 02.06.22 07:06, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
>
> When handling memory error on a hugetlb page, the error handler tries to
> dissolve and turn it into 4kB pages. If it's successfully dissolved,
> PageHWPoison flag is moved to the raw error page, so that's all right.
> However, dissolve sometimes fails, then the error page is left as
> hwpoisoned hugepage. It's useful if we can retry to dissolve it to save
> healthy pages, but that's not possible now because the information about
> where the raw error page is lost.
>
> Use the private field of a tail page to keep that information. The code
> path of shrinking hugepage pool used this info to try delayed dissolve.
> This only keeps one hwpoison page for now, which might be OK because it's
> simple and multiple hwpoison pages in a hugepage can be rare. But it can
> be extended in the future.
>

But what would happen if you have multiple successive MCE events on
such a page now?

--
Thanks,

David / dhildenb