On Fri, Jul 13, 2018 at 05:28:05PM -0700, Dan Williams wrote: > On Fri, Jul 13, 2018 at 1:52 AM, Naoya Horiguchi > <n-horiguchi@xxxxxxxxxxxxx> wrote: > > On Wed, Jul 04, 2018 at 02:41:06PM -0700, Dan Williams wrote: ... > >> + > >> + /* > >> + * Use this flag as an indication that the dax page has been > >> + * remapped UC to prevent speculative consumption of poison. > >> + */ > >> + SetPageHWPoison(page); > > > > The number of hwpoison pages is maintained by num_poisoned_pages, > > so you can call num_poisoned_pages_inc()? > > I don't think we want these pages accounted in num_poisoned_pages(). > We have the badblocks infrastructure in libnvdimm to track how many > errors and where they are located, and since they can be repaired via > driver actions I think we should track them separately. OK. > > Related to this, I'm interested in whether/how unpoison_page() works > > on a hwpoisoned dev_pagemap page. > > unpoison_page() is only triggered via freeing pages to the page > allocator, and that never happens for dev_pagemap / ZONE_DEVICE pages. sorry, my bad comment. I meant unpoison_memory() in mm/memory-failure.c, which is triggered via debugfs:hwpoison/unpoison-pfn. This interface looks like below int unpoison_memory(unsigned long pfn) { struct page *page; struct page *p; int freeit = 0; static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST); if (!pfn_valid(pfn)) return -ENXIO; p = pfn_to_page(pfn); page = compound_head(p); if (!PageHWPoison(p)) { unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n", pfn, &unpoison_rs); return 0; } ... so I think that we can add is_zone_device_page() check at the beginning of this function to call hwpoison_clear() introduced in patch 13/13? Otherwise maybe compound_head() will cause some critical issue like general protection fault. Thanks, Naoya Horiguchi