Re: [PATCH v5 08/11] mm, memory_failure: Teach memory_failure() about dev_pagemap pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 13, 2018 at 05:28:05PM -0700, Dan Williams wrote:
> On Fri, Jul 13, 2018 at 1:52 AM, Naoya Horiguchi
> <n-horiguchi@xxxxxxxxxxxxx> wrote:
> > On Wed, Jul 04, 2018 at 02:41:06PM -0700, Dan Williams wrote:
...
> >> +
> >> +     /*
> >> +      * Use this flag as an indication that the dax page has been
> >> +      * remapped UC to prevent speculative consumption of poison.
> >> +      */
> >> +     SetPageHWPoison(page);
> >
> > The number of hwpoison pages is maintained by num_poisoned_pages,
> > so you can call num_poisoned_pages_inc()?
> 
> I don't think we want these pages accounted in num_poisoned_pages().
> We have the badblocks infrastructure in libnvdimm to track how many
> errors and where they are located, and since they can be repaired via
> driver actions I think we should track them separately.

OK.

> > Related to this, I'm interested in whether/how unpoison_page() works
> > on a hwpoisoned dev_pagemap page.
> 
> unpoison_page() is only triggered via freeing pages to the page
> allocator, and that never happens for dev_pagemap / ZONE_DEVICE pages.

sorry, my bad comment.
I meant unpoison_memory() in mm/memory-failure.c, which is triggered
via debugfs:hwpoison/unpoison-pfn. This interface looks like below

  int unpoison_memory(unsigned long pfn)
  {
          struct page *page;
          struct page *p;
          int freeit = 0;
          static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
                                          DEFAULT_RATELIMIT_BURST);

          if (!pfn_valid(pfn))
                  return -ENXIO;

          p = pfn_to_page(pfn);
          page = compound_head(p);

          if (!PageHWPoison(p)) {
                  unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
                                   pfn, &unpoison_rs);
                  return 0;
          }
  ...

so I think that we can add is_zone_device_page() check at the beginning
of this function to call hwpoison_clear() introduced in patch 13/13?
Otherwise maybe compound_head() will cause some critical issue like
general protection fault.

Thanks,
Naoya Horiguchi




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux