Re: [PATCH] mm/memory-failure: remove shake_page()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/26/2024 12:05 PM, Matthew Wilcox wrote:

On Fri, Apr 26, 2024 at 11:53:01AM -0700, Sidhartha Kumar wrote:
On 4/26/24 11:27 AM, Matthew Wilcox wrote:
On Fri, Apr 26, 2024 at 10:57:31AM -0700, Sidhartha Kumar wrote:
On 4/26/24 10:34 AM, Matthew Wilcox wrote:
On Fri, Apr 26, 2024 at 10:15:11AM -0700, Sidhartha Kumar wrote:
Use a folio in get_any_page() to save 5 calls to compound head and
convert the last user of shake_page() to shake_folio(). This allows us
to remove the shake_page() definition.
So I didn't do this before because I wasn't convinced it was safe.
We don't have a refcount on the folio, so the page might no longer
be part of this folio by the time we get the refcount on the folio.

I'd really like to see some argumentation for why this is safe.
If I moved down the folio = page_folio() line to after we verify
__get_hwpoison_page() has returned 1, which indicates the reference count
was successfully incremented via foliO_try_get(), that means the folio
conversion would happen after we have a refcount. In the case we don't call
__get_hwpoison_page(), that means the MF_COUNT_INCREASED flag is set. This
means the page has existing users so that path would be safe as well. So I
think this is safe after moving page_folio() after __get_hwpoison_page().
See if you can find a hole in this chain of reasoning ...

memory_failure()
          p = pfn_to_online_page(pfn);
          res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
(not a hugetlb)
          if (TestSetPageHWPoison(p)) {
(not already poisoned)
          if (!(flags & MF_COUNT_INCREASED)) {
                  res = get_hwpoison_page(p, flags);

get_hwpoison_page()
                  ret = get_any_page(p, flags);

get_any_page()
	folio = page_folio(page)
That would be unsafe, the safe way would be if we moved page_folio() after
the call to __get_hw_poison() in get_any_page() and there would still be one
remaining user of shake_page() that we can't convert. A safe version of this
patch would result in a removal of one use of PageHuge() and two uses of
put_page(), would that be worth submitting?

get_any_page()
	if(__get_hwpoison_page())
		folio = page_folio() /* folio_try_get() returned 1, safe */
I think we should convert __get_hwpoison_page() to return either the folio
or an ERR_PTR or NULL.  Also, I think we should delete the "cannot catch
tail" part and just loop in __get_hwpoison_page() until we do catch it.
See try_get_folio() in mm/gup.c for inspiration (although you can't use
it exactly because that code knows that the page is mapped into a page
table, so has a refcount).

But that's just an immediate assessment; you might find a reason that
doesn't work.
Besides, in a possible hugetlb demote scenario, it seems to me that we should retry get_hwpoison_hugetlb_folio <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/get_hwpoison_hugetlb_folio>() instead of falling thru to folio_try_get().

staticint__get_hwpoison_page <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/__get_hwpoison_page>(structpage <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page>*page <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page>,unsignedlongflags)
{
structfolio <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/folio>*folio <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/folio>=page_folio <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page_folio>(page <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page>);
intret=0;
bool <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/bool>hugetlb <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/hugetlb>=false <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/false>;

ret=get_hwpoison_hugetlb_folio <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/get_hwpoison_hugetlb_folio>(folio <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/folio>,&hugetlb <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/hugetlb>,false <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/false>);
if(hugetlb <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/hugetlb>){
/* Make sure hugetlb demotion did not happen from under us. */
if(folio <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/folio>==page_folio <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page_folio>(page <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page>))
returnret;
if(ret>0){   <---------  folio changes due to demote
folio_put <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/folio_put>(folio <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/folio>); folio <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/folio>=page_folio <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page_folio>(page <https://elixir.bootlin.com/linux/v6.9-rc5/C/ident/page>);


thanks,
-jane





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux