Re: [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy


 



On 2021/5/6 10:49, HORIGUCHI NAOYA(堀口 直也) wrote:
On Wed, Apr 28, 2021 at 04:54:59PM +0200, David Hildenbrand wrote:
On 21.04.21 04:04, Ding Hui wrote:
Recently we found that a lot of MemFree is left in /proc/meminfo after
soft-offlining a large number of pages.

I think it's incorrect since NR_FREE_PAGES should not contain HWPoison pages.
After take_page_off_buddy(), the page no longer belongs to the buddy
allocator and will not be used any more, but we seem to miss updating
NR_FREE_PAGES in this situation.

Signed-off-by: Ding Hui <dinghui@xxxxxxxxxxxxxx>
---
   mm/page_alloc.c | 1 +
   1 file changed, 1 insertion(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cfc72873961d..8d65b62784d8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page)
   			del_page_from_free_list(page_head, zone, page_order);
   			break_down_buddy_pages(zone, page_head, page, 0,
   						page_order, migratetype);
+			__mod_zone_page_state(zone, NR_FREE_PAGES, -1);
   			ret = true;
   			break;
   		}
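A minimal user-space model of the accounting, for illustration only (all model_* names are hypothetical, not kernel code). It assumes that, in mm/page_alloc.c of this era, del_page_from_free_list() and the free-list additions done by break_down_buddy_pages() only update the per-order free_area[].nr_free counts and leave NR_FREE_PAGES alone, so after splitting a free block around the target page the zone counter still includes the page that was taken away:

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_ORDER 11

/* Hypothetical per-zone counters; not the kernel's struct zone. */
struct model_zone {
	long nr_free_pages;            /* stands in for NR_FREE_PAGES */
	long nr_free[MODEL_MAX_ORDER]; /* stands in for free_area[order].nr_free */
};

/* Put one free block of 2^order pages on the lists, counting it as free. */
static void model_free_block(struct model_zone *z, unsigned int order)
{
	z->nr_free[order]++;
	z->nr_free_pages += 1L << order;
}

/*
 * Take one page out of a free 2^order block. As in break_down_buddy_pages(),
 * each split returns the half that does not hold the target page to the
 * free lists, so exactly one order-0 page is left over. Only the per-order
 * counts change on the way; NR_FREE_PAGES is adjusted only when 'patched'
 * is true, mirroring the one-line fix.
 */
static void model_take_page_off_buddy(struct model_zone *z, unsigned int order,
				      bool patched)
{
	assert(order < MODEL_MAX_ORDER && z->nr_free[order] > 0);
	z->nr_free[order]--;            /* del_page_from_free_list() */
	while (order > 0) {
		order--;
		z->nr_free[order]++;    /* buddy half goes back on a free list */
	}
	if (patched)
		z->nr_free_pages -= 1;  /* __mod_zone_page_state(zone, NR_FREE_PAGES, -1) */
}

/* Count the pages actually sitting on the free lists. */
static long model_pages_on_free_lists(const struct model_zone *z)
{
	long sum = 0;

	for (unsigned int o = 0; o < MODEL_MAX_ORDER; o++)
		sum += z->nr_free[o] << o;
	return sum;
}
```

Taking one page out of an order-3 block leaves seven pages on the free lists, while the unpatched counter still reads eight; the extra -1 closes that gap.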


Should this use __mod_zone_freepage_state() instead?

Yes, __mod_zone_freepage_state() looks better to me.
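For reference, __mod_zone_freepage_state() differs from a plain NR_FREE_PAGES update in that it also adjusts NR_FREE_CMA_PAGES when the migratetype is MIGRATE_CMA. The sketch below reproduces that shape in user space; the model_* types and functions are hypothetical stand-ins, not the definitions from include/linux/vmstat.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the kernel's migratetypes. */
enum model_migratetype {
	MODEL_MIGRATE_UNMOVABLE,
	MODEL_MIGRATE_MOVABLE,
	MODEL_MIGRATE_CMA,
};

struct model_zone_stats {
	long nr_free_pages;     /* stands in for NR_FREE_PAGES */
	long nr_free_cma_pages; /* stands in for NR_FREE_CMA_PAGES */
};

static bool model_is_migrate_cma(enum model_migratetype mt)
{
	return mt == MODEL_MIGRATE_CMA;
}

/* Plain update, like __mod_zone_page_state(zone, NR_FREE_PAGES, nr). */
static void model_mod_free_pages(struct model_zone_stats *s, long nr)
{
	assert(s->nr_free_pages + nr >= 0);
	s->nr_free_pages += nr;
}

/* Like __mod_zone_freepage_state(): CMA pages also move the CMA counter. */
static void model_mod_freepage_state(struct model_zone_stats *s, long nr,
				     enum model_migratetype mt)
{
	model_mod_free_pages(s, nr);
	if (model_is_migrate_cma(mt))
		s->nr_free_cma_pages += nr;
}
```

For a page in a CMA pageblock, the plain update would leave the CMA free count one page too high, which is presumably the concern behind the suggestion.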

And I think that an additional __mod_zone_freepage_state() in
unpoison_memory() may be necessary to cancel the decrement.  I thought of the
following, but it doesn't build because get_pfnblock_migratetype() is
available only in mm/page_alloc.c, so you might want to add a small exported
routine in mm/page_alloc.c and have it called from unpoison_memory().

   @@ -1899,8 +1899,12 @@ int unpoison_memory(unsigned long pfn)
           }

           if (!get_hwpoison_page(p, flags, 0)) {
   -               if (TestClearPageHWPoison(p))
   +               if (TestClearPageHWPoison(p)) {
   +                       int migratetype = get_pfnblock_migratetype(p, pfn);
   +
                           num_poisoned_pages_dec();
   +                       __mod_zone_freepage_state(page_zone(p), 1, migratetype);
   +               }
                   unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
                                    pfn, &unpoison_rs);
                   return 0;
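One possible shape for the small exported routine suggested above, sketched as a user-space model: the pageblock migratetype lookup stays private to the "page_alloc.c" side, and unpoison_memory() would only call the helper. The helper name, the 512-page pageblock size and the table-based lookup are all assumptions for illustration; nothing here is an existing kernel API.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGEBLOCK_ORDER 9 /* assumption: 512-page pageblocks */
#define MODEL_NR_PAGEBLOCKS   4

enum model_migratetype { MODEL_MIGRATE_MOVABLE, MODEL_MIGRATE_CMA };

struct model_zone {
	long nr_free_pages;     /* stands in for NR_FREE_PAGES */
	long nr_free_cma_pages; /* stands in for NR_FREE_CMA_PAGES */
	/* Per-pageblock migratetype, the data get_pfnblock_migratetype() reads. */
	enum model_migratetype block_mt[MODEL_NR_PAGEBLOCKS];
};

/* Private to the "page_alloc.c" side, like get_pfnblock_migratetype(). */
static enum model_migratetype model_pfnblock_mt(const struct model_zone *z,
						unsigned long pfn)
{
	assert((pfn >> MODEL_PAGEBLOCK_ORDER) < MODEL_NR_PAGEBLOCKS);
	return z->block_mt[pfn >> MODEL_PAGEBLOCK_ORDER];
}

static void model_mod_freepage_state(struct model_zone *z, long nr,
				     enum model_migratetype mt)
{
	z->nr_free_pages += nr;
	if (mt == MODEL_MIGRATE_CMA)
		z->nr_free_cma_pages += nr;
}

/*
 * Hypothetical exported helper: the memory-failure side passes only the
 * zone and pfn and never needs to look up the migratetype itself.
 * poison == true takes the page out of the free counts, false restores it.
 */
void model_account_poisoned_free_page(struct model_zone *z, unsigned long pfn,
				      bool poison)
{
	model_mod_freepage_state(z, poison ? -1 : 1, model_pfnblock_mt(z, pfn));
}
```

A poison followed by an unpoison of the same pfn leaves both counters where they started, which is the cancellation the diff is after.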


I think there is another problem.
In the normal case, we hold the last refcount of the hwpoison page, so get_hwpoison_page() should return 1, and NR_FREE_PAGES is adjusted when put_page() is called. In a race, however, we may leak the page, because unpoison_memory() does not put it back to the buddy allocator even though the HWPoison flag, num_poisoned_pages and NR_FREE_PAGES are all adjusted correctly.

CPU0                        CPU1

soft_offline_page
  soft_offline_free_page
    page_handle_poison
      take_page_off_buddy
      SetPageHWPoison
                            unpoison_memory
                              if (!get_hwpoison_page(p))
                                TestClearPageHWPoison
                                  num_poisoned_pages_dec
                                __mod_zone_freepage_state
                                return 0
                                /* fails to put the page back to buddy */
      page_ref_inc
      num_poisoned_pages_inc
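The interleaving can be replayed deterministically in a toy single-page model (hypothetical names; no real threads, and get_hwpoison_page() is reduced to a refcount check). It assumes both the take_page_off_buddy() fix and the unpoison_memory() change quoted above are applied, and it treats any page that is neither on a free list nor marked HWPoison as leaked, since nothing else in the model could own it:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-page state; nothing here is a kernel type. */
struct model_page {
	bool in_buddy; /* on a buddy free list */
	int refcount;  /* page_count() */
	bool hwpoison; /* PageHWPoison() */
};

struct model_counters {
	long nr_free_pages; /* NR_FREE_PAGES */
	long num_poisoned;  /* num_poisoned_pages */
};

/* CPU0: take_page_off_buddy() (with the fix), then SetPageHWPoison(). */
static void cpu0_take_off_and_mark(struct model_page *p, struct model_counters *c)
{
	p->in_buddy = false;
	c->nr_free_pages--;
	p->hwpoison = true;
}

/* CPU0: page_ref_inc(), then num_poisoned_pages_inc(). */
static void cpu0_pin_and_count(struct model_page *p, struct model_counters *c)
{
	p->refcount++;
	c->num_poisoned++;
}

/* Dropping the last reference returns the page to the buddy allocator. */
static void model_put_page(struct model_page *p, struct model_counters *c)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0) {
		p->in_buddy = true;
		c->nr_free_pages++;
	}
}

/* CPU1: unpoison_memory() including the zero-refcount branch from the diff. */
static int cpu1_unpoison(struct model_page *p, struct model_counters *c)
{
	bool freeit = false;

	if (p->refcount == 0) {          /* get_hwpoison_page() returned 0 */
		if (p->hwpoison) {
			p->hwpoison = false;
			c->num_poisoned--;
			c->nr_free_pages++;  /* __mod_zone_freepage_state(..., 1, ...) */
		}
		return 0;                /* page is never put back to buddy */
	}
	p->refcount++;                   /* reference taken by get_hwpoison_page() */
	if (p->hwpoison) {
		p->hwpoison = false;
		c->num_poisoned--;
		freeit = true;
	}
	model_put_page(p, c);            /* drop our own reference */
	if (freeit)
		model_put_page(p, c);    /* drop the reference soft offline kept */
	return 0;
}

/* Off the free lists and not quarantined: nobody in the model owns it. */
static bool model_page_leaked(const struct model_page *p)
{
	return !p->in_buddy && !p->hwpoison;
}
```

In the normal order the page ends up back in the buddy allocator; in the racy order every counter balances, yet the page is stranded with refcount 1 and no owner.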

How about doing nothing and returning -EBUSY (so the caller can retry) when unpoisoning a zero-refcount page, or returning 0 as 230ac719c500 ("mm/hwpoison: don't try to unpoison containment-failed pages") does?

  @@ -1736,11 +1736,9 @@ int unpoison_memory(unsigned long pfn)
    }

    if (!get_hwpoison_page(p, flags, 0)) {
  -       if (TestClearPageHWPoison(p))
  -           num_poisoned_pages_dec();
  -       unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
  +       unpoison_pr_info("Unpoison: Software-unpoisoned zero refcount page %#lx\n",
  				 pfn, &unpoison_rs);
  -       return 0;
  +       return -EBUSY;
    }

    lock_page(page);
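Applied to a toy single-page model (hypothetical names, and a sequential replay rather than real concurrency), the -EBUSY proposal makes the racy unpoison a no-op; a retry after soft offline has pinned the page then goes through the normal refcounted path. EBUSY comes from <errno.h> on POSIX/Linux systems; it is not part of ISO C.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical single-page state; nothing here is a kernel type. */
struct model_page { bool in_buddy; int refcount; bool hwpoison; };
struct model_counters { long nr_free_pages; long num_poisoned; };

/* CPU0: take_page_off_buddy() (with the fix), then SetPageHWPoison(). */
static void cpu0_take_off_and_mark(struct model_page *p, struct model_counters *c)
{
	p->in_buddy = false;
	c->nr_free_pages--;
	p->hwpoison = true;
}

/* CPU0: page_ref_inc(), then num_poisoned_pages_inc(). */
static void cpu0_pin_and_count(struct model_page *p, struct model_counters *c)
{
	p->refcount++;
	c->num_poisoned++;
}

/* Dropping the last reference returns the page to the buddy allocator. */
static void model_put_page(struct model_page *p, struct model_counters *c)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0) {
		p->in_buddy = true;
		c->nr_free_pages++;
	}
}

/* unpoison_memory() with the proposal: leave a zero-refcount page alone. */
static int model_unpoison_v2(struct model_page *p, struct model_counters *c)
{
	bool freeit = false;

	if (p->refcount == 0)
		return -EBUSY;           /* nothing touched; caller may retry */
	p->refcount++;                   /* reference taken by get_hwpoison_page() */
	if (p->hwpoison) {
		p->hwpoison = false;
		c->num_poisoned--;
		freeit = true;
	}
	model_put_page(p, c);            /* drop our own reference */
	if (freeit)
		model_put_page(p, c);    /* drop the reference soft offline kept */
	return 0;
}
```

The first attempt in the racy window returns -EBUSY with the page still poisoned and off the free lists; once CPU0 has finished, the retry frees the page and every counter balances.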
	


--
Thanks,
- Ding Hui





