On Thu, May 06, 2021 at 12:01:34PM +0800, Ding Hui wrote: > On 2021/5/6 10:49, HORIGUCHI NAOYA(堀口 直也) wrote: > > On Wed, Apr 28, 2021 at 04:54:59PM +0200, David Hildenbrand wrote: > > > On 21.04.21 04:04, Ding Hui wrote: > > > > Recently we found there is a lot MemFree left in /proc/meminfo after > > > > do a lot of pages soft offline. > > > > > > > > I think it's incorrect since NR_FREE_PAGES should not contain HWPoison pages. > > > > After take_page_off_buddy, the page is no longer belong to buddy > > > > allocator, and will not be used any more, but we maybe missed accounting > > > > NR_FREE_PAGES in this situation. > > > > > > > > Signed-off-by: Ding Hui <dinghui@xxxxxxxxxxxxxx> > > > > --- > > > > mm/page_alloc.c | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > index cfc72873961d..8d65b62784d8 100644 > > > > --- a/mm/page_alloc.c > > > > +++ b/mm/page_alloc.c > > > > @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page) > > > > del_page_from_free_list(page_head, zone, page_order); > > > > break_down_buddy_pages(zone, page_head, page, 0, > > > > page_order, migratetype); > > > > + __mod_zone_page_state(zone, NR_FREE_PAGES, -1); > > > > ret = true; > > > > break; > > > > } > > > > > > > > > > Should this use __mod_zone_freepage_state() instead? > > > > Yes, __mod_zone_freepage_state() looks better to me. > > > > And I think that maybe an additional __mod_zone_freepage_state() in > > unpoison_memory() is necessary to cancel the decrement. I thought of the > > following, but it doesn't build because get_pfnblock_migratetype() is > > available only in mm/page_alloc.c, so you might want to add a small exported > > routine in mm/page_alloc.c and let it called from unpoison_memory(). > > > > @@ -1899,8 +1899,12 @@ int unpoison_memory(unsigned long pfn) > > } > > if (!get_hwpoison_page(p, flags, 0)) { > > - if (TestClearPageHWPoison(p)) > > + if (TestClearPageHWPoison(p)) { > > + int migratetype = get_pfnblock_migratetype(p, pfn); > > + > > num_poisoned_pages_dec(); > > + __mod_zone_freepage_state(page_zone(p), 1, migratetype); > > + } > > unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n", > > pfn, &unpoison_rs); > > return 0; > > > > I think there is another problem: > In normal case, we keep the last refcount of the hwpoison page, so > get_hwpoison_page should return 1. The NR_FREE_PAGES will be adjusted when > call put_page. I think that take_page_off_buddy() should not be called for this case (the error page have remaining refcount). So it seems that no need to update NR_FREE_PAGES ? > At race condition, we maybe leak the page because we does not put it back to > buddy in unpoison_memory, however the HWPoison flag, num_poisoned_pages, > NR_FREE_PAGES is adjusted correctly. > > CPU0 CPU1 > > soft_offline_page > soft_offline_free_page > page_handle_poison > take_page_off_buddy > SetPageHWPoison > unpoison_memory > if (!get_hwpoison_page(p)) > TestClearPageHWPoison > num_poisoned_pages_dec > __mod_zone_freepage_state > return 0 > /* miss put the page back to buddy */ > page_ref_inc > num_poisoned_pages_inc Thanks for checking this, unpoison_memory() is racy. Recently we are suggesting to introduce mf_mutex by [1]. Although this patch is not merged to mainline yet, but it could be used to prevent the above race too. [1] https://lore.kernel.org/linux-mm/20210427062953.2080293-2-nao.horiguchi@xxxxxxxxx/ > > How about do nothing and return -EBUSY (so the caller can retry) if unpoison > a zero refcount page , or return 0 like 230ac719c500 ("mm/hwpoison: don't > try to unpoison containment-failed pages") does ? > > @@ -1736,11 +1736,9 @@ int unpoison_memory(unsigned long pfn) > } > > if (!get_hwpoison_page(p, flags, 0)) { > - if (TestClearPageHWPoison(p)) > - num_poisoned_pages_dec(); > - unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n", > + unpoison_pr_info("Unpoison: Software-unpoisoned zero refcount page > %#lx\n", > pfn, &unpoison_rs); > - return 0; > + return -EBUSY; Currently unpoison_memory() does not work as reverse operation of take_page_off_buddy() (it's simply broken), so implementing it at one time would be better. I'll take time to fix unpoison_memory(). Thanks, Naoya Horiguchi