On Mon, Jul 09, 2018 at 10:31:25AM +0800, 裘稀石(稀石) wrote: > Hi Naoya, > > I think the double check can not fix the problem as I said in another email. > If someone mmap before soft offline, so the page_count(head) is still zero > in soft offline, then hwpoison flag set and it can not be alloced again in > dequeue_huge_page_node_exact() during page fault, so page fault return > no-mem, and someone will be killed (not mce kill). > > How about just set hwpoison flag after soft_offline_free_page - > dissolve_free_huge_page > successfully? It will fix the both two problems (mce kill and no-mem kill). Thank you for elaborating, you're right. So do you like a fix like this? --- diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d34225c1cb5b..3c9ce4c05f1b 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1479,22 +1479,20 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed, /* * Dissolve a given free hugepage into free buddy pages. This function does * nothing for in-use (including surplus) hugepages. Returns -EBUSY if the - * number of free hugepages would be reduced below the number of reserved - * hugepages. + * dissolution fails because a give page is not a free hugepage, or because + * free hugepages are fully reserved. */ int dissolve_free_huge_page(struct page *page) { - int rc = 0; + int rc = -EBUSY; spin_lock(&hugetlb_lock); if (PageHuge(page) && !page_count(page)) { struct page *head = compound_head(page); struct hstate *h = page_hstate(head); int nid = page_to_nid(head); - if (h->free_huge_pages - h->resv_huge_pages == 0) { - rc = -EBUSY; + if (h->free_huge_pages - h->resv_huge_pages == 0) goto out; - } /* * Move PageHWPoison flag from head page to the raw error page, * which makes any subpages rather than the error page reusable. @@ -1508,6 +1506,7 @@ int dissolve_free_huge_page(struct page *page) h->free_huge_pages_node[nid]--; h->max_huge_pages--; update_and_free_page(h, head); + rc = 0; } out: spin_unlock(&hugetlb_lock); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 9d142b9b86dc..e4c7e3ec7b10 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1715,13 +1715,13 @@ static int soft_offline_in_use_page(struct page *page, int flags) static void soft_offline_free_page(struct page *page) { + int rc = 0; struct page *head = compound_head(page); - if (!TestSetPageHWPoison(head)) { + if (PageHuge(head)) + rc = dissolve_free_huge_page(page); + if (!rc && !TestSetPageHWPoison(head)) num_poisoned_pages_inc(); - if (PageHuge(head)) - dissolve_free_huge_page(page); - } } /**