On Wed, Feb 24, 2021 at 11:47:49AM +0800, Muchun Song wrote:
> I have been looking at the dequeue_huge_page_node_exact().
> If a PageHWPoison huge page is in the free pool list, the page will
> not be allocated to the user. The PageHWPoison huge page
> will be skip in the dequeue_huge_page_node_exact().

Yes, now I see where the problem lies.

hugetlb_no_page()->..->dequeue_huge_page_node_exact() will fail if the only
page in the pool is hwpoisoned, as expected.
Then alloc_buddy_huge_page_with_mpol() will be tried, but since
surplus_huge_pages counter is stale, we will fail there.
That relates to the problem Mike pointed out, that we should decrease again
the surplus_huge_pages.

I think hwpoisoned pages should not be in the free pool though.
Probably we want to take them off when we notice we have one: e.g:
dequeue_huge_page_node_exact could place the page in another list and
place it back in case it was unpoisoned.

But anyway, that has nothing to do with this (apart from the surplus problem).

-- 
Oscar Salvador
SUSE L3
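
The idea floated above (take hwpoisoned pages off the free pool when the
dequeue path notices one, and put them back if they get unpoisoned) could
look roughly like the userspace model below. This is only a sketch, not
kernel code: struct model_page, struct model_pool, model_dequeue() and
model_unpoison() are hypothetical names made up for illustration, and the
choice to stop counting a quarantined page in free_pages is an assumption,
not something the thread settles.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the kernel's structures. */
struct model_page {
	bool hwpoison;              /* stands in for PageHWPoison() */
	struct model_page *next;
};

struct model_pool {
	struct model_page *free_list;     /* pages that may be handed out */
	struct model_page *poisoned_list; /* hypothetical quarantine list */
	unsigned long free_pages;         /* assumption: healthy free pages only */
};

/*
 * Return the first healthy page on the free list, or NULL if there is none.
 * Any hwpoisoned page met on the way is moved to the quarantine list
 * instead of being skipped and left in the pool.
 */
struct model_page *model_dequeue(struct model_pool *pool)
{
	struct model_page **link = &pool->free_list;

	while (*link) {
		struct model_page *page = *link;

		*link = page->next;       /* unlink from the free list */
		pool->free_pages--;

		if (!page->hwpoison) {
			page->next = NULL;
			return page;
		}

		/* Poisoned: park it on the quarantine list and keep looking. */
		page->next = pool->poisoned_list;
		pool->poisoned_list = page;
	}
	return NULL;
}

/*
 * Move a quarantined page back to the free list once it is unpoisoned.
 * Returns false if the page is not on the quarantine list.
 */
bool model_unpoison(struct model_pool *pool, struct model_page *page)
{
	struct model_page **link = &pool->poisoned_list;

	while (*link && *link != page)
		link = &(*link)->next;
	if (!*link)
		return false;

	*link = page->next;
	page->hwpoison = false;
	page->next = pool->free_list;
	pool->free_list = page;
	pool->free_pages++;
	return true;
}
```

With a pool holding one poisoned and one healthy page, model_dequeue()
hands out the healthy page and leaves the poisoned one quarantined, so a
later caller sees an empty free pool rather than a page that can never be
allocated. The surplus_huge_pages accounting discussed above is deliberately
not modelled here.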