4.2.8-ckt2 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>

commit a88c769548047b21f76fd71e04b6a3300ff17160 upstream.

When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back on
alloc_buddy_huge_page() to directly create a hugepage from the buddy
allocator.  In that case, however, if alloc_buddy_huge_page() succeeds we
don't decrement h->resv_huge_pages, which means that a successful
hugetlb_fault() returns without releasing the reserve count.  As a result,
subsequent hugetlb_fault() calls might fail even though there are still
free hugepages.

This patch simply adds the decrementing code on that code path.

I reproduced this problem when testing a v4.3 kernel in the following
situation:
 - the test machine/VM is a NUMA system,
 - hugepage overcommitting is enabled,
 - most hugepages are already allocated and there is only one free
   hugepage, which is on node 0 (for example),
 - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
   node 1, tries to allocate a hugepage,
 - the allocation should fail, but the reserve count is still held.
(A rough userspace sketch of this scenario is included after the patch.)

Signed-off-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Cc: Hillf Danton <hillf.zj@xxxxxxxxxxxxxxx>
Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
[ luis: backported to 3.16:
  - use 'chg' instead of 'gbl_chg'
  - adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques@xxxxxxxxxxxxx>
Signed-off-by: Kamal Mostafa <kamal@xxxxxxxxxxxxx>
---
 mm/hugetlb.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 62c1ec5..f70850e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1563,7 +1563,10 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
 		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
 		if (!page)
 			goto out_uncharge_cgroup;
-
+		if (!avoid_reserve && vma_has_reserves(vma, chg)) {
+			SetPagePrivate(page);
+			h->resv_huge_pages--;
+		}
 		spin_lock(&hugetlb_lock);
 		list_move(&page->lru, &h->hugepage_activelist);
 		/* Fall through */
--
1.9.1
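
For reference, below is a minimal userspace sketch of the reproduction
scenario described in the commit message.  It is not the test program used
by the patch author; it assumes a two-node NUMA machine with 2MB hugepages,
hugepage overcommit enabled (e.g. vm.nr_overcommit_hugepages > 0), the only
free hugepage sitting on node 0, and libnuma headers for set_mempolicy().
On an affected kernel the fault is expected to fail while HugePages_Rsvd in
/proc/meminfo remains elevated.

/*
 * repro.c - hedged sketch of the scenario above, not the author's test case.
 * Build: gcc -o repro repro.c -lnuma
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <numaif.h>			/* set_mempolicy(), MPOL_BIND */

#define HPAGE_SIZE	(2UL * 1024 * 1024)	/* assumes 2MB hugepages */

int main(void)
{
	/* Bind all allocations to node 1, which has no free hugepage. */
	unsigned long nodemask = 1UL << 1;

	if (set_mempolicy(MPOL_BIND, &nodemask, 8 * sizeof(nodemask))) {
		perror("set_mempolicy");
		return 1;
	}

	/* Without MAP_NORESERVE this takes a hugepage reservation at mmap() time. */
	char *p = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/*
	 * Touching the mapping triggers hugetlb_fault().  With MPOL_BIND
	 * excluding node 0, dequeue_huge_page_vma() fails and the kernel
	 * falls back to alloc_buddy_huge_page().  On an unpatched kernel
	 * the fault path can return without releasing the reserve: the
	 * process typically dies with SIGBUS here, yet HugePages_Rsvd in
	 * /proc/meminfo stays elevated, so later hugepage faults fail even
	 * though node 0 still has a free hugepage.
	 */
	memset(p, 0, HPAGE_SIZE);

	printf("fault succeeded; check HugePages_Rsvd in /proc/meminfo\n");
	munmap(p, HPAGE_SIZE);
	return 0;
}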