On Fri, Nov 20, 2015 at 01:56:18PM -0800, Mike Kravetz wrote:
> On 11/19/2015 11:57 PM, Hillf Danton wrote:
> >>
> >> When dequeue_huge_page_vma() in alloc_huge_page() fails, we fall back to
> >> alloc_buddy_huge_page() to directly create a hugepage from the buddy allocator.
> >> In that case, however, if alloc_buddy_huge_page() succeeds we don't decrement
> >> h->resv_huge_pages, which means that successful hugetlb_fault() returns without
> >> releasing the reserve count. As a result, subsequent hugetlb_fault() might fail
> >> despite that there are still free hugepages.
> >>
> >> This patch simply adds decrementing code on that code path.
>
> In general, I agree with the patch. If we allocate a huge page via the
> buddy allocator and that page will be used to satisfy a reservation, then
> we need to decrement the reservation count.
>
> As Hillf mentions, this code is not exactly the same in linux-next.
> Specifically, there is the new call to take the memory policy of the
> vma into account when calling the buddy allocator. I do not think,
> this impacts your proposed change but you may want to test with that
> in place.
>
> >>
> >> I reproduced this problem when testing v4.3 kernel in the following situation:
> >> - the test machine/VM is a NUMA system,
> >> - hugepage overcommiting is enabled,
> >> - most of hugepages are allocated and there's only one free hugepage
> >>   which is on node 0 (for example),
> >> - another program, which calls set_mempolicy(MPOL_BIND) to bind itself to
> >>   node 1, tries to allocate a hugepage,
>
> I am curious about this scenario. When this second program attempts to
> allocate the page, I assume it creates a reservation first. Is this
> reservation before or after setting mempolicy? If the mempolicy was set
> first, I would have expected the reservation to allocate a page on
> node 1 to satisfy the reservation.

My test called set_mempolicy() first and then mmap(), but reordering the
two calls made no difference, because hugetlb reservation is currently not
NUMA-aware.

Thanks,
Naoya Horiguchi
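
For anyone following the accounting, here is a rough userspace model of the
scenario discussed above. It is only a sketch, not kernel code: struct
toy_hstate and every toy_* function are hypothetical names made up for
illustration, the reserve count is kept global to mirror the "not NUMA-aware"
point, and the overcommit limit is assumed to be 1 so the effect is visible
with a single fallback allocation.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_NODES 2

/*
 * Hypothetical stand-in for the kernel's struct hstate. It keeps only the
 * counters discussed in this thread; the reserve count is global, not per node.
 */
struct toy_hstate {
	long free_huge_pages_node[TOY_MAX_NODES];
	long resv_huge_pages;
	long surplus_huge_pages;
	long nr_overcommit_huge_pages;
};

long toy_free_total(const struct toy_hstate *h)
{
	long sum = 0;

	for (int i = 0; i < TOY_MAX_NODES; i++)
		sum += h->free_huge_pages_node[i];
	return sum;
}

/* mmap() time: take one reservation against the global pool, ignoring nodes. */
bool toy_reserve(struct toy_hstate *h)
{
	if (toy_free_total(h) - h->resv_huge_pages <= 0)
		return false;
	h->resv_huge_pages++;
	return true;
}

/*
 * Fault time for a task bound to `node`. `fixed` selects whether the
 * fallback path releases the reserve count (the behaviour the patch adds).
 */
bool toy_fault(struct toy_hstate *h, int node, bool has_resv, bool fixed)
{
	assert(node >= 0 && node < TOY_MAX_NODES);

	/* Without a reservation, pages already promised to others are off limits. */
	bool may_dequeue = has_resv ||
		toy_free_total(h) - h->resv_huge_pages > 0;

	/* Models dequeue_huge_page_vma(): only the bound node is acceptable. */
	if (may_dequeue && h->free_huge_pages_node[node] > 0) {
		h->free_huge_pages_node[node]--;
		if (has_resv)
			h->resv_huge_pages--;
		return true;
	}

	/* Models the alloc_buddy_huge_page() fallback under overcommit. */
	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages)
		return false;
	h->surplus_huge_pages++;
	if (has_resv && fixed)
		h->resv_huge_pages--;	/* the decrement the patch adds */
	return true;
}

/*
 * One free page on node 0; a task bound to node 1 reserves and faults,
 * then an unreserved task on node 0 faults. Returns the last fault's result.
 */
bool toy_scenario(bool fixed)
{
	struct toy_hstate h = {
		.free_huge_pages_node = { 1, 0 },
		.nr_overcommit_huge_pages = 1,
	};

	if (!toy_reserve(&h))
		return false;
	if (!toy_fault(&h, 1, true, fixed))
		return false;
	return toy_fault(&h, 0, false, fixed);
}
```

With fixed set to false, the reservation taken by the node 1 task is never
released, so the later fault on node 0 is refused even though node 0 still
holds a free hugepage. With fixed set to true, the count drops back to zero
and that fault is served from the free page.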