On 06/14/2017 03:12 PM, Mike Kravetz wrote:
> On 06/13/2017 02:00 AM, Michal Hocko wrote:
>> From: Michal Hocko <mhocko@xxxxxxxx>
>>
>> alloc_huge_page_nodemask tries to allocate from any numa node in the
>> allowed node mask starting from lower numa nodes. This might lead to
>> filling up those low NUMA nodes while others are not used. We can reduce
>> this risk by introducing a concept of the preferred node similar to what
>> we have in the regular page allocator. We will start allocating from the
>> preferred nid and then iterate over all allowed nodes in the zonelist
>> order until we try them all.
>>
>> This is mimicking the page allocator logic except it operates on
>> per-node mempools. dequeue_huge_page_vma already does this so distill
>> the zonelist logic into a more generic dequeue_huge_page_nodemask
>> and use it in alloc_huge_page_nodemask.
>>
>> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
>> ---
>
> I built attempts/hugetlb-zonelists, threw it on a test machine, ran the
> libhugetlbfs test suite and saw failures. The failures started with this
> patch: commit 7e8b09f14495 in your tree. I have not yet started to look
> into the failures. It is even possible that the tests are making bad
> assumptions, but there certainly appear to be changes in behavior visible
> to the application(s).

nm. The failures were the result of dequeue_huge_page_nodemask() always
returning NULL. Vlastimil already noticed this issue and provided a
solution.

-- 
Mike Kravetz

> FYI - My 'test machine' is an x86 KVM instance with 8GB memory simulating
> 2 nodes. Huge page allocations before running tests:
> node0
>   512 free_hugepages
>   512 nr_hugepages
>   0 surplus_hugepages
> node1
>   512 free_hugepages
>   512 nr_hugepages
>   0 surplus_hugepages
>
> I can take a closer look at the failures tomorrow.
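
For anyone skimming the thread, the zonelist walk described in Michal's
quoted changelog would look roughly like the sketch below. This is only an
illustration of the approach, not the code from the patch itself: the
dequeue_huge_page_node_exact() helper name is assumed here as the per-node
pool dequeue, and the cpuset retry and hugetlb locking details are omitted.

#include <linux/gfp.h>
#include <linux/mmzone.h>
#include <linux/hugetlb.h>
#include <linux/numa.h>

/*
 * Sketch of the dequeue path: build the zonelist for the preferred nid
 * and try each allowed node's hugepage pool in zonelist order.
 * dequeue_huge_page_node_exact() stands in for the per-node pool helper.
 */
static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
					       int nid, nodemask_t *nmask)
{
	struct zonelist *zonelist = node_zonelist(nid, gfp_mask);
	struct zoneref *z;
	struct zone *zone;
	int node = NUMA_NO_NODE;

	for_each_zone_zonelist_nodemask(zone, z, zonelist,
					gfp_zone(gfp_mask), nmask) {
		struct page *page;

		/* The pool is per-node, not per-zone, so ask each node only once. */
		if (zone_to_nid(zone) == node)
			continue;
		node = zone_to_nid(zone);

		page = dequeue_huge_page_node_exact(h, node);
		if (page)
			return page;
	}

	return NULL;
}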