On 8/5/19 2:28 AM, Vlastimil Babka wrote:
> On 8/3/19 12:39 AM, Mike Kravetz wrote:
>> When allocating hugetlbfs pool pages via /proc/sys/vm/nr_hugepages,
>> the pages will be interleaved between all nodes of the system. If
>> nodes are not equal, it is quite possible for one node to fill up
>> before the others. When this happens, the code still attempts to
>> allocate pages from the full node. This results in calls to direct
>> reclaim and compaction which slow things down considerably.
>>
>> When allocating pool pages, note the state of the previous allocation
>> for each node. If previous allocation failed, do not use the
>> aggressive retry algorithm on successive attempts. The allocation
>> will still succeed if there is memory available, but it will not try
>> as hard to free up memory.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>
> Looks like only part of the (agreed with) suggestions were implemented?

My bad, I pulled in the wrong patch.

> - set_max_huge_pages() returns -ENOMEM if nodemask can't be allocated,
> but hugetlb_hstate_alloc_pages() doesn't.

That is somewhat intentional. The calling context of the two routines is
significantly different. hugetlb_hstate_alloc_pages is called at boot time
to handle command line parameters. And, hugetlb_hstate_alloc_pages does not
return a value as it is of type void. We 'could' print out a warning here.
But, if we can't allocate a node mask I am pretty sure we will not be able
to boot. I will add a comment.

> - there's still __GFP_NORETRY in nodemask allocations
> - (cosmetics) Mel pointed out that NODEMASK_FREE() works fine with NULL
> pointers

--
Mike Kravetz