On Thu, 3 Oct 2019, Vlastimil Babka wrote:

> I think the key differences between Mike's tests and Michal's is this part
> from Mike's mail linked above:
>
> "I 'tested' by simply creating some background activity and then seeing
> how many hugetlb pages could be allocated. Of course, many tries over
> time in a loop."
>
> - "some background activity" might be different than Michal's pre-filling
> of the memory with (clean) page cache
> - "many tries over time in a loop" could mean that kswapd has time to
> reclaim and eventually the new condition for pageblock order will pass
> every few retries, because there's enough memory for compaction and it
> won't return COMPACT_SKIPPED
>

I'll rely on Mike, the hugetlb maintainer, to assess the trade-off between the potential for encountering very expensive reclaim as Andrea did and the possibility of being able to allocate additional hugetlb pages at runtime if we did that expensive reclaim.

For parity with previous kernels it seems reasonable to ask that this remains unchanged, since allocating large amounts of hugetlb pages has different latency expectations than allocating at page fault time. This patch is available if he'd prefer to go that route.

On the other hand, userspace could achieve similar results if it were to use vm.drop_caches and explicitly trigger compaction through either procfs or sysfs before writing to vm.nr_hugepages, and that would be much faster because it would be done in one go. Users who allocate through the kernel command line would obviously be unaffected.

Commit b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed") was written with the latter in mind. Mike subsequently requested that hugetlb not be impacted, at least provisionally, until it could be further assessed.

I'd suggest the latter: let the user initiate expensive reclaim and/or compaction when tuning vm.nr_hugepages and leave no surprises for users using hugetlb overcommit. But I wouldn't argue against either approach; he knows the users and expectations of hugetlb far better than I do.
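
A minimal sketch of the userspace sequence described above (drop clean page cache, trigger compaction, then request hugetlb pages), written in Python. It assumes the standard /proc/sys/vm knobs drop_caches, compact_memory and nr_hugepages and root privileges on a real system. The helper names and the `root` parameter are hypothetical, added only so the sketch can be exercised against a scratch directory; this is not code from the thread.

```python
import os

# Global sysctl directory on Linux; overridable for testing (hypothetical parameter).
PROC_VM = "/proc/sys/vm"


def write_knob(root, name, value):
    """Write a value to a sysctl file such as drop_caches."""
    with open(os.path.join(root, name), "w") as f:
        f.write(str(value))


def read_knob(root, name):
    """Read a sysctl file and return its contents without trailing whitespace."""
    with open(os.path.join(root, name)) as f:
        return f.read().strip()


def preallocate_hugepages(nr_pages, root=PROC_VM):
    """Hypothetical helper: reclaim and compact once, then size the hugetlb pool."""
    # Write back dirty data first so more of the page cache is clean and droppable.
    os.sync()
    # 1 = drop clean page cache only (2 = slab objects, 3 = both).
    write_knob(root, "drop_caches", 1)
    # Writing 1 asks the kernel to compact all zones in one go.
    write_knob(root, "compact_memory", 1)
    # Request the pool size; the kernel may allocate fewer pages than asked for.
    write_knob(root, "nr_hugepages", nr_pages)
    # Read back the number of pages actually allocated.
    return int(read_knob(root, "nr_hugepages"))
```

Reading vm.nr_hugepages back matters because the request can be only partially satisfied. On NUMA systems, per-node compaction through /sys/devices/system/node/nodeN/compact would be the sysfs alternative to the global procfs knob.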