On Tue 06-02-24 16:18:22, Baolin Wang wrote:
>
>
> On 2024/2/5 22:23, Michal Hocko wrote:
> > On Mon 05-02-24 21:06:17, Baolin Wang wrote:
> > [...]
> > > > It is quite possible that traditional users (like large DBs) do not use
> > > > CMA heavily so such a problem was not observed so far. That doesn't mean
> > > > those problems do not really matter.
> > >
> > > CMA is just one case; as I mentioned before, other situations can also break
> > > the per-node hugetlb pool now.
> >
> > Is there any other case than memory hotplug, which is arguably different
> > as it is a disruptive operation already?
>
> Yes, as I said before, longterm pinning, memory failure and the users
> of alloc_contig_pages() may also break the per-node hugetlb pool.

Memory failure is similar to memory hotplug in the sense that it is a
disruptive operation and a fallback to a different node might be the only
option to handle it. On the other hand, longterm pinning is similar to
alloc_contig_pages() and it should fail if the page cannot be migrated
within the node. It seems that hugetlb is lagging behind many other
features here and I am not really sure how to deal with that. What is
your take, Muchun Song?

> > > Let's focus on the main point: why should we still keep the inconsistent
> > > behavior of handling free and in-use hugetlb for alloc_contig_range()? That's
> > > really confusing.
> >
> > Yes, this should behave consistently. And the least surprising way to
> > handle that from the user configuration POV is to not move outside of
> > the original NUMA node.
>
> So you mean we should also add the __GFP_THISNODE flag in
> alloc_migration_target() when allocating a new hugetlb page as the
> migration target, so that the behavior is unified and the per-node pool
> is not broken?

Not as simple as that, because alloc_migration_target() is also used for
user-driven migration.
--
Michal Hocko
SUSE Labs
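
To make the tradeoff concrete, below is a rough, hypothetical sketch (not
a proposed patch) of the alternative being weighed: leave
alloc_migration_target() untouched so that user-driven migration
(migrate_pages(2), mbind(2), memory policy changes) keeps its current
cross-node fallback, and let only the alloc_contig_range()/CMA path use a
node-bound target allocator. The helper name below is made up, and the
htlb_alloc_mask()/alloc_hugetlb_folio_nodemask() usage is assumed from
current kernels rather than taken from this thread.

#include <linux/gfp.h>
#include <linux/hugetlb.h>
#include <linux/migrate.h>

/*
 * Hypothetical sketch only, not a tested patch.  The idea: keep
 * alloc_migration_target() as-is and give the alloc_contig_range()/CMA
 * path its own migrate_pages() target callback (new_folio_t) that pins
 * the replacement hugetlb folio to the source node.
 */
static struct folio *alloc_contig_hugetlb_target(struct folio *src,
						 unsigned long private)
{
	struct hstate *h = folio_hstate(src);
	gfp_t gfp = htlb_alloc_mask(h) | __GFP_THISNODE;

	/*
	 * No nodemask and no cross-node fallback: if the source node has
	 * no spare hugetlb page, the migration (and with it
	 * alloc_contig_range()) fails instead of silently rebalancing the
	 * per-node pool.
	 */
	return alloc_hugetlb_folio_nodemask(h, folio_nid(src), NULL, gfp);
}

Whether failing the migration (and hence alloc_contig_range()) is
preferable to borrowing a hugetlb page from another node is exactly the
policy question discussed above.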