On Wed, Mar 06, 2024 at 06:13:27PM +0800, Baolin Wang wrote:
> As discussed in previous thread [1], there is an inconsistency when handing
> hugetlb migration. When handling the migration of freed hugetlb, it prevents
> fallback to other NUMA nodes in alloc_and_dissolve_hugetlb_folio(). However,
> when dealing with in-use hugetlb, it allows fallback to other NUMA nodes in
> alloc_hugetlb_folio_nodemask(), which can break the per-node hugetlb pool
> and might result in unexpected failures when node bound workloads doesn't get
> what is asssumed available.
>
> To make hugetlb migration strategy more clear, we should list all the scenarios
> of hugetlb migration and analyze whether allocation fallback is permitted:
> 1) Memory offline: will call dissolve_free_huge_pages() to free the freed hugetlb,
> and call do_migrate_range() to migrate the in-use hugetlb. Both can break the
> per-node hugetlb pool, but as this is an explicit offlining operation, no better
> choice. So should allow the hugetlb allocation fallback.
> 2) Memory failure: same as memory offline. Should allow fallback to a different node
> might be the only option to handle it, otherwise the impact of poisoned memory can
> be amplified.
> 3) Longterm pinning: will call migrate_longterm_unpinnable_pages() to migrate in-use
> and not-longterm-pinnable hugetlb, which can break the per-node pool. But we should
> fail to longterm pinning if can not allocate on current node to avoid breaking the
> per-node pool.
> 4) Syscalls (mbind, migrate_pages, move_pages): these are explicit users operation
> to move pages to other nodes, so fallback to other nodes should not be prohibited.
> 5) alloc_contig_range: used by CMA allocation and virtio-mem fake-offline to allocate
> given range of pages. Now the freed hugetlb migration is not allowed to fallback, to
> keep consistency, the in-use hugetlb migration should be also not allowed to fallback.
> 6) alloc_contig_pages: used by kfence, pgtable_debug etc. The strategy should be
> consistent with that of alloc_contig_range().
>
> Based on the analysis of the various scenarios above, introducing a new helper to
> determine whether fallback is permitted according to the migration reason..
>
> [1] https://lore.kernel.org/all/6f26ce22d2fcd523418a085f2c588fe0776d46e7.1706794035.git.baolin.wang@xxxxxxxxxxxxxxxxx/
> Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>

Reviewed-by: Oscar Salvador <osalvador@xxxxxxx>

> +static inline bool htlb_allow_alloc_fallback(int reason)
> +{
> +	bool allowed_fallback = false;
> +
> +	/*
> +	 * Note: the memory offline, memory failure and migration syscalls will
> +	 * be allowed to fallback to other nodes due to lack of a better chioce,
                                                                         ^ choice

-- 
Oscar Salvador
SUSE Labs