On Fri, Feb 23, 2024 at 10:56:48AM +0800, Baolin Wang wrote: > I previously considered passing the MR_reason argument to the > htlb_modify_alloc_mask(), which is only used by hugetlb migration. > But in alloc_hugetlb_folio_nodemask(), if there are available hugetlb on > other nodes, we should allow migrating, that will not break the per-node > hugetlb pool. > > That's why I just change the gfp_mask for allocating a new hguetlb when > migration, that can break the pool. Code-wise I think this is good, but I'm having some feelings about where filter out the mask. Ok, I'm trying to get my head around this. It's been a while since I looked into hugetlb code, so here we go. You mentioned that the only reason not to fiddle with gfp_mask before calling in alloc_hugetlb_folio_nodemask(), was that we might be able to find a hugetlb page in another node, and that that's ok because since all nodes remain with the same number of hugetlb pages, per-node pool doesn't get broken. Now, I see that dequeue_hugetlb_folio_nodemask() first tries to get the zonelist of the preferred node, and AFAICS, if it has !GFP_THISNODE, it should also get the zonelists of all other nodes, so we might fallback. In the hope of finding a way to be able to filter out in htlb_modify_alloc_mask(), I was trying to see whether we could skip GFP_THISNODE in dequeue_hugetlb_folio_nodemask() but no because we might end up dequeueing a hugetlb which sits in another node, while we really specified __GFP_THISNODE. The only way might be to somehow decouple dequeue_hugetlb_folio_nodemask() from alloc_hugetlb_folio_nodemask() and do some kind of gfp modification between the two calls. Another thing I dislike is the "-1" in alloc_hugetlb_folio_vma(). I think at least it deserves a comment like "Passing -1 will make us stick to GFP_THISNODE". Although that is another thing, we will pass "-1" which forces GFP_THISNODE when allocating a newly fresh hugetlb page, but in dequeue_hugetlb_folio_nodemask() we might get a page from a different node. That doesn't break per-node pool, but it is somehow odd? -- Oscar Salvador SUSE Labs