On 2/22/21 5:51 AM, Oscar Salvador wrote:
> alloc_contig_range will fail if it ever sees a HugeTLB page within the
> range we are trying to allocate, even when that page is free and can be
> easily reallocated.
> This has proved to be problematic for some users of alloc_contig_range,
> e.g. CMA and virtio-mem, where those would fail the call even when those
> pages lie in ZONE_MOVABLE and are free.
>
> We can do better by trying to replace such a page.
>
> Free hugepages are tricky to handle: so that no userspace application
> notices disruption, we need to replace the current free hugepage with
> a new one.
>
> In order to do that, a new function called alloc_and_dissolve_huge_page
> is introduced.
> This function will first try to get a new fresh hugepage, and if it
> succeeds, it will replace the old one in the free hugepage pool.
>
> All operations are being handled under hugetlb_lock, so no races are
> possible. The only exception is when page's refcount is 0, but it still
> has not been flagged as PageHugeFreed.
> In this case we retry as the race window is quite small and we have high
> chances to succeed next time.
>
> With regard to the allocation, we restrict it to the node the page belongs
> to with __GFP_THISNODE, meaning we do not fall back on other node's zones.
>
> Note that gigantic hugetlb pages are fenced off since there is a cyclic
> dependency between them and alloc_contig_range.
>
> Signed-off-by: Oscar Salvador <osalvador@xxxxxxx>

Thanks Oscar,

I spent a bunch of time looking for possible race issues. Thankfully,
the recent code from Muchun dealing with free lists helps. In addition,
all the hugetlb accounting looks good.

Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>

> ---
>  include/linux/hugetlb.h |   6 +++
>  mm/compaction.c         |  12 ++++++
>  mm/hugetlb.c            | 111 +++++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 127 insertions(+), 2 deletions(-)

-- 
Mike Kravetz
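
For readers following the thread, the replacement flow described in the quoted commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the code from the patch: `struct model_hpage`, `model_alloc_on_node`, `model_free_on_node`, `model_replace_free_hpage`, the `replace_result` values and the per-node `node_budget` array are all hypothetical names invented here. The model is single-threaded, so the hugetlb_lock mentioned in the commit message appears only as a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a hugepage; NOT the kernel's struct page. */
struct model_hpage {
    int  refcount;    /* > 0 means somebody is using the page            */
    bool huge_freed;  /* models the PageHugeFreed flag                   */
    bool gigantic;    /* gigantic pages are fenced off                   */
    bool in_pool;     /* page currently sits in the free hugepage pool   */
    int  nid;         /* node the page belongs to                        */
};

/* Hypothetical outcome codes for this sketch. */
enum replace_result {
    REPLACE_OK,          /* old free page was replaced by a fresh one     */
    REPLACE_RETRY,       /* refcount is 0 but page not yet flagged free   */
    REPLACE_BUSY,        /* page is in use, nothing to replace            */
    REPLACE_NOMEM,       /* no fresh page available on the same node      */
    REPLACE_UNSUPPORTED  /* gigantic page, skipped                        */
};

/* Allocate on exactly one node; models the __GFP_THISNODE restriction. */
static struct model_hpage *model_alloc_on_node(int nid, int *node_budget)
{
    if (node_budget[nid] <= 0)
        return NULL;                 /* no fallback to other nodes */
    struct model_hpage *p = calloc(1, sizeof(*p));
    if (!p)
        return NULL;
    node_budget[nid]--;
    p->nid = nid;
    return p;
}

/* Give a page back to its node's budget. */
static void model_free_on_node(struct model_hpage *p, int *node_budget)
{
    node_budget[p->nid]++;
    free(p);
}

static enum replace_result model_replace_free_hpage(struct model_hpage *old,
                                                    int *node_budget,
                                                    struct model_hpage **out_new)
{
    assert(old && node_budget && out_new);
    *out_new = NULL;

    if (old->gigantic)
        return REPLACE_UNSUPPORTED;

    /* Step 1: first try to get a fresh page, on the old page's node only. */
    struct model_hpage *fresh = model_alloc_on_node(old->nid, node_budget);
    if (!fresh)
        return REPLACE_NOMEM;

    /* Step 2: check the old page's state (the kernel would hold hugetlb_lock). */
    enum replace_result ret = REPLACE_OK;
    if (old->refcount > 0)
        ret = REPLACE_BUSY;
    else if (!old->huge_freed)
        ret = REPLACE_RETRY;         /* small race window: caller retries */

    if (ret != REPLACE_OK) {
        model_free_on_node(fresh, node_budget);
        return ret;
    }

    /* Step 3: swap the pages so the pool size seen by userspace is unchanged. */
    fresh->huge_freed = true;
    fresh->in_pool = true;
    old->huge_freed = false;
    old->in_pool = false;
    *out_new = fresh;
    return REPLACE_OK;
}
```

In this model a caller would loop while the result is `REPLACE_RETRY`, which mirrors the retry described for the case where the refcount is 0 but the page has not yet been flagged as free. Allocating before checking the old page follows the order given in the commit message; the real function may order these steps differently.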