Re: [PATCH] mm/hugetlb: wait for hugepage folios to be freed

On 2025/2/18 17:41, David Hildenbrand wrote:
On 18.02.25 10:22, Ge Yang wrote:


On 2025/2/18 16:55, David Hildenbrand wrote:
On 15.02.25 06:50, Ge Yang wrote:


On 2025/2/14 16:08, David Hildenbrand wrote:
On 14.02.25 07:32, yangge1116@xxxxxxx wrote:
From: Ge Yang <yangge1116@xxxxxxx>

Since the introduction of commit b65d4adbc0f0 ("mm: hugetlb: defer
freeing of HugeTLB pages"), which supports deferring the freeing of
HugeTLB pages, the allocation of contiguous memory through cma_alloc()
may fail probabilistically.

In the CMA allocation process, if the CMA area is found to be occupied
by in-use hugepage folios, these folios need to be migrated to another
location. When there are no available hugepage folios in the free
HugeTLB pool during the migration of in-use HugeTLB pages, new folios
are allocated from the buddy system. A temporary state is set on the
newly allocated folios. Upon completion of the hugepage folio
migration, the temporary state is transferred from the new folios to
the old folios. Normally, when the old folios with the temporary state
are freed, they are released directly back to the buddy system.
However, due to the deferred freeing of HugeTLB pages, the PageBuddy()
check fails, ultimately leading to the failure of cma_alloc().

Here is a simplified call trace illustrating the process:
cma_alloc()
    ->__alloc_contig_migrate_range() // Migrate in-use hugepages
        ->unmap_and_move_huge_page()
            ->folio_putback_hugetlb() // Free the old folios
    ->test_pages_isolated()
        ->__test_page_isolated_in_pageblock()
            ->PageBuddy(page) // Check if the page is in the buddy system
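
For readers unfamiliar with the deferred freeing, the mechanism added by
commit b65d4adbc0f0 looks roughly like the sketch below. This is
paraphrased from memory, not quoted from mm/hugetlb.c: the original
commit predates folios, exact names and signatures may differ in current
mainline, and hpage_freelist_to_folio() / folio_llist_node() are made-up
placeholders for the real trick of reusing folio->mapping as the llist
node. It only illustrates why a freed hugetlb folio can be neither a
hugetlb folio nor a buddy page for a while:

#include <linux/hugetlb.h>
#include <linux/llist.h>
#include <linux/sched.h>
#include <linux/workqueue.h>

/* Paraphrased sketch of the deferred-free path, not an exact excerpt. */
static LLIST_HEAD(hpage_freelist);

static void free_hpage_workfn(struct work_struct *work)
{
	struct llist_node *node = llist_del_all(&hpage_freelist);

	while (node) {
		/* hpage_freelist_to_folio() is a placeholder name. */
		struct folio *folio = hpage_freelist_to_folio(node);

		node = node->next;
		/* Only here does the folio finally reach the buddy system. */
		__update_and_free_hugetlb_folio(folio_hstate(folio), folio);
		cond_resched();
	}
}
static DECLARE_WORK(free_hpage_work, free_hpage_workfn);

static void update_and_free_hugetlb_folio(struct hstate *h,
					  struct folio *folio, bool atomic)
{
	if (!atomic) {
		__update_and_free_hugetlb_folio(h, folio);
		return;
	}

	/*
	 * In atomic context the folio is queued and freed later from a
	 * workqueue.  Until free_hpage_workfn() runs, the folio is neither
	 * a hugetlb folio nor PageBuddy() -- the window in which the
	 * __test_page_isolated_in_pageblock() check above fails.
	 */
	if (llist_add(folio_llist_node(folio), &hpage_freelist))
		schedule_work(&free_hpage_work);
}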

To resolve this issue, we have implemented a function named
wait_for_hugepage_folios_freed(). This function ensures that the
hugepage folios are properly released back to the buddy system after
their migration is completed. By invoking
wait_for_hugepage_folios_freed() following the migration process, we
guarantee that when test_pages_isolated() is executed, it will
successfully pass.
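
A minimal sketch of what such a wait could look like, assuming the
deferred freeing is driven by a work item as in the sketch after the
call trace above (the actual patch may well differ; only llist_empty()
and flush_work() are relied on here):

/*
 * Sketch only: make sure any hugetlb folios queued for deferred freeing
 * have actually been handed back to the buddy allocator before the
 * caller re-checks the isolated range.
 */
void wait_for_hugepage_folios_freed(void)
{
	/* Nothing queued: all freed folios are already back in the buddy. */
	if (llist_empty(&hpage_freelist))
		return;

	/* Wait for the deferred-free worker to finish the pending batch. */
	flush_work(&free_hpage_work);
}

Called after the hugetlb migration/putback step (or just before
test_pages_isolated()), this would close the window described above.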

Okay, so after every successful migration -> put of src, we wait for
the src to actually get freed.

When migrating multiple hugetlb folios, we'd wait once per folio.

It reminds me a bit about pcp caches, where folios are !buddy until the
pcp was drained.

It seems that we only track unmovable, reclaimable, and movable pages on
the pcp lists. For specific details, please refer to the
free_frozen_pages() function.
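
For illustration, the migratetype handling at the top of that function
is roughly as follows. This is a condensed paraphrase, not an exact
excerpt (the helper was called free_unref_page() in older kernels, and
free_frozen_page_to_pcp() below is a made-up placeholder for the pcp
fast path):

/* Condensed paraphrase of the pcp eligibility check. */
void free_frozen_pages(struct page *page, unsigned int order)
{
	unsigned long pfn = page_to_pfn(page);
	int migratetype = get_pfnblock_migratetype(page, pfn);

	if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
		if (is_migrate_isolate(migratetype)) {
			/* Isolated pages bypass the pcp, straight to buddy. */
			free_one_page(page_zone(page), page, pfn, order,
				      FPI_NONE);
			return;
		}
		/* e.g. CMA / HIGHATOMIC pages are treated as movable here. */
		migratetype = MIGRATE_MOVABLE;
	}

	/* Unmovable, reclaimable and movable pages go onto the pcp lists. */
	free_frozen_page_to_pcp(page, order, migratetype);
}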

It reminded me about PCP caches, because we effectively also have to
wait for some stuck folios to properly get freed to the buddy.

It seems that when an isolated page is freed, it won't be placed back
into the PCP caches.

I recall there are cases when the page was in the pcp before the isolation started, which is why we drain the pcp at some point (IIRC).

Yes, indeed, drain_all_pages(cc.zone) is currently executed before __alloc_contig_migrate_range().
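
For context, the ordering inside alloc_contig_range() is roughly the
following (heavily condensed, with error handling, pageblock alignment
and the exact signatures simplified), which shows where that drain
happens and where the proposed wait would have to take effect:

/* Heavily condensed sketch of alloc_contig_range(), mm/page_alloc.c. */
int alloc_contig_range(unsigned long start, unsigned long end,
		       unsigned int migratetype, gfp_t gfp_mask)
{
	struct compact_control cc = { .zone = page_zone(pfn_to_page(start)) };
	int ret;

	/* Mark the target pageblocks MIGRATE_ISOLATE. */
	ret = start_isolate_page_range(start, end, migratetype, 0, gfp_mask);
	if (ret)
		return ret;

	/* Flush per-cpu lists so pages freed before isolation reach the buddy. */
	drain_all_pages(cc.zone);

	/* Migrate in-use pages (including hugetlb folios) out of the range. */
	ret = __alloc_contig_migrate_range(&cc, start, end, migratetype);
	if (ret)
		goto undo;

	/*
	 * Hugetlb source folios freed during the migration above may still
	 * be queued for deferred freeing here, which is why the proposed
	 * wait_for_hugepage_folios_freed() is needed before the check below.
	 */

	/* Fails (-EBUSY) if any page in the range is not free in the buddy. */
	ret = test_pages_isolated(start, end, 0);

	/* (On success the real function then claims the now-free range.) */
undo:
	undo_isolate_page_range(start, end, migratetype);
	return ret;
}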




