Re: [PATCH V3] mm/gup: Clear the LRU flag of a page before adding to LRU batch

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Wed, 3 Jul 2024 13:08:43 -0700

On Wed,  3 Jul 2024 20:02:33 +0800 yangge1116@xxxxxxx wrote:

> From: yangge <yangge1116@xxxxxxx>
> 
> If a large amount of CMA memory is configured in the system (for example,
> CMA memory accounts for 50% of system memory), starting a virtual machine
> with device passthrough will call
> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to pin memory.
> Normally, if a page is present and in the CMA area, pin_user_pages_remote()
> will migrate the page from the CMA area to a non-CMA area because of the
> FOLL_LONGTERM flag. But the current code causes the migration to fail
> due to unexpected page refcounts, and eventually the virtual machine
> fails to start.
> 
> If a page is added to an LRU batch, its refcount increases by one;
> removing the page from the LRU batch decreases it by one. Page migration
> requires that the page is not referenced by anything other than its page
> mappings. Before migrating a page, we should try to drain the page from
> the LRU batch in case it is in one. However, folio_test_lru() is not
> sufficient to tell whether the page is in an LRU batch or not, and if the
> page is in an LRU batch, the migration will fail.
> 
> To solve the problem above, we modify the logic of adding to an LRU batch.
> Before adding a page to an LRU batch, we clear the LRU flag of the page so
> that we can check whether the page is in an LRU batch with
> folio_test_lru(page). Keeping the LRU flag of the page invisible for a
> long time seems to be no problem, because when a new page is allocated
> from buddy and added to the LRU batch, its LRU flag is also not visible
> for a long time.
> 
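
(Illustration only, not part of the patch: a minimal user-space sketch of
the idea in the changelog above, assuming a simplified model in which a
batch holds one extra reference per page and the LRU flag is cleared while
the page sits in the batch. The names folio_model, batch_model, batch_add,
batch_drain and try_migrate are hypothetical and do not correspond to
kernel functions.)

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_SIZE 15   /* arbitrary capacity chosen for this model */

/* Hypothetical stand-in for a folio: just a refcount and an LRU flag. */
struct folio_model {
    int  refcount;
    bool lru;
};

/* Hypothetical stand-in for a per-CPU LRU batch. */
struct batch_model {
    struct folio_model *slots[BATCH_SIZE];
    int nr;
};

/* Flush the batch: pages become visible on the LRU again, refs dropped. */
static void batch_drain(struct batch_model *b)
{
    for (int i = 0; i < b->nr; i++) {
        b->slots[i]->lru = true;
        b->slots[i]->refcount--;
    }
    b->nr = 0;
}

/* Add a page to the batch: take a reference and clear the LRU flag,
 * so that a cleared flag hints "this page may be sitting in a batch". */
static void batch_add(struct batch_model *b, struct folio_model *f)
{
    assert(b->nr < BATCH_SIZE);
    f->lru = false;
    f->refcount++;
    b->slots[b->nr++] = f;
    if (b->nr == BATCH_SIZE)
        batch_drain(b);
}

/* Migration succeeds only when nobody else holds a reference. With the
 * flag cleared on batch add, "not on LRU" tells us to drain first. */
static bool try_migrate(struct batch_model *b, struct folio_model *f,
                        int expected_refs)
{
    if (!f->lru)
        batch_drain(b);
    return f->refcount == expected_refs;
}
```

In this model, a page with refcount 1 that is added to a batch has
refcount 2 and a cleared LRU flag; try_migrate() sees the cleared flag,
drains the batch, and the refcount check against 1 then passes.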

Thanks.

I'll add this to the mm-hotfixes branch for additional testing.  Please
continue to work with David on the changelog enhancements.

In mm-hotfixes I'd expect to send it to Linus next week.  Alternatively,
I could move it into mm-unstable (then mm-stable) for merging into
6.11-rc1.  That would give it additional testing time - it will still be
backported into earlier kernels.  We can do this with any patch.