Re: [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world

Zi Yan <ziy@xxxxxxxxxx> · Tue, 07 Jan 2025 22:39:28 -0500

On 7 Jan 2025, at 11:49, Matthew Wilcox wrote:

> On Tue, Jan 07, 2025 at 05:11:02PM +0100, David Hildenbrand wrote:
>> one item on my todo list is making PageOffline pages stop using "struct
>> page" members except page->type and 1/2 flags, to prepare them for the
>> memdesc future, to avoid unnecessary atomics, and to resolve some (so-far)
>> theoretical issues with temporary speculative references.
>
> Well, thank goodness someone's working on this!  Because I'm stumped.
>
>> For that, we use the "non-lru page migration" framework and in that process
>> we make use of ... way too many members of "struct page"/"struct folio" and
>> rely on the refcount not being 0. For example, we certainly don't want to
>> allocate memdescs for PageOffline pages just so some of them can be
>> migrated.
>
> I mean, let's start with how we migrate pages.
>
> int migrate_pages(struct list_head *from, new_folio_t get_new_folio,
>                 free_folio_t put_new_folio, unsigned long private,
>                 enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
> ...
>         list_for_each_entry_safe(folio, folio2, from, lru) {
>
> We identify every folio to be migrated and put them on a list.  But once
> non-folio things need to be migrated, this code is wrong.
>
> We could rename this to migrate_folios() and have a different function
> for migrating non-folio memory.  But now the compaction code starts to
> look distressingly complex [1].  So we need a way to pass in a list/array
> of memory to be migrated that doesn't involve a list_head and magically
> trying to deduce what the memory is.

How about something like folio_batch, carrying an array of pointers to the
to-be-migrated folios/non-folios? It consumes memory when the number of
items to migrate is large, though, which is probably why ->lru is used.
Allocating memory during migration might not be desirable.
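
To make the idea concrete, here is a rough userspace sketch of such a
batch. Every name in it (migrate_batch, MIGRATE_BATCH_SIZE,
fake_migrate_one) is made up for illustration and is not an existing
kernel API; the capacity is arbitrary. The point is only that a small
fixed-size array can live on the caller's stack, so nothing is allocated
while migrating, at the cost of flushing whenever it fills up.

```c
#include <stddef.h>

/* Hypothetical names throughout; a userspace model, not kernel code. */
#define MIGRATE_BATCH_SIZE 31U	/* arbitrary capacity for illustration */

struct migrate_batch {
	unsigned int nr;			/* slots in use */
	void *items[MIGRATE_BATCH_SIZE];	/* folios or non-folios */
};

static void migrate_batch_init(struct migrate_batch *mb)
{
	mb->nr = 0;
}

/*
 * Queue one item. Returns the number of free slots left; 0 means the
 * caller must flush before adding anything else.
 */
static unsigned int migrate_batch_add(struct migrate_batch *mb, void *item)
{
	mb->items[mb->nr++] = item;
	return MIGRATE_BATCH_SIZE - mb->nr;
}

/* Hand every queued item to @fn, then empty the batch. */
static unsigned int migrate_batch_flush(struct migrate_batch *mb,
					void (*fn)(void *item))
{
	unsigned int i, done = mb->nr;

	for (i = 0; i < done; i++)
		fn(mb->items[i]);
	mb->nr = 0;
	return done;
}

/* Placeholder "migration" that only counts how many items it was given. */
static unsigned int nr_fake_migrated;

static void fake_migrate_one(void *item)
{
	(void)item;
	nr_fake_migrated++;
}
```

A caller would add items while scanning and call migrate_batch_flush()
whenever migrate_batch_add() returns 0, plus once more at the end of the
scan.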

>
> I'm actually wondering about a bitmap.  Generally when we migrate memory
> it's to create physical contiguity so perhaps passing in a base_pfn
> and a bitmap that contains, say, PMD_ORDER bits; then it's the job of
> the migration code to figure out what to do for each pfn indicated by
> base_pfn and the set bits in the bitmap?
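
If I read the bitmap idea correctly, it could look roughly like the
userspace sketch below. I am assuming one bit per base page in a
PMD-sized range (512 bits here, which is only an example value), and
all names (migrate_range, range_mark, range_collect) are hypothetical.
The migration code would walk the set bits and decide per pfn what kind
of memory it is looking at.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical names and sizes; a userspace model, not kernel code. */
#define RANGE_BITS	512UL	/* assumed: one bit per base page in the range */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define RANGE_WORDS	((RANGE_BITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct migrate_range {
	unsigned long base_pfn;			/* first pfn covered */
	unsigned long bitmap[RANGE_WORDS];	/* bit N set: migrate base_pfn + N */
};

/* Mark @pfn for migration; @pfn must lie inside the range. */
static void range_mark(struct migrate_range *r, unsigned long pfn)
{
	unsigned long bit = pfn - r->base_pfn;

	r->bitmap[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/*
 * Walk the bitmap in ascending order and store up to @max marked pfns
 * in @out. Returns how many were stored.
 */
static size_t range_collect(const struct migrate_range *r,
			    unsigned long *out, size_t max)
{
	unsigned long bit;
	size_t n = 0;

	for (bit = 0; bit < RANGE_BITS && n < max; bit++) {
		unsigned long word = r->bitmap[bit / BITS_PER_WORD];

		if (word & (1UL << (bit % BITS_PER_WORD)))
			out[n++] = r->base_pfn + bit;
	}
	return n;
}
```

The structure has a fixed size no matter how many pages are marked, so
it avoids both ->lru and any allocation, but it only works when the
memory to migrate sits in one contiguous pfn range.
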
>
> Although now I write this down, I guess NUMA migration doesn't behave
> that way.  So perhaps compaction-migration and numa-migration end up
> using different interfaces?  I think NUMA migration always migrates

But both use the same backend to unmap old pages, move metadata, and
remap new pages for folios. It is actually non-folios that have a different
migration routine. We probably want a dedicated interface for non-folios
when ->lru cannot be used, so that compaction calls the dedicated non-folio
migration interface whenever it encounters a non-folio.
As I write this: how often do we see non-folios across the entire physical
address space? If not often, could we just migrate one non-folio at a time,
so that the list problem goes away?
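
A toy userspace model of what I mean, with entirely hypothetical names
(mem_item, migrate_non_folio_one, scan_and_migrate): the scanner leaves
folios to the existing list-based path and hands each non-folio to a
dedicated routine immediately, so non-folios never need a list at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; a userspace model, not kernel code. */
enum mem_kind { MEM_FOLIO, MEM_NON_FOLIO };

struct mem_item {
	enum mem_kind kind;
	bool migrated;
};

struct scan_stats {
	size_t folios_batched;		/* left for the existing folio path */
	size_t non_folios_migrated;	/* handled one at a time, right away */
};

/* Stand-in for a dedicated non-folio migration interface. */
static bool migrate_non_folio_one(struct mem_item *item)
{
	item->migrated = true;
	return true;
}

/* Scan @n items; batch folios, migrate each non-folio immediately. */
static struct scan_stats scan_and_migrate(struct mem_item *items, size_t n)
{
	struct scan_stats st = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		if (items[i].kind == MEM_FOLIO) {
			/* would go on the ->lru based list as today */
			st.folios_batched++;
			continue;
		}
		if (migrate_non_folio_one(&items[i]))
			st.non_folios_migrated++;
	}
	return st;
}
```

Whether this is acceptable depends on how rare non-folios are in
practice, since migrating them one by one gives up any batching.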

> folios, so it can keep using get_new_folio() and put_new_folio() while
> the compaction-migration might need a different pair of callbacks to
> allocate/free memory of many different memdesc types.
>
> [1] OK, it is already distressingly complex.  But we're making it even
> more complex.

Best Regards,
Yan, Zi