Re: [LSF/MM/BPF TOPIC] Non-lru page migration in a memdesc world

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Tue, 7 Jan 2025 16:49:45 +0000

On Tue, Jan 07, 2025 at 05:11:02PM +0100, David Hildenbrand wrote:
> one item on my todo list is making PageOffline pages stop using "struct
> page" members except page->type and 1/2 flags, to prepare them for the
> memdesc future, to avoid unnecessary atomics, and to resolve some (so-far)
> theoretical issues with temporary speculative references.

Well, thank goodness someone's working on this!  Because I'm stumped.

> For that, we use the "non-lru page migration" framework and in that process
> we make use of ... way too many members of "struct page"/"struct folio" and
> rely on the refcount not being 0. For example, we certainly don't want to
> allocate memdescs for PageOffline pages just so some of them can be
> migrated.

I mean, let's start with how we migrate pages.

int migrate_pages(struct list_head *from, new_folio_t get_new_folio,
                free_folio_t put_new_folio, unsigned long private,
                enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
...
        list_for_each_entry_safe(folio, folio2, from, lru) {

We identify every folio to be migrated and put it on a list, linked
through folio->lru.  But once non-folio things need to be migrated,
this code is wrong.

We could rename this to migrate_folios() and have a different function
for migrating non-folio memory.  But now the compaction code starts to
look distressingly complex [1].  So we need a way to pass in a list/array
of memory to be migrated that doesn't involve a list_head and magically
trying to deduce what the memory is.

I'm actually wondering about a bitmap.  Generally when we migrate memory
it's to create physical contiguity so perhaps passing in a base_pfn
and a bitmap that contains, say, PMD_ORDER bits; then it's the job of
the migration code to figure out what to do for each pfn indicated by
base_pfn and the set bits in the bitmap?
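
To make the shape of that concrete, here is a rough userspace sketch, not
kernel code.  Every name in it (struct migrate_range, range_mark(),
range_migrate(), the SKETCH_* macros) is made up for illustration, and it
assumes "PMD_ORDER bits" means one bit per base page in a PMD-sized range,
i.e. 1 << PMD_ORDER bits, with PMD_ORDER taken as 9 (4KiB pages).  The
per-pfn callback stands in for "figure out what this memory is and
migrate it".

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
#define SKETCH_PMD_ORDER 9	/* assumption: 4KiB pages, 2MiB PMD */
#define SKETCH_NR_BITS   (1UL << SKETCH_PMD_ORDER)
#define BITS_PER_WORD    (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NR_WORDS  ((SKETCH_NR_BITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* A PMD-sized window of pfns; a set bit means "migrate this pfn". */
struct migrate_range {
	unsigned long base_pfn;
	unsigned long bitmap[SKETCH_NR_WORDS];
};

/* Per-pfn handler: decides what the memory is and moves it; 0 on success. */
typedef int (*migrate_pfn_fn)(unsigned long pfn, void *ctx);

/* Mark one pfn inside the window for migration. */
static void range_mark(struct migrate_range *r, unsigned long pfn)
{
	unsigned long off = pfn - r->base_pfn;

	assert(pfn >= r->base_pfn && off < SKETCH_NR_BITS);
	r->bitmap[off / BITS_PER_WORD] |= 1UL << (off % BITS_PER_WORD);
}

/* Call fn for every marked pfn; return how many calls reported success. */
static unsigned long range_migrate(const struct migrate_range *r,
				   migrate_pfn_fn fn, void *ctx)
{
	unsigned long off, done = 0;

	for (off = 0; off < SKETCH_NR_BITS; off++) {
		if (!(r->bitmap[off / BITS_PER_WORD] &
		      (1UL << (off % BITS_PER_WORD))))
			continue;
		if (fn(r->base_pfn + off, ctx) == 0)
			done++;
	}
	return done;
}

/* Toy handler for demonstration: just sums the pfns it is handed. */
static int sum_pfn(unsigned long pfn, void *ctx)
{
	*(unsigned long *)ctx += pfn;
	return 0;
}
```

The caller only says which pfns it wants moved; nothing in the interface
requires the memory to have a list_head to be threaded on.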

Although now I write this down, I guess NUMA migration doesn't behave
that way.  So perhaps compaction-migration and numa-migration end up
using different interfaces?  I think NUMA migration always migrates
folios, so it can keep using get_new_folio() and put_new_folio() while
the compaction-migration might need a different pair of callbacks to
allocate/free memory of many different memdesc types.
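
As a rough illustration of what that second pair of callbacks might look
like, here is a userspace sketch.  The enum, the struct and all of the
sketch_* names are hypothetical; the set of memdesc types is not settled
in this thread, and malloc() merely stands in for a real allocator.  The
only point is that the allocate/free pair takes the type of memory
wanted, where get_new_folio()/put_new_folio() can only deal in folios.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical memdesc kinds; the real set is not defined here. */
enum sketch_memdesc_type { SKETCH_FOLIO, SKETCH_OFFLINE, SKETCH_NR_TYPES };

/* Hypothetical callback pair, loosely analogous to get_new_folio() and
 * put_new_folio(), but told which kind of memory is wanted. */
struct sketch_migrate_ops {
	void *(*get_new)(enum sketch_memdesc_type type, unsigned long private);
	void (*put_new)(void *mem, enum sketch_memdesc_type type,
			unsigned long private);
};

/* Count of outstanding allocations per type, so the demo can be checked. */
static unsigned long sketch_live[SKETCH_NR_TYPES];

/* Allocate a destination for the given type (malloc as a stand-in). */
static void *sketch_get_new(enum sketch_memdesc_type type,
			    unsigned long private)
{
	void *mem = malloc(64);

	(void)private;
	if (mem)
		sketch_live[type]++;
	return mem;
}

/* Release a destination that ended up unused. */
static void sketch_put_new(void *mem, enum sketch_memdesc_type type,
			   unsigned long private)
{
	(void)private;
	assert(sketch_live[type] > 0);
	sketch_live[type]--;
	free(mem);
}

static const struct sketch_migrate_ops sketch_ops = {
	.get_new = sketch_get_new,
	.put_new = sketch_put_new,
};
```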

[1] OK, it is already distressingly complex.  But we're making it even
more complex.