On 7 Jan 2025, at 11:49, Matthew Wilcox wrote:

> On Tue, Jan 07, 2025 at 05:11:02PM +0100, David Hildenbrand wrote:
>> one item on my todo list is making PageOffline pages to stop using "struct
>> page" members except page->type and 1/2 flags, to prepare them for the
>> memdesc future, to avoid unnecessary atomics, and to resolve some (so-far)
>> theoretical issues with temporary speculative references.
>
> Well, thank goodness someone's working on this!  Because I'm stumped.
>
>> For that, we use the "non-lru page migration" framework and in that process
>> we make use of ... way to many members of "struct page"/"struct folio" and
>> rely on the refcount not being 0.  For example, we certainly don't want to
>> allocate memdescs for PageOffline pages just so some of them can be
>> migrated.
>
> I mean, let's start with how we migrate pages.
>
> int migrate_pages(struct list_head *from, new_folio_t get_new_folio,
>                 free_folio_t put_new_folio, unsigned long private,
>                 enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
> ...
>         list_for_each_entry_safe(folio, folio2, from, lru) {
>
> We identify every folio to be migrated and put them on a list.  But once
> non-folio things need to be migrated, this code is wrong.
>
> We could rename this to migrate_folios() and have a different function
> for migrating non-folio memory.  But now the compaction code starts to
> look distressingly complex [1].  So we need a way to pass in a list/array
> of memory to be migrated that doesn't involve a list_head and magically
> trying to deduce what the memory is.

How about something like folio_batch carrying a list of pointers to the
to-be-migrated folios/non-folios? But it consumes memory if the number of
to-be-migrated items is large, and that is probably why ->lru is used.
Allocating memory during migration might not be desirable.

>
> I'm actually wondering about a bitmap.  Generally when we migrate memory
> it's to create physical contiguity so perhaps passing in a base_pfn
> and a bitmap that contains, say, PMD_ORDER bits; then it's the job of
> the migration code to figure out what to do for each pfn indicated by
> base_pfn and the set bits in the bitmap?
>
> Although now I write this down, I guess NUMA migration doesn't behave
> that way.  So perhaps compaction-migration and numa-migration end up
> using different interfaces?  I think NUMA migration always migrates

But both use the same backend to unmap old pages, move metadata, and remap
new pages for folios. It is actually non-folios that have a different
routine for migration. We probably want a dedicated interface for
non-folios when ->lru cannot be used, so that during compaction, when a
non-folio is encountered, the dedicated non-folio migration interface is
called.

As I am writing this, how often do we see non-folios in the entire physical
address space? If not often, is it possible to just migrate one non-folio
at a time so that the list problem just goes away?

> folios, so it can keep using get_new_folio() and put_new_folio() while
> the compaction-migration might need a different pair of callbacks to
> allocate/free memory of many different memdesc types.
>
> [1] OK, it is already distressingly complex.  But we're making it even
> more complex.

Best Regards,
Yan, Zi
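
For illustration only, here is a rough userspace sketch of the base_pfn +
bitmap idea quoted above. Every name in it (migrate_range, range_mark,
range_for_each_pfn, record_pfn) is hypothetical and not an existing kernel
interface, and the 512-bit size assumes one bit per pfn of a PMD-sized
range with PMD_ORDER == 9. It only shows the shape of the interface: the
caller marks pfns, and the migration side walks the set bits and decides
per pfn what to do.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* All names below are hypothetical; this is not an existing kernel API. */
#define RANGE_BITS	512	/* assumption: PMD_ORDER == 9, one bit per pfn */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define RANGE_WORDS	((RANGE_BITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* A naturally contiguous pfn range plus one bit per pfn to migrate. */
struct migrate_range {
	unsigned long base_pfn;
	unsigned long bitmap[RANGE_WORDS];
};

/* Mark one pfn inside the range as "to be migrated". */
static void range_mark(struct migrate_range *r, unsigned long pfn)
{
	unsigned long off = pfn - r->base_pfn;

	assert(pfn >= r->base_pfn && off < RANGE_BITS);
	r->bitmap[off / BITS_PER_WORD] |= 1UL << (off % BITS_PER_WORD);
}

/* Per-pfn callback: returns 0 on success, non-zero on failure. */
typedef int (*migrate_pfn_t)(unsigned long pfn, void *private);

/*
 * Walk the set bits and hand each pfn to fn, which is where the
 * "figure out what this memory is" step would live.
 * Returns the number of pfns for which fn returned 0.
 */
static size_t range_for_each_pfn(const struct migrate_range *r,
				 migrate_pfn_t fn, void *private)
{
	size_t ok = 0;

	for (unsigned long off = 0; off < RANGE_BITS; off++) {
		unsigned long word = r->bitmap[off / BITS_PER_WORD];

		if (!(word & (1UL << (off % BITS_PER_WORD))))
			continue;
		if (fn(r->base_pfn + off, private) == 0)
			ok++;
	}
	return ok;
}

/* Stand-in callback: adds each visited pfn to a running sum. */
static int record_pfn(unsigned long pfn, void *private)
{
	*(unsigned long *)private += pfn;
	return 0;
}
```

No list_head and no per-entry allocation are needed here, since the bitmap
has a fixed size, but as noted in the quoted text this shape only fits
compaction-style migration of a contiguous range, not NUMA migration.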