On 17.03.25 18:00, Matthew Wilcox wrote:
With the upcoming shrink of struct page to 4 words, we need a plan for
handling PageMovable. Ideally this does not involve memory allocation,
and is a relatively simple change from what we have now. To shrink
struct page beyond 4 words, we'll need a better plan, but I think this
will do for the next few months.
Right, I've been focusing on grasping what we need in the long run with
frozen pages that don't even want any memdesc (PageOffline).
The current proposed layout for struct page is:
struct page {
	unsigned long flags;
	union {
		struct list_head buddy_list;
		struct list_head pcp_list;
		struct {
			unsigned long memdesc;
			union {
				unsigned long private;
				atomic_t _mapcount;
			};
		};
	};
	int _refcount;
};
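As a quick sanity check of the "4 words" claim, the layout above can be modelled in userspace. This is only a sketch: struct list_head and atomic_t are stand-ins for the kernel types, page_size_in_words() is a hypothetical helper, and it assumes unsigned long is pointer-sized (as on Linux).

```c
#include <assert.h>   /* static_assert (C11), assert */
#include <stddef.h>   /* offsetof, size_t */

/* Userspace stand-ins for the kernel types (assumptions for illustration). */
struct list_head { struct list_head *next, *prev; };
typedef struct { int counter; } atomic_t;

/* The proposed layout, copied from the mail. */
struct page {
	unsigned long flags;
	union {
		struct list_head buddy_list;
		struct list_head pcp_list;
		struct {
			unsigned long memdesc;
			union {
				unsigned long private;
				atomic_t _mapcount;
			};
		};
	};
	int _refcount;
};

/* Hypothetical helper: size of struct page in machine words. */
static inline size_t page_size_in_words(void)
{
	return sizeof(struct page) / sizeof(unsigned long);
}

/* flags (1 word) + union (2 words) + _refcount padded to 1 word. */
static_assert(sizeof(struct page) == 4 * sizeof(unsigned long),
	      "struct page is expected to be 4 words");
/* memdesc overlays the first word of the list_head. */
static_assert(offsetof(struct page, memdesc) == sizeof(unsigned long),
	      "memdesc is expected to be the second word");
```

Note that memdesc and private together occupy exactly the two words that buddy_list / pcp_list use while the page is free.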
My proposal for movable non-folio pages is:
* memdesc is used to point to struct movable_operations (these will
need to be aligned to 16 bytes, but I think that's fine)
Note that we don't want to allocate a memdesc for PageOffline pages in
the long run. For balloon compaction it might be fine as a first step.
How'd we handle PAGE_MAPPING_MOVABLE? See below on my idea to avoid what
you describe here.
* private is used to point to the next page in the list
* These pages are refcounted
* We retain a "lock" bit in page->flags
Note that there is also PG_isolated, which I am hoping we can get rid of.
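To illustrate why the 16-byte alignment matters: it leaves the low 4 bits of the pointer free, which presumably is where a memdesc type would live. A minimal userspace sketch under that assumption follows. The helper names, the fields of this movable_operations stand-in, and the type value used in the usage note are all hypothetical, and it assumes unsigned long can hold a pointer.

```c
#include <assert.h>
#include <stdalign.h>  /* alignas */
#include <stdbool.h>
#include <stdint.h>    /* uintptr_t */

#define MEMDESC_TYPE_BITS 4
#define MEMDESC_TYPE_MASK ((1UL << MEMDESC_TYPE_BITS) - 1)

/* Stand-in for the real structure; the fields are illustrative only. */
struct movable_operations {
	alignas(16) const char *name;       /* forces 16-byte alignment */
	bool (*isolate)(unsigned long pfn);
	void (*putback)(unsigned long pfn);
};

/* Example instance so the helpers below have something to point at. */
static const struct movable_operations example_mops = { .name = "example" };

/* Pack an aligned pointer and a small type tag into one word. */
static inline unsigned long memdesc_encode(const struct movable_operations *mops,
					   unsigned int type)
{
	uintptr_t ptr = (uintptr_t)mops;

	assert((ptr & MEMDESC_TYPE_MASK) == 0);  /* needs 16-byte alignment */
	assert(type <= MEMDESC_TYPE_MASK);       /* tag must fit in 4 bits */
	return (unsigned long)ptr | type;
}

/* Extract the type tag from the low bits. */
static inline unsigned int memdesc_type(unsigned long memdesc)
{
	return (unsigned int)(memdesc & MEMDESC_TYPE_MASK);
}

/* Recover the pointer by masking the tag off again. */
static inline const struct movable_operations *memdesc_mops(unsigned long memdesc)
{
	return (const struct movable_operations *)(uintptr_t)(memdesc & ~MEMDESC_TYPE_MASK);
}
```

For example, memdesc_mops(memdesc_encode(&example_mops, 5)) returns &example_mops again, and memdesc_type() on the same word returns 5.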
My current bigger idea is something like this:
1) memdesc type (currently folio type) identifies "struct
movable_operations". We could think of a registration model for
migration handlers.
PG_offline -> call into balloon compaction
Calling the ->isolate callback will fail if the callback is not
responsible for migrating the page, or if somebody else already isolated it.
Ideally, we'd have two bits (per memdesc) to essentially indicate "this
is movable" and "this is isolated".
Not 100% sure if the latter is required. If already isolated, simply
calling the ->isolate callback will fail. I think most of the existing
PG_isolated users are irrelevant, but it's all complicated.
So a single per-memdesc bit + memdesc type might be sufficient to lookup
the
2) No dependency on the refcount: ->isolate / ->putback effectively move
the ownership ("reference") from the real owner to migration code (so
they can be frozen). We just have to make sure that, while a page is
isolated, it cannot be freed by the real owner (which is already
the case IIRC).
3) No lists: we simply use an array of PFNs in migration code?
4) Lock bit: not 100% sure yet, but likely not required if ->isolate /
->migrate / ->putback just handle this locking internally.
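Points 1) to 4) can be sketched as a toy userspace model: handlers registered per memdesc type, ->isolate failing for pages the handler does not own or that are already isolated, the owner refusing to free isolated pages, and migration code collecting PFNs in a plain array. Everything prefixed toy_ or balloon_ is hypothetical, the two-callback movable_operations is a simplification (->migrate is left out), and no locking or refcounting is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memdesc types; only the offline/balloon case is modelled. */
enum toy_memdesc_type { TOY_MEMDESC_OFFLINE, TOY_MEMDESC_NR_TYPES };

struct movable_operations {
	bool (*isolate)(unsigned long pfn);
	void (*putback)(unsigned long pfn);
};

/* 1) Registration model: one handler per memdesc type. */
static const struct movable_operations *toy_handlers[TOY_MEMDESC_NR_TYPES];

static int toy_register_handler(unsigned int type,
				const struct movable_operations *ops)
{
	if (type >= TOY_MEMDESC_NR_TYPES || toy_handlers[type])
		return -1;
	toy_handlers[type] = ops;
	return 0;
}

/* Toy balloon: per-PFN state stands in for the "movable"/"isolated" bits. */
#define TOY_NR_PFNS 8
static bool balloon_owned[TOY_NR_PFNS];
static bool balloon_isolated[TOY_NR_PFNS];

static bool balloon_inflate(unsigned long pfn)
{
	if (pfn >= TOY_NR_PFNS || balloon_owned[pfn])
		return false;
	balloon_owned[pfn] = true;
	return true;
}

/* 2) The real owner must not free a page while it is isolated. */
static bool balloon_deflate(unsigned long pfn)
{
	if (pfn >= TOY_NR_PFNS || !balloon_owned[pfn] || balloon_isolated[pfn])
		return false;
	balloon_owned[pfn] = false;
	return true;
}

/* Fails if this handler is not responsible or the page is already isolated. */
static bool balloon_isolate(unsigned long pfn)
{
	if (pfn >= TOY_NR_PFNS || !balloon_owned[pfn] || balloon_isolated[pfn])
		return false;
	balloon_isolated[pfn] = true;  /* ownership moves to migration code */
	return true;
}

static void balloon_putback(unsigned long pfn)
{
	if (pfn < TOY_NR_PFNS)
		balloon_isolated[pfn] = false;  /* ownership moves back */
}

static const struct movable_operations balloon_mops = {
	.isolate = balloon_isolate,
	.putback = balloon_putback,
};

/* 3) No lists: migration code records isolated PFNs in an array. */
static unsigned long toy_isolated_pfns[TOY_NR_PFNS];

static size_t toy_isolate_range(unsigned int type, unsigned long start,
				unsigned long end, unsigned long *pfns,
				size_t max)
{
	const struct movable_operations *ops =
		type < TOY_MEMDESC_NR_TYPES ? toy_handlers[type] : NULL;
	size_t n = 0;

	if (!ops)
		return 0;
	for (unsigned long pfn = start; pfn < end && n < max; pfn++)
		if (ops->isolate(pfn))
			pfns[n++] = pfn;
	return n;
}
```

In this model a second ->isolate on the same PFN simply fails, which is the behaviour that might make a separate "isolated" bit unnecessary.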
Lists are a problem for ballooning drivers with PageOffline pages. I had
the exact same thought as you regarding "private is used to point to the
next page in the list", but discarded it because it's inefficient for
ballooning purposes and not future proof.
So instead, my plan is to use an xarray in the ballooning drivers to
store the PFNs of inflated pages.
The only nasty thing is that "insert page in the balloon" can fail if
OOM (inserting into the xarray). In general, that's just fine, except in
some XEN / Hyper-V code where PageOffline pages are not allocated from
the buddy where we could put them back, but they "come to life" with
memory that gets added.
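The failure mode can be shown with a small userspace stand-in. This is not the kernel xarray API: struct pfn_set and its helpers are hypothetical, and the limit field only simulates memory pressure so the -ENOMEM path can be exercised. The point is that the caller has to be prepared for insertion to fail and needs somewhere to return the page to.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for "the set of inflated PFNs". */
struct pfn_set {
	unsigned long *pfns;
	size_t nr, cap;
	size_t limit;  /* simulated memory pressure: never grow beyond this */
};

/* Example instance that can hold at most two PFNs. */
static struct pfn_set toy_set = { .limit = 2 };

/* Returns 0, -EBUSY if already present, or -ENOMEM if growing fails. */
static int pfn_set_insert(struct pfn_set *s, unsigned long pfn)
{
	for (size_t i = 0; i < s->nr; i++)
		if (s->pfns[i] == pfn)
			return -EBUSY;

	if (s->nr == s->cap) {
		size_t new_cap = s->cap ? 2 * s->cap : 4;
		unsigned long *tmp;

		if (new_cap > s->limit)
			new_cap = s->limit;
		if (new_cap <= s->cap)
			return -ENOMEM;  /* simulated allocation failure */
		tmp = realloc(s->pfns, new_cap * sizeof(*tmp));
		if (!tmp)
			return -ENOMEM;  /* real allocation failure */
		s->pfns = tmp;
		s->cap = new_cap;
	}
	s->pfns[s->nr++] = pfn;
	return 0;
}

/* Returns true if the PFN was tracked and has been removed. */
static bool pfn_set_erase(struct pfn_set *s, unsigned long pfn)
{
	for (size_t i = 0; i < s->nr; i++) {
		if (s->pfns[i] == pfn) {
			s->pfns[i] = s->pfns[--s->nr];
			return true;
		}
	}
	return false;
}
```

A driver that got its page from the buddy would react to -ENOMEM by freeing the page again. The XEN / Hyper-V case has no such fallback, which is what makes it nasty.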
--
Cheers,
David / dhildenb