Re: A plan for supporting PageMovable in 2025

On 17.03.25 18:00, Matthew Wilcox wrote:
> With the upcoming shrink of struct page to 4 words, we need a plan for
> handling PageMovable.  Ideally this does not involve memory allocation,
> and is a relatively simple change from what we have now.  To shrink
> struct page beyond 4 words, we'll need a better plan, but I think this
> will do for the next few months.

Right, I've been focusing on grasping what we need in the long run with frozen pages that don't even want any memdesc (PageOffline).

> The current proposed layout for struct page is:
>
> struct page {
>      unsigned long flags;
>      union {
>          struct list_head buddy_list;
>          struct list_head pcp_list;
>          struct {
>              unsigned long memdesc;
>              union {
>                  unsigned long private;
>                  atomic_t _mapcount;
>              };
>          };
>      };
>      int _refcount;
> };
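
As a quick sanity check of the "4 words" claim, here is a minimal userspace mock-up of that layout. The list_head and atomic_t definitions are stand-ins written for this sketch, not the kernel's headers, and it assumes unsigned long is pointer-sized (as it is on Linux). page_words() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types; NOT the kernel definitions. */
struct list_head { struct list_head *next, *prev; };
typedef struct { int counter; } atomic_t;

struct page {
	unsigned long flags;
	union {
		struct list_head buddy_list;
		struct list_head pcp_list;
		struct {
			unsigned long memdesc;
			union {
				unsigned long private;
				atomic_t _mapcount;
			};
		};
	};
	int _refcount;
};

/* Hypothetical helper: size of the mock-up in machine words. */
static inline size_t page_words(void)
{
	return sizeof(struct page) / sizeof(unsigned long);
}

/* Assumes sizeof(unsigned long) == sizeof(void *). */
static_assert(sizeof(struct page) == 4 * sizeof(unsigned long),
	      "struct page mock-up should occupy 4 words");
```

On a 64-bit build, _refcount is padded out to the fourth word, so the mock-up comes to 32 bytes.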

> My proposal for movable non-folio pages is:
>
>   * memdesc is used to point to struct movable_operations (these will
>     need to be aligned to 16 bytes, but I think that's fine)

Note that we don't want to allocate a memdesc for PageOffline pages in the long run. For balloon compaction it might be fine as a first step.

How'd we handle PAGE_MAPPING_MOVABLE? See below on my idea to avoid what you describe here.
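
For reference, the 16-byte alignment requirement could be exploited roughly as follows. This is only a sketch under the assumption that the low 4 bits of memdesc carry a type tag; MEMDESC_TYPE_MASK and the memdesc_*() helpers are hypothetical names, and struct movable_operations is a placeholder rather than the kernel structure.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the low 4 bits of memdesc hold a type tag. */
#define MEMDESC_TYPE_MASK 0xfUL

/* Placeholder for the real structure; forced to 16-byte alignment. */
struct movable_operations {
	_Alignas(16) int placeholder;
};

/* Pack an aligned pointer and a 4-bit type into one word. */
static inline unsigned long
memdesc_encode(const struct movable_operations *mops, unsigned int type)
{
	uintptr_t p = (uintptr_t)mops;

	assert((p & MEMDESC_TYPE_MASK) == 0);	/* pointer must be aligned */
	assert(type <= MEMDESC_TYPE_MASK);	/* type must fit in 4 bits */
	return (unsigned long)p | type;
}

/* Recover the pointer by masking off the type bits. */
static inline const struct movable_operations *
memdesc_mops(unsigned long memdesc)
{
	return (const struct movable_operations *)
		(uintptr_t)(memdesc & ~MEMDESC_TYPE_MASK);
}

/* Recover the type tag from the low bits. */
static inline unsigned int memdesc_type(unsigned long memdesc)
{
	return (unsigned int)(memdesc & MEMDESC_TYPE_MASK);
}
```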

>   * private is used to point to the next page in the list
>   * These pages are refcounted
>   * We retain a "lock" bit in page->flags

Note that there is also PG_isolated, which I am hoping we can get rid of.


My current bigger idea is something like this:

1) memdesc type (currently folio type) identifies "struct movable_operations". We could think of a registration model for migration handlers.

PG_offline -> call into balloon compaction

Calling the ->isolate callback will fail if the callback is not responsible for migrating the page, or if somebody else already isolated it.

Ideally, we'd have two bits (per memdesc) to essentially indicate "this is movable" and "this is isolated".

Not 100% sure if the latter is required. If already isolated, simply calling the ->isolate callback will fail. I think most of the existing PG_isolated users are irrelevant, but it's all complicated.

So a single per-memdesc bit + memdesc type might be sufficient to lookup the
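
To make 1) a bit more concrete, a registration model might look roughly like the userspace sketch below. Everything here is hypothetical: the mock struct page, the simplified two-callback struct movable_operations, register_movable_ops(), try_isolate() and the balloon_* handlers are invented for illustration and are not existing kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock page: only what the sketch needs. */
struct page {
	unsigned int type;	/* stand-in for the memdesc type */
	bool isolated;		/* stand-in for "this is isolated" */
};

/* Simplified; the real structure has more callbacks and arguments. */
struct movable_operations {
	bool (*isolate)(struct page *page);
	void (*putback)(struct page *page);
};

#define NR_MEMDESC_TYPES 16

/* One registered handler per memdesc type. */
static const struct movable_operations *movable_ops[NR_MEMDESC_TYPES];

static int register_movable_ops(unsigned int type,
				const struct movable_operations *mops)
{
	if (type >= NR_MEMDESC_TYPES || movable_ops[type])
		return -1;
	movable_ops[type] = mops;
	return 0;
}

/* Look up the handler by type; fails if nobody is responsible. */
static bool try_isolate(struct page *page)
{
	const struct movable_operations *mops;

	if (page->type >= NR_MEMDESC_TYPES)
		return false;
	mops = movable_ops[page->type];
	return mops && mops->isolate(page);
}

/* Example handler standing in for balloon compaction. */
static bool balloon_isolate(struct page *page)
{
	if (page->isolated)
		return false;	/* somebody else already isolated it */
	page->isolated = true;
	return true;
}

static void balloon_putback(struct page *page)
{
	page->isolated = false;
}

static const struct movable_operations balloon_mops = {
	.isolate = balloon_isolate,
	.putback = balloon_putback,
};
```

Note how a second isolation attempt fails inside the callback itself, which is why a separate "isolated" bit might not be strictly required.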


2) No dependency on the refcount: ->isolate / ->putback effectively move the ownership ("reference") from the real owner to migration code (so they can be frozen). We just have to make sure that, while a page is isolated, it cannot be freed by the real owner (which is already the case IIRC).


3) No lists: we simply use an array of PFNs in migration code?
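
Something along these lines is what I have in mind for 3), sketched in userspace C. The struct pfn_batch name, its helpers and the batch size of 32 are all made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define PFN_BATCH_MAX 32	/* hypothetical batch size */

/* Isolated pages are tracked by PFN, so no list linkage in struct page. */
struct pfn_batch {
	unsigned long pfns[PFN_BATCH_MAX];
	size_t nr;
};

/* Record an isolated page; returns false when the batch is full. */
static bool pfn_batch_add(struct pfn_batch *b, unsigned long pfn)
{
	if (b->nr == PFN_BATCH_MAX)
		return false;
	b->pfns[b->nr++] = pfn;
	return true;
}

/* Take the most recently added PFN; returns false when empty. */
static bool pfn_batch_pop(struct pfn_batch *b, unsigned long *pfn)
{
	if (b->nr == 0)
		return false;
	*pfn = b->pfns[--b->nr];
	return true;
}
```

A full batch would simply mean "migrate what we have, then continue isolating".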


4) Lock bit: not 100% sure yet, but likely not required if ->isolate / ->migrate / ->putback just handle this locking internally.



Lists are a problem for ballooning drivers with PageOffline pages. I had the exact same thought as you regarding "private is used to point to the next page in the list", but discarded it because it's inefficient for ballooning purposes and not future proof.

So instead, my plan is to use an xarray in the ballooning drivers to store the PFNs of inflated pages.

The only nasty thing is that "insert page in the balloon" can fail if OOM (inserting into the xarray). In general, that's just fine, except in some Xen / Hyper-V code where PageOffline pages are not allocated from the buddy where we could put them back, but they "come to life" with memory that gets added.
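
The failure path can be illustrated with a userspace stand-in. In the kernel this would presumably be an xarray insertion such as xa_insert(), which can return -ENOMEM; the struct pfn_set below is NOT the xarray API, and max_cap, balloon_inflate_one() and give_back_to_buddy() are hypothetical names used only to simulate allocation failure and the "put it back" step.

```c
#include <errno.h>
#include <stdlib.h>

/* Userspace stand-in for an xarray of inflated PFNs. */
struct pfn_set {
	unsigned long *pfns;
	size_t nr, cap;
	size_t max_cap;		/* hypothetical knob to simulate OOM */
};

/* Insert a PFN; may fail with -ENOMEM when the set has to grow. */
static int pfn_set_insert(struct pfn_set *s, unsigned long pfn)
{
	if (s->nr == s->cap) {
		size_t new_cap = s->cap ? 2 * s->cap : 4;
		unsigned long *tmp;

		if (new_cap > s->max_cap)
			return -ENOMEM;		/* simulated OOM */
		tmp = realloc(s->pfns, new_cap * sizeof(*tmp));
		if (!tmp)
			return -ENOMEM;		/* real OOM */
		s->pfns = tmp;
		s->cap = new_cap;
	}
	s->pfns[s->nr++] = pfn;
	return 0;
}

/* Counts pages handed back; stands in for freeing to the buddy. */
static unsigned long nr_given_back;

static void give_back_to_buddy(unsigned long pfn)
{
	(void)pfn;
	nr_given_back++;
}

/* Inflate one page; if tracking fails, the page has to go back. */
static int balloon_inflate_one(struct pfn_set *s, unsigned long pfn,
			       void (*give_back)(unsigned long pfn))
{
	int ret = pfn_set_insert(s, pfn);

	if (ret && give_back)
		give_back(pfn);
	return ret;
}
```

The problematic case is exactly the one where no sensible give_back exists, because the page never came from the buddy in the first place.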

--
Cheers,

David / dhildenb




