Re: PageOffline: refcount, flags and memdesc

On 14.11.24 16:23, Matthew Wilcox wrote:
On Thu, Nov 14, 2024 at 12:18:15PM +0100, David Hildenbrand wrote:
I'm currently staring at PageOffline again and wondering how we could prepare
it for the memdesc future, and whether we can remove the refcount handling.


Hi!

Thanks for bringing it up.  As a memdesc, I currently have PageOffline
as being type 0 (Misc), subtype 5 (Offline).  That's bits 0-10 and then
bit 11 is for "may be mapped to userspace".  Bits 12-17 are the order.
With the top bits being used for section/node/zone, that could be 25 +
12 + 3 = 40 bits, so we'd have 7 bits remaining for use as flags.

"may be mapped to userspace" will always be 0. One or two flags would do initially.


I'd like to stop using the refcount for PageOffline pages, and keep the
refcount always at 0.

I think this makes sense.

However, the refcount is currently used to detect whether we are allowed to
offline memory blocks that contain PageOffline pages, because only selected
drivers support re-onlining. It is also used when returning the pages to
the buddy, where free_page()/free_contig_range()... expect a refcount of
1.

Further, virtio-mem currently uses the PageDirty() bit to remember whether a
PageOffline page was already exposed to the buddy, or whether we must use
generic_online_page().

For now, we would need the following information, which could be stored in
two flags, leaving the refcount at 0:

(1) Whether the page was obtained from the buddy, or was never exposed to
     the buddy

PageOffline() && PageOfflineNeverOnlined()

(2) Whether the driver supports actual memory offlining+re-onlining, so the
     pages can be skipped when offlining

PageOffline() && PageOfflineSkippable()
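To make the two-flag idea concrete, here is a toy userspace model (not kernel code) of how the predicates could be evaluated with the refcount left at 0. Only the predicate names PageOffline(), PageOfflineNeverOnlined() and PageOfflineSkippable() come from this mail; struct toy_page, the OFFLINE_* bit values and block_offlinable() are hypothetical stand-ins for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy userspace model, NOT kernel code. The struct layout and the
 * OFFLINE_* bit values are hypothetical; only the predicate names
 * mirror the ones proposed in this mail.
 */
enum page_type { PGTY_NONE = 0, PGTY_OFFLINE = 1 };

#define OFFLINE_NEVER_ONLINED (1u << 0) /* (1) never exposed to the buddy */
#define OFFLINE_SKIPPABLE     (1u << 1) /* (2) driver supports offline+re-online */

struct toy_page {
	enum page_type type;
	uint8_t offline_flags;
	int refcount; /* never touched in this model: stays 0 */
};

static bool PageOffline(const struct toy_page *p)
{
	return p->type == PGTY_OFFLINE;
}

static bool PageOfflineNeverOnlined(const struct toy_page *p)
{
	return PageOffline(p) && (p->offline_flags & OFFLINE_NEVER_ONLINED);
}

static bool PageOfflineSkippable(const struct toy_page *p)
{
	return PageOffline(p) && (p->offline_flags & OFFLINE_SKIPPABLE);
}

/*
 * Hypothetical check replacing the refcount-based test: a block may be
 * offlined only if every PageOffline page in it is skippable. Other
 * pages are simply assumed to be migratable in this toy.
 */
static bool block_offlinable(const struct toy_page *pages, int n)
{
	for (int i = 0; i < n; i++)
		if (PageOffline(&pages[i]) && !PageOfflineSkippable(&pages[i]))
			return false;
	return true;
}
```

The point of the sketch is that both questions (offlinable? needs generic_online_page()?) are answered from type and flag bits alone, so the refcount no longer has to carry that information.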


But when allocating/freeing pages, we would still mess with the refcount,
which is bad.

We could have a dedicated interface for freeing them that abstracts the
generic_online_page() bits and leaves the refcount at 0:

free_offline_page()
free_offline_page_range()

And

alloc_offline_page()
alloc_offline_page_range()
alloc_offline_pages()
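A minimal sketch of what such an alloc/free pair could guarantee, assuming a fixed array stands in for the buddy allocator. The names alloc_offline_page()/free_offline_page() are the ones floated in this mail, but their signatures here are hypothetical, as are struct toy_page and the generic_online_calls counter that stands in for the generic_online_page() path. This is a userspace model, not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, NOT the kernel API: a fixed array stands in for the buddy.
 * Signatures are hypothetical; the point is that neither function
 * touches the refcount.
 */
#define NR_TOY_PAGES 4

struct toy_page {
	bool in_buddy;      /* currently owned by the (toy) buddy */
	bool offline;       /* PageOffline */
	bool never_onlined; /* PageOfflineNeverOnlined */
	int refcount;       /* must stay 0 throughout */
};

static struct toy_page toy_mem[NR_TOY_PAGES];
static int generic_online_calls; /* stands in for generic_online_page() */

static struct toy_page *alloc_offline_page(void)
{
	for (size_t i = 0; i < NR_TOY_PAGES; i++) {
		struct toy_page *p = &toy_mem[i];

		if (p->in_buddy) {
			p->in_buddy = false;
			p->offline = true;
			p->never_onlined = false; /* it came from the buddy */
			return p;                 /* refcount left at 0 */
		}
	}
	return NULL;
}

static void free_offline_page(struct toy_page *p)
{
	assert(p->offline && p->refcount == 0);
	if (p->never_onlined)
		generic_online_calls++; /* first exposure to the buddy */
	p->offline = false;
	p->never_onlined = false;
	p->in_buddy = true; /* back in the buddy, refcount still 0 */
}
```

The design choice being illustrated: the caller never decides between "free to buddy" and "online for the first time"; free_offline_page() makes that decision from the page's own flag, which is exactly the bit virtio-mem currently keeps in PageDirty().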

I'm not super happy about the "alloc/free" terminology, but nothing better
came to mind.

If I resurrect
https://lore.kernel.org/linux-mm/20220809171854.3725722-1-willy@xxxxxxxxxxxxx/
would the frozen terminology work for you here?

Ah, I remember that.

I was more concerned about the alloc/free terminology, because "free_offline_page" could simply be "online_page" :) But the "allocation" part is trickier. Maybe it's simply alloc/free of frozen pages for the time being.

But yes, being able to allocate/free pages without involving refcounts will be crucial to get the PageOffline conversion flying.

Instead of alloc_frozen_pages(), I was wondering if we should have something like GFP_FROZEN. For example, for two PG_offline users I'd currently also need alloc_contig_frozen_range() and alloc_contig_frozen_pages(). Using alloc_contig_range(GFP_FROZEN) and alloc_contig_pages(GFP_FROZEN) would make that easier.

Did you consider that already?
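A rough userspace sketch of the trade-off being asked about: with a flag, a single allocation entry point covers both the normal and the "frozen" case, instead of adding a *_frozen() twin for every contig allocator. TOY_GFP_FROZEN, TOY_GFP_KERNEL and toy_alloc_contig_pages() are hypothetical stand-ins (not kernel symbols), and a bump allocator replaces the real page allocator.

```c
#include <stddef.h>

/*
 * Toy model of the GFP_FROZEN idea, NOT kernel code: the gfp argument
 * decides whether returned pages get refcount 1 (normal) or stay at 0
 * ("frozen"). All names here are hypothetical.
 */
typedef unsigned int toy_gfp_t;
#define TOY_GFP_KERNEL 0x0u
#define TOY_GFP_FROZEN 0x1u

struct toy_page {
	int refcount;
};

#define NR_TOY_PAGES 8
static struct toy_page toy_mem[NR_TOY_PAGES];
static size_t toy_next; /* bump allocator instead of a real buddy */

static struct toy_page *toy_alloc_contig_pages(size_t nr, toy_gfp_t gfp)
{
	if (nr == 0 || nr > NR_TOY_PAGES - toy_next)
		return NULL;

	struct toy_page *first = &toy_mem[toy_next];

	/* One code path; only the refcount initialization differs. */
	for (size_t i = 0; i < nr; i++)
		first[i].refcount = (gfp & TOY_GFP_FROZEN) ? 0 : 1;
	toy_next += nr;
	return first;
}
```

The flavor with separate functions would need a second copy of each entry point (or a shared internal helper plus two wrappers); the flag keeps the API surface at one function per allocator, at the cost of a gfp bit whose meaning is about refcounting rather than reclaim or placement.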


There is one complication to sort out: balloon_compaction.h supports moving
PageOffline pages, and seems to use the page lock, page refcount, page lru,
page private... which is all rather nasty. I wonder if these should get
their own page type, like PageMovableOffline, and we'd mostly leave them
alone for now. This would mean that virtio-balloon, vmware-balloon and ppc
CMM would keep doing the old refcount-based thing but with a new page type.

It's fairly clear to me now that we have a sane story for moving
file/anon folios.  The current way we handle movable pages looks mostly
insane because it's hammered into that framework.  I think we need
something entirely different to handle movable non-folio pages, but I
don't know what that story is yet.

Okay, so the first step would be to leave that part alone and convert the other (sane :) ) users of PageOffline to stop using the refcount.

--
Cheers,

David / dhildenb




