Hi,
I'm currently staring at PageOffline again and wondering how we could
prepare it for the memdesc future, and whether we can remove the
refcount handling.
Currently, we set PageOffline in the following cases (one nasty exception
below):
(a) A memory block gets onlined, whereby memmap_init_range() initializes
all "struct pages" to PageOffline with a refcount of 1. These pages are
expected to get onlined via generic_online_page() later. Drivers
might decide to leave some of them offline, because they are not backed
by actual memory in the hypervisor. Some drivers still use free_page()
instead of generic_online_page().
(b) We allocate pages (alloc_page(), alloc_contig_pages(), ...) to
logically offline them, whereby the buddy sets the refcount to 1 and
the driver manually sets PageOffline afterwards.
We clear PageOffline in the following cases (one nasty exception below):
(a) We want to return a page to the buddy
(free_page()/free_contig_range()).
PageOffline is cleared by the driver, and freeing the page will
decrement the refcount to 0.
(b) We want to expose it to the buddy for the first time
(generic_online_page()). The refcount is forced to 0.
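To make the current rules easier to compare with the proposal below, here
is a toy userspace model of the four transitions above. It is only a
sketch: struct toy_page and the toy_* helpers are hypothetical stand-ins
for struct page, PageOffline(), page_count() and the buddy, and no real
kernel function is called.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of struct page; hypothetical, not kernel code. */
struct toy_page {
	bool offline;   /* models PageOffline() */
	bool in_buddy;  /* models "page is free in the buddy" */
	int refcount;   /* models page_count() */
};

/* Set (a): memmap_init_range() when a memory block gets onlined. */
static void toy_memmap_init(struct toy_page *p)
{
	p->offline = true;
	p->in_buddy = false;
	p->refcount = 1;
}

/* Set (b): the buddy hands out the page with refcount 1,
 * then the driver marks it PageOffline. */
static void toy_alloc_and_offline(struct toy_page *p)
{
	assert(p->in_buddy && p->refcount == 0);
	p->in_buddy = false;
	p->refcount = 1;   /* done by the buddy */
	p->offline = true; /* done by the driver */
}

/* Clear (a): the driver clears PageOffline, free_page() drops the ref to 0. */
static void toy_free_page(struct toy_page *p)
{
	assert(p->refcount == 1);
	p->offline = false;
	p->refcount = 0;
	p->in_buddy = true;
}

/* Clear (b): first exposure via generic_online_page(); refcount forced to 0. */
static void toy_generic_online_page(struct toy_page *p)
{
	p->offline = false;
	p->refcount = 0;
	p->in_buddy = true;
}
```

In this model the refcount of a PageOffline page is always 1, and only
the path back to the buddy decides whether it is decremented or forced
to 0.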
There are still subtle differences between exposing a page to the buddy
for the first time and freeing it to the buddy, such as
debug_pagealloc_map_pages() in __free_pages_core(). I'm hoping we can
get rid of them long-term, or just abstract them internally.
I'd like to stop using the refcount for PageOffline pages, and always
keep the refcount at 0.
But the refcount is currently used to detect whether we are allowed
to offline memory blocks that contain PageOffline pages, because only
selected drivers support re-onlining. It is also used when returning
the pages to the buddy, where free_page()/free_contig_range()/...
expect a refcount of 1.
Further, virtio-mem currently uses the PageDirty() bit to remember
whether a PageOffline page was already exposed to the buddy, or whether
we must use generic_online_page().
For now, we would need the following information, which could be stored
in two flags, leaving the refcount at 0:
(1) Was the page obtained from the buddy, or was it never exposed to the
buddy?
PageOffline() && PageOfflineNeverOnlined()
(2) Does the driver support actual memory offlining+re-onlining, so that
the pages can be skipped when offlining?
PageOffline() && PageOfflineSkippable()
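A toy userspace sketch of how the two flags could replace both the
refcount-based offlining check and the PageDirty() trick. The bit values,
struct toy_page and the toy_* helpers are hypothetical; only the flag
names follow the proposal above, and only the PageOffline part of the
memory-block offlining check is modeled (isolating or migrating all
other pages is ignored).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit layout; the real page-type encoding is not modeled. */
enum {
	TOY_PG_OFFLINE       = 1u << 0, /* PageOffline() */
	TOY_PG_NEVER_ONLINED = 1u << 1, /* proposed PageOfflineNeverOnlined() */
	TOY_PG_SKIPPABLE     = 1u << 2, /* proposed PageOfflineSkippable() */
};

struct toy_page {
	unsigned int flags;
	int refcount; /* proposal: stays 0 for PageOffline pages */
};

/* (1): must the page go through the first-time onlining path?
 * Would replace the PageDirty() trick in virtio-mem. */
static bool toy_needs_first_online(const struct toy_page *p)
{
	return (p->flags & TOY_PG_OFFLINE) && (p->flags & TOY_PG_NEVER_ONLINED);
}

/* (2): may memory offlining skip this page instead of failing?
 * Would replace the refcount-based check. */
static bool toy_offline_can_skip(const struct toy_page *p)
{
	/* Invariant of the proposal: PageOffline pages keep refcount 0. */
	assert(!(p->flags & TOY_PG_OFFLINE) || p->refcount == 0);
	return (p->flags & TOY_PG_OFFLINE) && (p->flags & TOY_PG_SKIPPABLE);
}

/* PageOffline part of the block check: every PageOffline page must be skippable. */
static bool toy_block_offlinable(const struct toy_page *pages, int nr)
{
	for (int i = 0; i < nr; i++)
		if ((pages[i].flags & TOY_PG_OFFLINE) && !toy_offline_can_skip(&pages[i]))
			return false;
	return true;
}
```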
But when allocating/freeing pages, we would still mess with the
refcount, which is bad.
We could have a dedicated interface for freeing them, which abstracts
the generic_online_page() bits and leaves the refcount at 0:
free_offline_page()
free_offline_page_range()
and a matching one for allocating them:
alloc_offline_page()
alloc_offline_page_range()
alloc_offline_pages()
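A minimal sketch of the alloc/free pair, modeled on the names proposed
above but with hypothetical toy_* signatures and a toy free list standing
in for the buddy. It only illustrates the intended invariant: the
refcount stays at 0 in both directions, and the first-time onlining work
is hidden behind the free call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum {
	TOY_PG_OFFLINE       = 1u << 0, /* PageOffline() */
	TOY_PG_NEVER_ONLINED = 1u << 1, /* proposed PageOfflineNeverOnlined() */
};

struct toy_page {
	unsigned int flags;
	int refcount;          /* never raised in this sketch */
	bool in_buddy;
	struct toy_page *next; /* toy free-list link */
};

/* Toy buddy: a free list plus a counter of first-time onlinings. */
struct toy_buddy {
	struct toy_page *free_list;
	int first_onlined;
};

static void toy_buddy_add(struct toy_buddy *b, struct toy_page *p)
{
	p->in_buddy = true;
	p->next = b->free_list;
	b->free_list = p;
}

/* Hypothetical free_offline_page(): refcount is 0 before and after. */
static void toy_free_offline_page(struct toy_buddy *b, struct toy_page *p)
{
	assert(p->flags & TOY_PG_OFFLINE);
	assert(p->refcount == 0);
	if (p->flags & TOY_PG_NEVER_ONLINED)
		b->first_onlined++; /* stands in for the generic_online_page() bits */
	p->flags &= ~(TOY_PG_OFFLINE | TOY_PG_NEVER_ONLINED);
	toy_buddy_add(b, p);
}

/* Hypothetical alloc_offline_page(): marks the page offline, refcount stays 0. */
static struct toy_page *toy_alloc_offline_page(struct toy_buddy *b)
{
	struct toy_page *p = b->free_list;

	if (!p)
		return NULL;
	b->free_list = p->next;
	p->next = NULL;
	p->in_buddy = false;
	p->flags |= TOY_PG_OFFLINE;
	assert(p->refcount == 0);
	return p;
}
```

In this sketch, drivers no longer choose between generic_online_page()
and free_page() themselves; the NeverOnlined flag lets the free path
decide.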
I'm not super happy about the "alloc/free" terminology, but nothing
better came to mind.
There is one complication to sort out: balloon_compaction.h supports
moving PageOffline pages, and seems to use the page lock, page refcount,
page lru, page private, ... which is all rather nasty. I wonder if these
pages should get their own page type, like PageMovableOffline, which we
would mostly leave alone for now. This would mean that virtio-balloon,
vmware-balloon and ppc CMM would keep doing the old refcount-based
thing, but with a new page type.
I assume this all goes in the direction of getting pages from the
buddy and returning them without refcounts ... thoughts?
--
Cheers,
David / dhildenb