On 26.03.21 09:35, Michal Hocko wrote:
On Thu 25-03-21 23:06:50, Oscar Salvador wrote:
On Thu, Mar 25, 2021 at 05:47:30PM +0100, Michal Hocko wrote:
On Thu 25-03-21 17:36:22, Michal Hocko wrote:
If all it takes is to make pfn_to_online_page work (and my previous
attempt is incorrect because it should consult the block rather than
the section pfn range), this should work.
Sorry, but while this solves some of the issues with that approach, I really
think that it overcomplicates things and buys us little in return.
To me it seems that we are just patching things up to make them work that
way.
I do agree that special casing vmemmap areas is something I do not
really like, but we are in that Schrödinger situation where this memory is
not onlineable unless it shares a memory section with onlineable
memory. There are two ways around that: either special case it in
pfn_to_online_page() or mark the vmemmap section online even though it is
not really online.
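Roughly, the first option boils down to something like the following
untested sketch (pfn_within_self_hosted_vmemmap() is a made-up helper
that would have to consult the memory block, not the section, pfn range):

struct page *pfn_to_online_page(unsigned long pfn)
{
        unsigned long nr = pfn_to_section_nr(pfn);

        if (nr >= NR_MEM_SECTIONS)
                return NULL;
        if (!online_section_nr(nr))
                return NULL;
        /*
         * The section is marked online because it also contains ordinary
         * memory, but self-hosted vmemmap pages are never initialized via
         * online_pages() and must not be handed out as online pages.
         */
        if (pfn_within_self_hosted_vmemmap(pfn)) /* made-up helper */
                return NULL;
        return pfn_to_page(pfn);
}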
To be honest, I dislike this, and I guess we can only agree to disagree
here.
No problem there. I will not insist on my approach unless I can convince
you that it is a better solution. It seems I have failed and I can live
with that.
I find the following much easier, cleaner, and less likely to run into
pitfalls in the future:
(!!! It is untested and incomplete, and I would be surprised if it even
compiled, but it is enough as a PoC !!!)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 5ea2b3fbce02..799d14fc2f9b 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -169,6 +169,60 @@ int memory_notify(unsigned long val, void *v)
return blocking_notifier_call_chain(&memory_chain, val, v);
}
+static int memory_block_online(unsigned long start_pfn, unsigned long nr_pages,
+ unsigned long nr_vmemmap_pages, int online_type,
+ int nid)
+{
+ int ret;
+ unsigned long flags;
+ /*
+ * Despite vmemmap pages having a different lifecycle than the pages
+ * they describe, initializing and accounting vmemmap pages at the
+ * online/offline stage eases things a lot.
This requires quite some explaining.
+ * We do it outside of {online,offline}_pages, so those routines only have
+ * to deal with pages that are actually usable memory.
+ */
+ if (nr_vmemmap_pages) {
+ struct zone *z;
+
+ z = zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
+ move_pfn_range_to_zone(z, start_pfn, nr_vmemmap_pages, NULL,
+ MIGRATE_UNMOVABLE);
+ /*
+ * The below could be moved into a helper to make it less bulky here,
+ * so that {online,offline}_pages could also use it.
+ */
+ /* online_pages() accounts the pages it onlines itself. */
+ z->present_pages += nr_vmemmap_pages;
+ pgdat_resize_lock(z->zone_pgdat, &flags);
+ z->zone_pgdat->node_present_pages += nr_vmemmap_pages;
+ pgdat_resize_unlock(z->zone_pgdat, &flags);
Might have to set fully spanned sections online here (vmemmap >= SECTION_SIZE), e.g., via online_mem_sections(); online_pages() below only marks the sections of the remaining range online, so pfn_to_online_page() would keep failing for a section fully covered by vmemmap.
+ }
+
+ ret = online_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages,
+ online_type);
+
+ /*
+ * In case online_pages() failed for some reason, we should clean up
+ * the vmemmap accounting done above as well (left out of this PoC;
+ * see the sketch below).
+ */
+ return ret;
+}
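For reference, the missing cleanup could look roughly like this
(untested sketch that just undoes the zone movement and accounting from
the vmemmap branch above, using the existing remove_pfn_range_from_zone()
helper):

        if (ret && nr_vmemmap_pages) {
                struct zone *z = page_zone(pfn_to_page(start_pfn));

                z->present_pages -= nr_vmemmap_pages;
                pgdat_resize_lock(z->zone_pgdat, &flags);
                z->zone_pgdat->node_present_pages -= nr_vmemmap_pages;
                pgdat_resize_unlock(z->zone_pgdat, &flags);
                remove_pfn_range_from_zone(z, start_pfn, nr_vmemmap_pages);
        }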
Yes, this is much better! Just a minor suggestion would be to push the
memory_block all the way down to memory_block_online() (it onlines a
memory block). I would also slightly prefer to provide two helpers that
would make it clear that this is to reserve/cleanup the vmemmap space
(defined in the memory_hotplug proper).
Thanks!
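Such a split could look roughly like this (untested sketch; the helper
names are made up, and the nr_vmemmap_pages field on the memory_block is
assumed from this series):

/* In memory_hotplug proper, e.g. mm/memory_hotplug.c: */
void mhp_reserve_vmemmap_space(struct zone *zone, unsigned long pfn,
                               unsigned long nr_pages);
void mhp_cleanup_vmemmap_space(struct zone *zone, unsigned long pfn,
                               unsigned long nr_pages);

/* drivers/base/memory.c */
static int memory_block_online(struct memory_block *mem)
{
        unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
        unsigned long nr_pages = memory_block_size_bytes() >> PAGE_SHIFT;
        unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
        struct zone *zone;
        int ret;

        zone = zone_for_pfn_range(mem->online_type, mem->nid, start_pfn,
                                  nr_pages);
        if (nr_vmemmap_pages)
                mhp_reserve_vmemmap_space(zone, start_pfn, nr_vmemmap_pages);

        ret = online_pages(start_pfn + nr_vmemmap_pages,
                           nr_pages - nr_vmemmap_pages, mem->online_type);
        if (ret && nr_vmemmap_pages)
                mhp_cleanup_vmemmap_space(zone, start_pfn, nr_vmemmap_pages);
        return ret;
}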
Something else to note:
We'll not call the memory notifier (e.g., MEM_ONLINE) for the vmemmap.
The result is that
1. We won't allocate extended struct pages for the range. Don't think
this is really problematic (pages are never allocated/freed, so I guess
we don't care - like ZONE_DEVICE code).
2. We won't allocate kasan shadow memory. We most probably have to do it
explicitly via kasan_add_zero_shadow()/kasan_remove_zero_shadow(), see
mm/memremap.c:pagemap_range()
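Roughly following what pagemap_range() does there (untested sketch,
reusing start_pfn/nr_vmemmap_pages from the PoC above):

        /* Before initializing/onlining the vmemmap range: */
        ret = kasan_add_zero_shadow(__va(PFN_PHYS(start_pfn)),
                                    nr_vmemmap_pages << PAGE_SHIFT);
        if (ret)
                return ret;

        /* ... and on the offline/error path: */
        kasan_remove_zero_shadow(__va(PFN_PHYS(start_pfn)),
                                 nr_vmemmap_pages << PAGE_SHIFT);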
Further, a locking rework might be necessary. We hold the device hotplug
lock, but not the memory hotplug lock, e.g., for get_online_mems(). Might
have to move that out of online_pages().
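I.e., take the memory hotplug lock in memory_block_online() instead of
inside online_pages(), so that the vmemmap handling is covered as well.
Roughly (reusing the hypothetical helpers from above):

        mem_hotplug_begin();
        if (nr_vmemmap_pages)
                mhp_reserve_vmemmap_space(zone, start_pfn, nr_vmemmap_pages);
        ret = online_pages(start_pfn + nr_vmemmap_pages,
                           nr_pages - nr_vmemmap_pages, online_type);
        if (ret && nr_vmemmap_pages)
                mhp_cleanup_vmemmap_space(zone, start_pfn, nr_vmemmap_pages);
        mem_hotplug_done();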
--
Thanks,
David / dhildenb