On Tue, Nov 17, 2020 at 04:38:52PM +0100, David Hildenbrand wrote: > Sorry for the late replay, fairly busy with all kinds of things. Heh, no worries, I appreciate the time :-) > > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c > > index b02fd51e5589..6b57bf90ca72 100644 > > --- a/drivers/acpi/acpi_memhotplug.c > > +++ b/drivers/acpi/acpi_memhotplug.c > > @@ -195,7 +195,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device) > > node = memory_add_physaddr_to_nid(info->start_addr); > > result = __add_memory(node, info->start_addr, info->length, > > - MHP_NONE); > > + MEMHP_MEMMAP_ON_MEMORY); > > I'd suggest moving that into a separate patch. Fine by me > > diff --git a/include/linux/memory.h b/include/linux/memory.h > > index 439a89e758d8..7cc93de5856c 100644 > > --- a/include/linux/memory.h > > +++ b/include/linux/memory.h > > @@ -30,6 +30,7 @@ struct memory_block { > > int phys_device; /* to which fru does this belong? */ > > struct device dev; > > int nid; /* NID for this memory block */ > > + unsigned long nr_vmemmap_pages; /* Number for vmemmap pages */ > > Maybe also document that these pages are directly at the beginning of the > memory block. Sure > > -static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages) > > +static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages, > > + unsigned long nr_vmemmap_pages) > > { > > return -EINVAL; > > } > > @@ -369,10 +372,12 @@ extern int add_memory_driver_managed(int nid, u64 start, u64 size, > > mhp_t mhp_flags); > > extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, > > unsigned long nr_pages, > > + unsigned long nr_vmemmap_pages, > > struct vmem_altmap *altmap, int migratetype); > > extern void remove_pfn_range_from_zone(struct zone *zone, > > unsigned long start_pfn, > > - unsigned long nr_pages); > > + unsigned long nr_pages, > > + unsigned long nr_vmemmap_pages); > > I think we should not pass nr_vmemmap_pages down here but instead do two > separate calls to move_pfn_range_to_zone()/remove_pfn_range_from_zone() from > online_pages()/offline_pages() > > 1. for vmemmap pages, migratetype = MIGRATE_UNMOVABLE > 2. for remaining pages, migratetype = MIGRATE_ISOLATE Ok, that was the other option, it might be even cleaner. > > + valid_start_pfn = pfn + nr_vmemmap_pages; > > + valid_nr_pages = nr_pages - nr_vmemmap_pages; > > Hm, valid sounds strange. More like "free_start_pfn" or "buddy_start_pfn". Agreed, I might lean towards buddy_start_pfn. > > - move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE); > > + move_pfn_range_to_zone(zone, pfn, nr_pages, nr_vmemmap_pages, NULL, > > + MIGRATE_ISOLATE); > > As mentioned, I'd suggest properly initializing the memmap here > > if (nr_vmemmap_pages) { > move_pfn_range_to_zone(zone, pfn, nr_vmemmap_pages, NULL, > MIGRATE_UNMOVABLE); > } > move_pfn_range_to_zone(zone, valid_start_pfn, valid_nr_pages, NULL, Sure, agreed. > > + if (!support_memmap_on_memory(size)) > > + mhp_flags &= ~MEMHP_MEMMAP_ON_MEMORY; > > Callers (e.g., virtio-mem) might rely on this. We should reject this with > -EINVAL and provide a way for callers to test whether this flag is possible. Uhm, we might want to make "support_memmap_on_memory" public, and callers who might want to it use can check its return value? Or do you have something else in mind? Agreed on the -EINVAIL. > > + if (mhp_flags & MEMHP_MEMMAP_ON_MEMORY) > > + mhp_mark_vmemmap_pages(params.altmap); > > Do we really still need that? Pages are offline, so we're messing with an > invalid memmap. online_pages() should handle initializing the memmap of > these pages. Yeah, on a second thought we do not need this. Since the pages are still offline, no one should be messing with that range yet anyway. > > [...] > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index e74ca22baaa1..043503fb8c6e 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -8761,6 +8761,13 @@ void __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) > > spin_lock_irqsave(&zone->lock, flags); > > while (pfn < end_pfn) { > > page = pfn_to_page(pfn); > > + /* > > + * Skip vmemmap pages > > + */ > > + if (PageVmemmap(page)) { > > + pfn += vmemmap_nr_pages(page); > > + continue; > > + } > > I'd assume calling code can handle that and exclude isolating such pages. The thing is that __offline_isolated_pages calls offline_mem_sections(), so we really need the first pfn, and not the "pfn + nr_vmemmap_pages". Instead of skipping it in the loop, I might just skip it before entering the loop. Thanks! -- Oscar Salvador SUSE L3