On Tue, Apr 06, 2021 at 01:11:11PM +0200, Oscar Salvador wrote:
> Physical memory hotadd has to allocate a memmap (struct page array) for
> the newly added memory section. Currently, alloc_pages_node() is used
> for those allocations.
>
> This has some disadvantages:
>  a) an existing memory is consumed for that purpose
>     (eg: ~2MB per 128MB memory section on x86_64)
>  b) if the whole node is movable then we have off-node struct pages
>     which has performance drawbacks.
>  c) It might be there are no PMD_ALIGNED chunks so memmap array gets
>     populated with base pages.
>
> This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.
>
> Vmemmap page tables can map arbitrary memory.
> That means that we can simply use the beginning of each memory section and
> map struct pages there.
> struct pages which back the allocated space then just need to be treated
> carefully.
>
> Implementation wise we will reuse vmem_altmap infrastructure to override
> the default allocator used by __populate_section_memmap.
> Part of the implementation also relies on memory_block structure gaining
> a new field which specifies the number of vmemmap_pages at the beginning.
> This patch also introduces the following functions:
>
>  - vmemmap_init_space: Initializes vmemmap pages by calling move_pfn_range_to_zone(),
>                        calls kasan_add_zero_shadow() for the vmemmap range and marks
>                        online as many sections as vmemmap pages fully span.
>  - vmemmap_adjust_pages: Accounts/subtracts vmemmap_pages to node and zone
>                          present_pages
>  - vmemmap_deinit_space: Undoes what vmemmap_init_space does.
>
> The new function memory_block_online() calls vmemmap_init_space() before
> doing the actual online_pages(). Should online_pages() fail, we clean up
> by calling vmemmap_adjust_pages() and vmemmap_deinit_space().
>
> On offline, memory_block_offline() calls vmemmap_adjust_pages() prior to calling
> offline_pages(), because offline_pages() performs the tearing-down of kthreads
> and the rebuilding of the zonelists if the node/zone become empty.
> If offline_pages() fails, we account back vmemmap pages by vmemmap_adjust_pages().
> If it succeeds, we call vmemmap_deinit_space().
>
> Hot-remove:
>
>  We need to be careful when removing memory, as adding and
>  removing memory needs to be done with the same granularity.
>  To check that this assumption is not violated, we check the
>  memory range we want to remove and if a) any memory block has
>  vmemmap pages and b) the range spans more than a single memory
>  block, we scream out loud and refuse to proceed.
>
>  If all is good and the range was using memmap on memory (aka vmemmap pages),
>  we construct an altmap structure so free_hugepage_table does the right
>  thing and calls vmem_altmap_free instead of free_pagetable.
>
> Signed-off-by: Oscar Salvador <osalvador@xxxxxxx>

Heh, it seems I spaced out today.

We need a few things on top:

- In case !CONFIG_MEMORY_HOTREMOVE, we still need vmemmap_deinit_space,
  as we call it from memory_block_online() too in case online_pages()
  fails. So we need to move it out of the #ifdef CONFIG_MEMORY_HOTREMOVE
  block, with the others.
- If vmemmap pages fully span sections, we need to mark those sections
  as online, since online_pages() will not do it for us.
  In vmemmap_deinit_space, we need to mark them back offline.
  Since vmemmap_deinit_space might get called from memory_block_online(),
  we need to move offline_mem_sections() out of the
  #ifdef CONFIG_MEMORY_HOTREMOVE code as well.

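To make the "fully span sections" case concrete, here is a rough userspace
sketch of the arithmetic. It is not kernel code: PAGES_PER_SECTION is
hard-coded on the assumption of x86_64 with 4 KiB pages and 128 MiB sections,
ALIGN_DOWN() is a local stand-in for the kernel macro, and
fully_spanned_sections() is a hypothetical helper used only for illustration.
It also assumes the vmemmap range starts on a section boundary.

```c
#include <assert.h>

/* Assumption: x86_64 with 4 KiB pages and 128 MiB sections. */
#define PAGES_PER_SECTION 32768UL

/* Userspace stand-in for the kernel's ALIGN_DOWN(); power-of-two 'a' only. */
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

/*
 * Hypothetical helper: how many whole sections are covered by the
 * vmemmap range [pfn, pfn + nr_pages). Anything below one section's
 * worth of pages covers none, so nothing needs marking on/offline.
 */
static unsigned long fully_spanned_sections(unsigned long pfn,
					    unsigned long nr_pages)
{
	unsigned long end_pfn = pfn + nr_pages;

	/* Assumption: the range starts section aligned. */
	assert(pfn % PAGES_PER_SECTION == 0);

	if (nr_pages < PAGES_PER_SECTION)
		return 0;

	/* Round the end down so a trailing partial section is not counted. */
	return (ALIGN_DOWN(end_pfn, PAGES_PER_SECTION) - pfn) / PAGES_PER_SECTION;
}
```

For example, 512 vmemmap pages span no section, 32768 span exactly one, and
40000 still span only one because the tail does not fill a second section.
Assuming a 64-byte struct page, a memory block would have to be around 8 GiB
before its vmemmap pages cover a full section at all.
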
So, we would need the following on top:

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index cc3dbc0abf02..c7669d2accfd 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -111,6 +111,7 @@ extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
 extern void vmemmap_adjust_pages(unsigned long pfn, long nr_pages);
 extern int vmemmap_init_space(unsigned long pfn, unsigned long nr_pages,
 			      int nid, int online_type);
+extern void vmemmap_deinit_space(unsigned long pfn, unsigned long nr_pages);
 extern int online_pages(unsigned long pfn, unsigned long nr_pages,
 			int online_type, int nid);
 extern struct zone *test_pages_in_a_zone(unsigned long start_pfn,
@@ -317,7 +318,6 @@ static inline void pgdat_resize_init(struct pglist_data *pgdat) {}
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
 
-extern void vmemmap_deinit_space(unsigned long pfn, unsigned long nr_pages);
 extern void try_offline_node(int nid);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern int remove_memory(int nid, u64 start, u64 size);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 747e1c21d8e2..76f4ca5ed230 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1383,10 +1383,8 @@ static inline int online_section_nr(unsigned long nr)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn);
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn);
 #endif
-#endif
 
 static inline struct mem_section *__pfn_to_section(unsigned long pfn)
 {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 43f0daf922e6..68f11751cd82 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -885,6 +885,25 @@ int vmemmap_init_space(unsigned long pfn, unsigned long nr_pages, int nid,
 	return ret;
 }
 
+void vmemmap_deinit_space(unsigned long pfn, unsigned long nr_pages)
+{
+	unsigned long end_pfn = pfn + nr_pages;
+	/*
+	 * The pages associated with this vmemmap have been offlined, so
+	 * we can reset its state here in case we have page_init_poison.
+	 */
+	remove_pfn_range_from_zone(page_zone(pfn_to_page(pfn)), pfn, nr_pages);
+	kasan_remove_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
+
+	/*
+	 * It might be that the vmemmap_pages fully span sections. If that is
+	 * the case, mark those sections offline here as otherwise they will be
+	 * left online.
+	 */
+	if (nr_pages >= PAGES_PER_SECTION)
+		offline_mem_sections(pfn, ALIGN_DOWN(end_pfn, PAGES_PER_SECTION));
+}
+
 int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 		       int online_type, int nid)
 {
@@ -1672,16 +1691,6 @@ static int count_system_ram_pages_cb(unsigned long start_pfn,
 	return 0;
 }
 
-void vmemmap_deinit_space(unsigned long pfn, unsigned long nr_pages)
-{
-	/*
-	 * The pages associated with this vmemmap have been offlined, so
-	 * we can reset its state here in case we have page_init_poison.
-	 */
-	remove_pfn_range_from_zone(page_zone(pfn_to_page(pfn)), pfn, nr_pages);
-	kasan_remove_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
-}
-
 int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
 	const unsigned long end_pfn = start_pfn + nr_pages;
diff --git a/mm/sparse.c b/mm/sparse.c
index 7bd23f9d6cef..8e96cf00536b 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -623,7 +623,6 @@ void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 	}
 }
 
-#ifdef CONFIG_MEMORY_HOTREMOVE
 /* Mark all memory sections within the pfn range as offline */
 void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 {
@@ -644,7 +643,6 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 		ms->section_mem_map &= ~SECTION_IS_ONLINE;
 	}
 }
-#endif
 
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static struct page * __meminit populate_section_memmap(unsigned long pfn,

-- 
Oscar Salvador
SUSE L3