The patch titled
     Subject: mm: keep nid around during hot-remove
has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
     mm-keep-nid-around-during-hot-remove.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-keep-nid-around-during-hot-remove.patch

This patch will later appear in the mm-hotfixes-unstable branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when
    testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Pasha Tatashin <pasha.tatashin@xxxxxxxxxx>
Subject: mm: keep nid around during hot-remove
Date: Tue, 6 Aug 2024 22:14:54 +0000

The nid is needed during memory hot-remove in order to account for the
memmap overhead that is being removed.  In addition, we cannot use
page_pgdat(pfn_to_page(pfn)) during hot-remove after
remove_pfn_range_from_zone() has run, and we cannot determine the nid
by walking through the memblocks once remove_memory_block_devices()
has been called.  Therefore, pass the nid down from the beginning of
hot-remove to the point where it is used for accounting.
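To make the ordering constraint concrete, here is a minimal standalone
model (plain userspace C, not kernel code and not part of the patch;
all toy_* names are hypothetical): once the removal path severs a
page's link to its node, the node can no longer be derived from the
page, so the caller must capture the nid before teardown begins.

/* toy_nid.c: standalone model of the hot-remove ordering problem. */
#include <assert.h>
#include <stddef.h>

struct toy_pgdat { long nr_memmap_pages; };	/* models a per-node pgdat */
struct toy_page  { struct toy_pgdat *pgdat; };	/* models struct page */

static struct toy_pgdat toy_node[1];

/* Models remove_pfn_range_from_zone(): the page loses its node link. */
static void toy_remove_from_zone(struct toy_page *page)
{
	page->pgdat = NULL;
}

/* Models the fixed removal path: the caller passes nid down. */
static void toy_remove(struct toy_page *page, int nid)
{
	toy_remove_from_zone(page);
	/* Too late to use page->pgdat; the captured nid still works. */
	assert(page->pgdat == NULL);
	toy_node[nid].nr_memmap_pages--;
}

int main(void)
{
	struct toy_page page = { .pgdat = &toy_node[0] };

	toy_node[0].nr_memmap_pages = 1;
	toy_remove(&page, 0);	/* nid was captured before teardown */
	assert(toy_node[0].nr_memmap_pages == 0);
	return 0;
}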
Peter Anvin" <hpa@xxxxxxxxx> Cc: Huacai Chen <chenhuacai@xxxxxxxxxx> Cc: Ingo Molnar <mingo@xxxxxxxxxx> Cc: Kent Overstreet <kent.overstreet@xxxxxxxxx> Cc: Luis Chamberlain <mcgrof@xxxxxxxxxx> Cc: Mark Rutland <mark.rutland@xxxxxxx> Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx> Cc: Mike Rapoport <rppt@xxxxxxxxxx> Cc: Muchun Song <muchun.song@xxxxxxxxx> Cc: Nam Cao <namcao@xxxxxxxxxxxxx> Cc: Naveen N Rao <naveen@xxxxxxxxxx> Cc: Nicholas Piggin <npiggin@xxxxxxxxx> Cc: Oscar Salvador <osalvador@xxxxxxx> Cc: Palmer Dabbelt <palmer@xxxxxxxxxxx> Cc: Paul Walmsley <paul.walmsley@xxxxxxxxxx> Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx> Cc: Philippe Mathieu-Daudé <philmd@xxxxxxxxxx> Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx> Cc: Ryan Roberts <ryan.roberts@xxxxxxx> Cc: Sourav Panda <souravpanda@xxxxxxxxxx> Cc: Sven Schnelle <svens@xxxxxxxxxxxxx> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Cc: Thomas Zimmermann <tzimmermann@xxxxxxx> Cc: Vasily Gorbik <gor@xxxxxxxxxxxxx> Cc: WANG Xuerui <kernel@xxxxxxxxxx> Cc: Will Deacon <will@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- arch/arm64/mm/mmu.c | 5 +++-- arch/loongarch/mm/init.c | 5 +++-- arch/powerpc/mm/mem.c | 5 +++-- arch/riscv/mm/init.c | 5 +++-- arch/s390/mm/init.c | 5 +++-- arch/x86/mm/init_64.c | 5 +++-- include/linux/memory_hotplug.h | 7 ++++--- mm/memory_hotplug.c | 18 +++++++++--------- mm/memremap.c | 6 ++++-- mm/sparse-vmemmap.c | 14 ++++++++------ mm/sparse.c | 20 +++++++++++--------- 11 files changed, 54 insertions(+), 41 deletions(-) --- a/arch/arm64/mm/mmu.c~mm-keep-nid-around-during-hot-remove +++ a/arch/arm64/mm/mmu.c @@ -1363,12 +1363,13 @@ int arch_add_memory(int nid, u64 start, return ret; } -void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap) +void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap, + int nid) { unsigned long start_pfn = start >> PAGE_SHIFT; unsigned long nr_pages = size >> PAGE_SHIFT; - __remove_pages(start_pfn, nr_pages, altmap); + __remove_pages(start_pfn, nr_pages, altmap, nid); __remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size); } --- a/arch/loongarch/mm/init.c~mm-keep-nid-around-during-hot-remove +++ a/arch/loongarch/mm/init.c @@ -106,7 +106,8 @@ int arch_add_memory(int nid, u64 start, return ret; } -void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap) +void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap, + int nid) { unsigned long start_pfn = start >> PAGE_SHIFT; unsigned long nr_pages = size >> PAGE_SHIFT; @@ -115,7 +116,7 @@ void arch_remove_memory(u64 start, u64 s /* With altmap the first mapped page is offset from @start */ if (altmap) page += vmem_altmap_offset(altmap); - __remove_pages(start_pfn, nr_pages, altmap); + __remove_pages(start_pfn, nr_pages, altmap, nid); } #ifdef CONFIG_NUMA --- a/arch/powerpc/mm/mem.c~mm-keep-nid-around-during-hot-remove +++ a/arch/powerpc/mm/mem.c @@ -157,12 +157,13 @@ int __ref arch_add_memory(int nid, u64 s return rc; } -void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap) +void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap, + int nid) { unsigned long start_pfn = start >> PAGE_SHIFT; unsigned long nr_pages = size >> PAGE_SHIFT; - __remove_pages(start_pfn, nr_pages, altmap); + __remove_pages(start_pfn, nr_pages, altmap, nid); arch_remove_linear_mapping(start, size); } #endif --- a/arch/riscv/mm/init.c~mm-keep-nid-around-during-hot-remove +++ a/arch/riscv/mm/init.c @@ -1789,9 +1789,10 @@ int __ref 
--- a/arch/arm64/mm/mmu.c~mm-keep-nid-around-during-hot-remove
+++ a/arch/arm64/mm/mmu.c
@@ -1363,12 +1363,13 @@ int arch_add_memory(int nid, u64 start,
 	return ret;
 }
 
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			int nid)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, nid);
 	__remove_pgd_mapping(swapper_pg_dir, __phys_to_virt(start), size);
 }
 
--- a/arch/loongarch/mm/init.c~mm-keep-nid-around-during-hot-remove
+++ a/arch/loongarch/mm/init.c
@@ -106,7 +106,8 @@ int arch_add_memory(int nid, u64 start,
 	return ret;
 }
 
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			int nid)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -115,7 +116,7 @@ void arch_remove_memory(u64 start, u64 s
 	/* With altmap the first mapped page is offset from @start */
 	if (altmap)
 		page += vmem_altmap_offset(altmap);
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, nid);
 }
 
 #ifdef CONFIG_NUMA
--- a/arch/powerpc/mm/mem.c~mm-keep-nid-around-during-hot-remove
+++ a/arch/powerpc/mm/mem.c
@@ -157,12 +157,13 @@ int __ref arch_add_memory(int nid, u64 s
 	return rc;
 }
 
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			      int nid)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, nid);
 	arch_remove_linear_mapping(start, size);
 }
 #endif
--- a/arch/riscv/mm/init.c~mm-keep-nid-around-during-hot-remove
+++ a/arch/riscv/mm/init.c
@@ -1789,9 +1789,10 @@ int __ref arch_add_memory(int nid, u64 s
 	return ret;
 }
 
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			      int nid)
 {
-	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap);
+	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap, nid);
 	remove_linear_mapping(start, size);
 	flush_tlb_all();
 }
--- a/arch/s390/mm/init.c~mm-keep-nid-around-during-hot-remove
+++ a/arch/s390/mm/init.c
@@ -295,12 +295,13 @@ int arch_add_memory(int nid, u64 start,
 	return rc;
 }
 
-void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			int nid)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, nid);
 	vmem_remove_mapping(start, size);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
--- a/arch/x86/mm/init_64.c~mm-keep-nid-around-during-hot-remove
+++ a/arch/x86/mm/init_64.c
@@ -1262,12 +1262,13 @@ kernel_physical_mapping_remove(unsigned
 	remove_pagetable(start, end, true, NULL);
 }
 
-void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			      int nid)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	__remove_pages(start_pfn, nr_pages, altmap);
+	__remove_pages(start_pfn, nr_pages, altmap, nid);
 	kernel_physical_mapping_remove(start, start + size);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
--- a/include/linux/memory_hotplug.h~mm-keep-nid-around-during-hot-remove
+++ a/include/linux/memory_hotplug.h
@@ -201,9 +201,10 @@ static inline bool movable_node_is_enabl
 	return movable_node_enabled;
 }
 
-extern void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap);
+extern void arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap,
+			       int nid);
 extern void __remove_pages(unsigned long start_pfn, unsigned long nr_pages,
-			   struct vmem_altmap *altmap);
+			   struct vmem_altmap *altmap, int nid);
 
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
@@ -369,7 +370,7 @@ extern int sparse_add_section(int nid, u
 		unsigned long nr_pages, struct vmem_altmap *altmap,
 		struct dev_pagemap *pgmap);
 extern void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
-				  struct vmem_altmap *altmap);
+				  struct vmem_altmap *altmap, int nid);
 extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
 					  unsigned long pnum);
 extern struct zone *zone_for_pfn_range(int online_type, int nid,
--- a/mm/memory_hotplug.c~mm-keep-nid-around-during-hot-remove
+++ a/mm/memory_hotplug.c
@@ -571,7 +571,7 @@ void __ref remove_pfn_range_from_zone(st
  * calling offline_pages().
  */
 void __remove_pages(unsigned long pfn, unsigned long nr_pages,
-		    struct vmem_altmap *altmap)
+		    struct vmem_altmap *altmap, int nid)
 {
 	const unsigned long end_pfn = pfn + nr_pages;
 	unsigned long cur_nr_pages;
@@ -586,7 +586,7 @@ void __remove_pages(unsigned long pfn, u
 		/* Select all remaining pages up to the next section boundary */
 		cur_nr_pages = min(end_pfn - pfn,
 				   SECTION_ALIGN_UP(pfn + 1) - pfn);
-		sparse_remove_section(pfn, cur_nr_pages, altmap);
+		sparse_remove_section(pfn, cur_nr_pages, altmap, nid);
 	}
 }
 
@@ -1386,7 +1386,7 @@ bool mhp_supports_memmap_on_memory(void)
 }
 EXPORT_SYMBOL_GPL(mhp_supports_memmap_on_memory);
 
-static void __ref remove_memory_blocks_and_altmaps(u64 start, u64 size)
+static void __ref remove_memory_blocks_and_altmaps(u64 start, u64 size, int nid)
 {
 	unsigned long memblock_size = memory_block_size_bytes();
 	u64 cur_start;
@@ -1409,7 +1409,7 @@ static void __ref remove_memory_blocks_a
 
 		remove_memory_block_devices(cur_start, memblock_size);
 
-		arch_remove_memory(cur_start, memblock_size, altmap);
+		arch_remove_memory(cur_start, memblock_size, altmap, nid);
 
 		/* Verify that all vmemmap pages have actually been freed. */
 		WARN(altmap->alloc, "Altmap not fully unmapped");
@@ -1454,7 +1454,7 @@ static int create_altmaps_and_memory_blo
 		ret = create_memory_block_devices(cur_start, memblock_size,
 						  params.altmap, group);
 		if (ret) {
-			arch_remove_memory(cur_start, memblock_size, NULL);
+			arch_remove_memory(cur_start, memblock_size, NULL, nid);
 			kfree(params.altmap);
 			goto out;
 		}
@@ -1463,7 +1463,7 @@ static int create_altmaps_and_memory_blo
 	return 0;
 out:
 	if (ret && cur_start != start)
-		remove_memory_blocks_and_altmaps(start, cur_start - start);
+		remove_memory_blocks_and_altmaps(start, cur_start - start, nid);
 	return ret;
 }
 
@@ -1532,7 +1532,7 @@ int __ref add_memory_resource(int nid, s
 		/* create memory block devices after memory was added */
 		ret = create_memory_block_devices(start, size, NULL, group);
 		if (ret) {
-			arch_remove_memory(start, size, params.altmap);
+			arch_remove_memory(start, size, params.altmap, nid);
 			goto error;
 		}
 	}
@@ -2275,10 +2275,10 @@ static int __ref try_remove_memory(u64 s
 		 * No altmaps present, do the removal directly
 		 */
 		remove_memory_block_devices(start, size);
-		arch_remove_memory(start, size, NULL);
+		arch_remove_memory(start, size, NULL, nid);
 	} else {
 		/* all memblocks in the range have altmaps */
-		remove_memory_blocks_and_altmaps(start, size);
+		remove_memory_blocks_and_altmaps(start, size, nid);
 	}
 
 	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
--- a/mm/memremap.c~mm-keep-nid-around-during-hot-remove
+++ a/mm/memremap.c
@@ -112,9 +112,11 @@ static void pageunmap_range(struct dev_p
 {
 	struct range *range = &pgmap->ranges[range_id];
 	struct page *first_page;
+	int nid;
 
 	/* make sure to access a memmap that was actually initialized */
 	first_page = pfn_to_page(pfn_first(pgmap, range_id));
+	nid = page_to_nid(first_page);
 
 	/* pages are dead and unused, undo the arch mapping */
 	mem_hotplug_begin();
@@ -122,10 +124,10 @@ static void pageunmap_range(struct dev_p
 				   PHYS_PFN(range_len(range)));
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
 		__remove_pages(PHYS_PFN(range->start),
-			       PHYS_PFN(range_len(range)), NULL);
+			       PHYS_PFN(range_len(range)), NULL, nid);
 	} else {
 		arch_remove_memory(range->start, range_len(range),
-				   pgmap_altmap(pgmap));
+				   pgmap_altmap(pgmap), nid);
 		kasan_remove_zero_shadow(__va(range->start), range_len(range));
 	}
 	mem_hotplug_done();
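The pageunmap_range() hunk above shows the capture side of the pattern:
the nid is read with page_to_nid() from the first, still-initialized
memmap page before any unmapping happens.  The consumer side, in the
mm/sparse.c and mm/sparse-vmemmap.c hunks below, guards the accounting
with NUMA_NO_NODE.  A minimal standalone model of that guard (plain
userspace C; the toy_* names are hypothetical, not kernel code):

#include <stdio.h>

#define NUMA_NO_NODE (-1)	/* same sentinel value the kernel uses */

static long toy_nr_memmap[2];	/* models the per-node NR_MEMMAP counter */

/* Models the accounting step in depopulate_section_memmap(). */
static void toy_account_removal(int nid, long pages)
{
	if (nid != NUMA_NO_NODE)
		toy_nr_memmap[nid] -= pages;
	/* unknown node: skip the update instead of corrupting a stat */
}

int main(void)
{
	toy_nr_memmap[0] = 4;
	toy_account_removal(0, 4);		/* ordinary hot-remove */
	toy_account_removal(NUMA_NO_NODE, 4);	/* nid unknown: no-op */
	printf("node 0 memmap pages: %ld\n", toy_nr_memmap[0]);	/* 0 */
	return 0;
}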
--- a/mm/sparse.c~mm-keep-nid-around-during-hot-remove
+++ a/mm/sparse.c
@@ -638,13 +638,15 @@ static struct page * __meminit populate_
 }
 
 static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
+		struct vmem_altmap *altmap, int nid)
 {
 	unsigned long start = (unsigned long) pfn_to_page(pfn);
 	unsigned long end = start + nr_pages * sizeof(struct page);
-	mod_node_page_state(page_pgdat(pfn_to_page(pfn)), NR_MEMMAP,
-			    -1L * (DIV_ROUND_UP(end - start, PAGE_SIZE)));
+	if (nid != NUMA_NO_NODE) {
+		mod_node_page_state(NODE_DATA(nid), NR_MEMMAP,
+				    -1L * (DIV_ROUND_UP(end - start, PAGE_SIZE)));
+	}
 	vmemmap_free(start, end, altmap);
 }
 
 static void free_map_bootmem(struct page *memmap)
@@ -713,7 +715,7 @@ static struct page * __meminit populate_
 }
 
 static void depopulate_section_memmap(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
+		struct vmem_altmap *altmap, int nid)
 {
 	kvfree(pfn_to_page(pfn));
 }
@@ -781,7 +783,7 @@ static int fill_subsection_map(unsigned
  * For 2 and 3, the SPARSEMEM_VMEMMAP={y,n} cases are unified
  */
 static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap)
+		struct vmem_altmap *altmap, int nid)
 {
 	struct mem_section *ms = __pfn_to_section(pfn);
 	bool section_is_early = early_section(ms);
@@ -821,7 +823,7 @@ static void section_deactivate(unsigned
 	 * section_activate() and pfn_valid() .
 	 */
 	if (!section_is_early)
-		depopulate_section_memmap(pfn, nr_pages, altmap);
+		depopulate_section_memmap(pfn, nr_pages, altmap, nid);
 	else if (memmap)
 		free_map_bootmem(memmap);
 
@@ -865,7 +867,7 @@ static struct page * __meminit section_a
 
 	memmap = populate_section_memmap(pfn, nr_pages, nid, altmap, pgmap);
 	if (!memmap) {
-		section_deactivate(pfn, nr_pages, altmap);
+		section_deactivate(pfn, nr_pages, altmap, nid);
 		return ERR_PTR(-ENOMEM);
 	}
 
@@ -928,13 +930,13 @@ int __meminit sparse_add_section(int nid
 }
 
 void sparse_remove_section(unsigned long pfn, unsigned long nr_pages,
-			   struct vmem_altmap *altmap)
+			   struct vmem_altmap *altmap, int nid)
 {
 	struct mem_section *ms = __pfn_to_section(pfn);
 
 	if (WARN_ON_ONCE(!valid_section(ms)))
 		return;
 
-	section_deactivate(pfn, nr_pages, altmap);
+	section_deactivate(pfn, nr_pages, altmap, nid);
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
--- a/mm/sparse-vmemmap.c~mm-keep-nid-around-during-hot-remove
+++ a/mm/sparse-vmemmap.c
@@ -469,12 +469,14 @@ struct page * __meminit __populate_secti
 	if (r < 0)
 		return NULL;
 
-	if (system_state == SYSTEM_BOOTING) {
-		mod_node_early_perpage_metadata(nid, DIV_ROUND_UP(end - start,
-								  PAGE_SIZE));
-	} else {
-		mod_node_page_state(NODE_DATA(nid), NR_MEMMAP,
-				    DIV_ROUND_UP(end - start, PAGE_SIZE));
+	if (nid != NUMA_NO_NODE) {
+		if (system_state == SYSTEM_BOOTING) {
+			mod_node_early_perpage_metadata(nid, DIV_ROUND_UP(end - start,
+									  PAGE_SIZE));
+		} else {
+			mod_node_page_state(NODE_DATA(nid), NR_MEMMAP,
+					    DIV_ROUND_UP(end - start, PAGE_SIZE));
+		}
 	}
 
 	return pfn_to_page(pfn);
_

Patches currently in -mm which might be from pasha.tatashin@xxxxxxxxxx are

mm-update-the-memmap-stat-before-page-is-freed.patch
mm-keep-nid-around-during-hot-remove.patch
memcg-increase-the-valid-index-range-for-memcg-stats-v5.patch
vmstat-kernel-stack-usage-histogram.patch
task_stack-uninline-stack_not_used.patch