The patch titled Subject: mm,memory_hotplug: Fix shrink_{zone,node}_span has been removed from the -mm tree. Its filename was mm-sparsemem-support-sub-section-hotplug-fix-fix.patch This patch was dropped because an alternative patch was merged ------------------------------------------------------ From: Oscar Salvador <osalvador@xxxxxxx> Subject: mm,memory_hotplug: Fix shrink_{zone,node}_span Since [1], shrink_{zone,node}_span work on PAGES_PER_SUBSECTION granularity. The problem is that deactivation of the section occurs later on in sparse_remove_section, so pfn_valid()->pfn_section_valid() will always return true before we deactivate the {sub}section. I spotted this during hotplug hotremove tests, there I always saw that spanned_pages was, at least, left with PAGES_PER_SECTION, even if we removed all memory linked to that zone. Fix this by decoupling section_deactivate from sparse_remove_section, and re-order the function calls. Now, __remove_section will: 1) deactivate section 2) shrink {zone,node}'s pages 3) remove section [1] https://patchwork.kernel.org/patch/11003467/ Let us analyze what shrink_{zone,node}_span does. We have to remember that shrink_zone_span gets called every time a section is to be removed. There can be three possibilites: 1) section to be removed is the first one of the zone 2) section to be removed is the last one of the zone 3) section to be removed falls in the middle For 1) and 2) cases, we will try to find the next section from bottom/top, and in the third case we will check whether the section contains only holes. Now, let us take the example where a ZONE contains only 1 section, and we remove it. The last loop of shrink_zone_span, will check for {start_pfn,end_pfn] PAGES_PER_SECTION block the following: - section is valid - pfn relates to the current zone/nid - section is not the section to be removed Since we only got 1 section here, the check "start_pfn == pfn" will make us to continue the loop and then we are done. Now, what happens after the patch? We increment pfn on subsection basis, since "start_pfn == pfn", we jump to the next sub-section (pfn+512), and call pfn_valid()- >pfn_section_valid(). Since section has not been yet deactivded, pfn_section_valid() will return true, and we will repeat this until the end of the loop. What should happen instead is: - we deactivate the {sub}-section before calling shirnk_{zone,node}_span - calls to pfn_valid() will now return false for the sections that have been deactivated, and so we will get the pfn from the next activaded sub-section, or nothing if the section is empty (section do not contain active sub-sections). The example relates to the last loop in shrink_zone_span, but the same applies to find_{smalles,biggest}_section. Please note that we could probably do some hack like replacing: start_pfn == pfn with pfn < end_pfn But the way to fix this is to 1) deactivate {sub}-section and 2) let shrink_{node,zone}_span find the next active {sub-section}. Link: http://lkml.kernel.org/r/20190715081549.32577-3-osalvador@xxxxxxx Fixes: mmotm ("mm/hotplug: prepare shrink_{zone, pgdat}_span for sub-section removal") Signed-off-by: Oscar Salvador <osalvador@xxxxxxx> Cc: Dan Williams <dan.j.williams@xxxxxxxxx> Cc: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxx> Cc: Michal Hocko <mhocko@xxxxxxxx> Cc: Vlastimil Babka <vbabka@xxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- include/linux/memory_hotplug.h | 7 +- mm/memory_hotplug.c | 6 ++ mm/sparse.c | 77 +++++++++++++++++++++---------- 3 files changed, 62 insertions(+), 28 deletions(-) --- a/include/linux/memory_hotplug.h~mm-sparsemem-support-sub-section-hotplug-fix-fix +++ a/include/linux/memory_hotplug.h @@ -348,9 +348,10 @@ extern void move_pfn_range_to_zone(struc extern bool is_memblock_offlined(struct memory_block *mem); extern int sparse_add_section(int nid, unsigned long pfn, unsigned long nr_pages, struct vmem_altmap *altmap); -extern void sparse_remove_section(struct mem_section *ms, - unsigned long pfn, unsigned long nr_pages, - unsigned long map_offset, struct vmem_altmap *altmap); +int sparse_deactivate_section(unsigned long pfn, unsigned long nr_pages); +void sparse_remove_section(unsigned long pfn, unsigned long nr_pages, + unsigned long map_offset, struct vmem_altmap *altmap, + int section_empty); extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum); extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, --- a/mm/memory_hotplug.c~mm-sparsemem-support-sub-section-hotplug-fix-fix +++ a/mm/memory_hotplug.c @@ -517,12 +517,16 @@ static void __remove_section(struct zone struct vmem_altmap *altmap) { struct mem_section *ms = __nr_to_section(pfn_to_section_nr(pfn)); + int ret; if (WARN_ON_ONCE(!valid_section(ms))) return; + ret = sparse_deactivate_section(pfn, nr_pages); __remove_zone(zone, pfn, nr_pages); - sparse_remove_section(ms, pfn, nr_pages, map_offset, altmap); + if (ret >= 0) + sparse_remove_section(pfn, nr_pages, map_offset, altmap, + ret); } /** --- a/mm/sparse.c~mm-sparsemem-support-sub-section-hotplug-fix-fix +++ a/mm/sparse.c @@ -732,16 +732,47 @@ static void free_map_bootmem(struct page } #endif /* CONFIG_SPARSEMEM_VMEMMAP */ -static void section_deactivate(unsigned long pfn, unsigned long nr_pages, - struct vmem_altmap *altmap) +static void section_remove(unsigned long pfn, unsigned long nr_pages, + struct vmem_altmap *altmap, int section_empty) +{ + struct mem_section *ms = __pfn_to_section(pfn); + bool section_early = early_section(ms); + struct page *memmap = NULL; + + if (section_empty) { + unsigned long section_nr = pfn_to_section_nr(pfn); + + if (!section_early) { + kfree(ms->usage); + ms->usage = NULL; + } + memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr); + ms->section_mem_map = sparse_encode_mem_map(NULL, section_nr); + } + + if (section_early && memmap) + free_map_bootmem(memmap); + else + depopulate_section_memmap(pfn, nr_pages, altmap); +} + +/** + * section_deactivate: Deactivate a {sub}section. + * + * Return: + * * -1 - {sub}section has already been deactivated. + * * 0 - Section is not empty + * * 1 - Section is empty + */ + +static int section_deactivate(unsigned long pfn, unsigned long nr_pages) { DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 }; DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 }; struct mem_section *ms = __pfn_to_section(pfn); - bool section_is_early = early_section(ms); - struct page *memmap = NULL; unsigned long *subsection_map = ms->usage ? &ms->usage->subsection_map[0] : NULL; + int section_empty = 0; subsection_mask_set(map, pfn, nr_pages); if (subsection_map) @@ -750,7 +781,7 @@ static void section_deactivate(unsigned if (WARN(!subsection_map || !bitmap_equal(tmp, map, SUBSECTIONS_PER_SECTION), "section already deactivated (%#lx + %ld)\n", pfn, nr_pages)) - return; + return -1; /* * There are 3 cases to handle across two configurations @@ -770,21 +801,10 @@ static void section_deactivate(unsigned * For 2/ and 3/ the SPARSEMEM_VMEMMAP={y,n} cases are unified */ bitmap_xor(subsection_map, map, subsection_map, SUBSECTIONS_PER_SECTION); - if (bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION)) { - unsigned long section_nr = pfn_to_section_nr(pfn); + if (bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION)) + section_empty = 1; - if (!section_is_early) { - kfree(ms->usage); - ms->usage = NULL; - } - memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr); - ms->section_mem_map = sparse_encode_mem_map(NULL, section_nr); - } - - if (section_is_early && memmap) - free_map_bootmem(memmap); - else - depopulate_section_memmap(pfn, nr_pages, altmap); + return section_empty; } static struct page * __meminit section_activate(int nid, unsigned long pfn, @@ -834,7 +854,11 @@ static struct page * __meminit section_a memmap = populate_section_memmap(pfn, nr_pages, nid, altmap); if (!memmap) { - section_deactivate(pfn, nr_pages, altmap); + int ret; + + ret = section_deactivate(pfn, nr_pages); + if (ret >= 0) + section_remove(pfn, nr_pages, altmap, ret); return ERR_PTR(-ENOMEM); } @@ -919,12 +943,17 @@ static inline void clear_hwpoisoned_page } #endif -void sparse_remove_section(struct mem_section *ms, unsigned long pfn, - unsigned long nr_pages, unsigned long map_offset, - struct vmem_altmap *altmap) +int sparse_deactivate_section(unsigned long pfn, unsigned long nr_pages) +{ + return section_deactivate(pfn, nr_pages); +} + +void sparse_remove_section(unsigned long pfn, unsigned long nr_pages, + unsigned long map_offset, struct vmem_altmap *altmap, + int section_empty) { clear_hwpoisoned_pages(pfn_to_page(pfn) + map_offset, nr_pages - map_offset); - section_deactivate(pfn, nr_pages, altmap); + section_remove(pfn, nr_pages, altmap, section_empty); } #endif /* CONFIG_MEMORY_HOTPLUG */ _ Patches currently in -mm which might be from osalvador@xxxxxxx are mm-hotplug-prepare-shrink_zone-pgdat_span-for-sub-section-removal-fix.patch mm-sparsemem-support-sub-section-hotplug-fix.patch