On Fri, Nov 17, 2023 at 04:37:29PM +0100, David Hildenbrand wrote:
> > Maybe there is also already a common code bug with that, s390 might be
> > special but that is often also good for finding bugs in common code ...
> 
> If it's only the page_init_poison() as noted by Sumanth, we could disable
> that on s390x with an altmap some way or the other; should be possible.
> 
> I mean, you effectively have your own poisoning if the altmap is effectively
> inaccessible and makes your CPU angry on access :)
> 
> Last but not least, support for an inaccessible altmap might come in handy
> for virtio-mem eventually, and make altmap support eventually simpler. So
> added bonus points.

We tried out two possibilities for dealing with vmemmap altmap
inaccessibility.

Approach 1:
===========
Add an MHP_ALTMAP_INACCESSIBLE flag and pass it in add_memory():

diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index 075094ca59b4..ab2dfcc7e9e4 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -358,6 +358,13 @@ static int sclp_mem_notifier(struct notifier_block *nb,
 		 * buddy allocator later.
 		 */
 		__arch_set_page_nodat((void *)__va(start), memory_block->altmap->free);
+		/*
+		 * Poison the struct pages after the memory block is accessible.
+		 * This is needed only for altmap. Without altmap, the struct
+		 * pages are poisoned in sparse_add_section().
+		 */
+		if (memory_block->altmap->inaccessible)
+			page_init_poison(pfn_to_page(arg->start_pfn), sizeof(struct page) * memory_block->altmap->free);
 		break;
 	case MEM_FINISH_OFFLINE:
 		sclp_mem_change_state(start, size, 0);
@@ -412,7 +419,7 @@ static void __init add_memory_merged(u16 rn)
 		goto skip_add;
 	for (addr = start; addr < start + size; addr += block_size)
 		add_memory(0, addr, block_size,
-			   MACHINE_HAS_EDAT1 ? MHP_MEMMAP_ON_MEMORY : MHP_NONE);
+			   MACHINE_HAS_EDAT1 ? MHP_MEMMAP_ON_MEMORY | MHP_ALTMAP_INACCESSIBLE : MHP_NONE);
 skip_add:
 	first_rn = rn;
 	num = 1;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 7d2076583494..5c70707e706f 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -106,6 +106,11 @@ typedef int __bitwise mhp_t;
  * implies the node id (nid).
  */
 #define MHP_NID_IS_MGID		((__force mhp_t)BIT(2))
+/*
+ * Mark the memmap on memory (struct page array) as inaccessible during the
+ * memory hotplug addition phase.
+ */
+#define MHP_ALTMAP_INACCESSIBLE	((__force mhp_t)BIT(3))
 
 /*
  * Extended parameters for memory hotplug:
diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index 744c830f4b13..9837f3e6fb95 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -25,6 +25,7 @@ struct vmem_altmap {
 	unsigned long free;
 	unsigned long align;
 	unsigned long alloc;
+	bool inaccessible;
 };
 
 /*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7a5fc89a8652..d8299853cdcc 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1439,6 +1439,8 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
 		if (mhp_supports_memmap_on_memory(size)) {
 			mhp_altmap.free = memory_block_memmap_on_memory_pages();
+			if (mhp_flags & MHP_ALTMAP_INACCESSIBLE)
+				mhp_altmap.inaccessible = true;
 			params.altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
 			if (!params.altmap) {
 				ret = -ENOMEM;
diff --git a/mm/sparse.c b/mm/sparse.c
index 77d91e565045..3991c717b769 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -907,7 +907,8 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	 * Poison uninitialized struct pages in order to catch invalid flags
 	 * combinations.
 	 */
-	page_init_poison(memmap, sizeof(struct page) * nr_pages);
+	if (!altmap || !altmap->inaccessible)
+		page_init_poison(memmap, sizeof(struct page) * nr_pages);
 
 	ms = __nr_to_section(section_nr);
 	set_section_nid(section_nr, nid);
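
As a purely hypothetical follow-up to approach 1 (not part of the diff above):
since MHP_ALTMAP_INACCESSIBLE is only meaningful together with
MHP_MEMMAP_ON_MEMORY, add_memory_resource() could reject other combinations
early, e.g.:

	/*
	 * Hypothetical sanity check, not in the diff above: the new flag is
	 * only meaningful together with MHP_MEMMAP_ON_MEMORY, so reject any
	 * other flag combination early in add_memory_resource().
	 */
	if ((mhp_flags & MHP_ALTMAP_INACCESSIBLE) &&
	    !(mhp_flags & MHP_MEMMAP_ON_MEMORY))
		return -EINVAL;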
Approach 2:
===========
Shouldn't the kasan zero shadow mapping be performed first, before
accessing/initializing the memmap via page_init_poison()? If that is true,
then it is a problem for all architectures and could be fixed like:

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7a5fc89a8652..eb3975740537 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1093,6 +1093,7 @@ int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
 	if (ret)
 		return ret;
 
+	page_init_poison(pfn_to_page(pfn), sizeof(struct page) * nr_pages);
 	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);
 
 	for (i = 0; i < nr_pages; i++)
diff --git a/mm/sparse.c b/mm/sparse.c
index 77d91e565045..4ddf53f52075 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -906,8 +906,11 @@ int __meminit sparse_add_section(int nid, unsigned long start_pfn,
 	/*
 	 * Poison uninitialized struct pages in order to catch invalid flags
 	 * combinations.
+	 * For altmap, do this later when onlining the memory, as it might
+	 * not be accessible at this point.
 	 */
-	page_init_poison(memmap, sizeof(struct page) * nr_pages);
+	if (!altmap)
+		page_init_poison(memmap, sizeof(struct page) * nr_pages);
 
 	ms = __nr_to_section(section_nr);
 	set_section_nid(section_nr, nid);

Also, if this approach is taken, should page_init_poison() be performed with
cond_resched(), as mentioned in commit d33695b16a9f ("mm/memory_hotplug:
poison memmap in remove_pfn_range_from_zone()")? (A rough sketch of what that
could look like is appended below.)

Opinions?

Thank you
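
A minimal, illustrative sketch of chunked poisoning with cond_resched(); the
helper name and chunk size are made up here and not part of the proposal.
mhp_init_memmap_on_memory() could call something like this instead of the
single page_init_poison() in the hunk above:

/*
 * Illustrative sketch only: poison the memmap in chunks and reschedule in
 * between, so that poisoning a very large memmap does not hog the CPU.
 * The helper name and chunk size are made up for this sketch.
 */
#define MEMMAP_POISON_CHUNK	(1UL << 14)	/* pages per iteration, arbitrary */

static void memmap_init_poison_chunked(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long chunk;

	while (nr_pages) {
		chunk = min(nr_pages, MEMMAP_POISON_CHUNK);
		page_init_poison(pfn_to_page(pfn), sizeof(struct page) * chunk);
		pfn += chunk;
		nr_pages -= chunk;
		cond_resched();
	}
}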