Hi David,

On 3/27/20 9:59 AM, David Hildenbrand wrote:
> On 26.03.20 19:07, James Morse wrote:
>> Memory added to the system by hotplug has a 'System RAM' resource created
>> for it. This is exposed to user-space via /proc/iomem.
>>
>> This poses problems for kexec on arm64. If kexec decides to place the
>> kernel in one of these newly onlined regions, the new kernel will find
>> itself booting from a region not described as memory in the firmware
>> tables.
>>
>> Arm64 doesn't have a structure like the e820 memory map that can be
>> re-written when memory is brought online. Instead arm64 uses the UEFI
>> memory map, or the memory node from the DT, sometimes both. We never
>> rewrite these.
>>
>> Allow an architecture to specify a different name for these hotplug
>> regions.

>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 0a54ffac8c68..69b03dd7fc74 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -42,6 +42,10 @@
>>  #include "internal.h"
>>  #include "shuffle.h"
>>
>> +#ifndef MEMORY_HOTPLUG_RES_NAME
>> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
>> +#endif
>
> So I assume changing this for all architectures would result in some
> user space tool breaking? Are we aware of any?

Last time we had to touch arm64's /proc/iomem strings I went through Debian's
codesearch for stuff that reads it; kexec-tools was the only thing that parsed
it in anger. (From memory, the other tools were looking for PCIe windows to do
firmware flashing.)

Looking again, having qualifiers on the end of 'System RAM' looks like it could
confuse the detect_mem_chunks parser in s390-tools.

It looks like the strings that come out of 'FIRMWARE_MEMMAP' are a duplicate
set.


> I do wonder if we should simply change it for all architectures if possible.

If it's possible, that would be great. But I suspect that ship has sailed:
changing it on other architectures could break some fragile parsing code.

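To make the parsing concern concrete, here is a minimal sketch of two ways a
user-space tool might classify /proc/iomem entries. It is not taken from
kexec-tools or s390-tools. The helper names (parse_iomem_line, is_ram_exact,
is_ram_prefix) and the example name "System RAM (hotplug)" are made up for
illustration; the name an architecture would actually pick is not stated here.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed /proc/iomem-style entry (illustrative layout, not a kernel ABI). */
struct iomem_entry {
	unsigned long long start;
	unsigned long long end;
	char name[64];
};

/*
 * Parse a line of the form "  40000000-7fffffff : System RAM".
 * Leading whitespace (used for nesting) is skipped.
 * Returns true if the range and a non-empty name were found.
 */
static bool parse_iomem_line(const char *line, struct iomem_entry *e)
{
	int consumed = 0;

	/* %n only gets written if the " : " separator matched as well. */
	if (sscanf(line, " %llx-%llx : %n", &e->start, &e->end, &consumed) < 2)
		return false;
	if (consumed == 0)
		return false;

	snprintf(e->name, sizeof(e->name), "%s", line + consumed);
	e->name[strcspn(e->name, "\n")] = '\0';	/* drop the trailing newline */
	return e->name[0] != '\0';
}

/* Strict match: a renamed region is no longer treated as RAM. */
static bool is_ram_exact(const struct iomem_entry *e)
{
	return strcmp(e->name, "System RAM") == 0;
}

/* Prefix match: a qualifier after "System RAM" is silently ignored. */
static bool is_ram_prefix(const struct iomem_entry *e)
{
	return strncmp(e->name, "System RAM", strlen("System RAM")) == 0;
}
```

For a hypothetical line "80000000-bfffffff : System RAM (hotplug)",
is_ram_exact() rejects the region while is_ram_prefix() still accepts it, so
the effect of a rename depends on how each tool compares the string.
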
I'm wary of changing it on arm64; the only thing that makes it tolerable is
that memory hot-add was relatively recently merged, and we don't anticipate it
being widely used until you can remove memory as well.

Changing it on arm64 is to prevent today's versions of kexec-tools from
accidentally placing the new kernel in memory that wasn't described at boot.
This leads to an unhandled exception during boot[0] because the kernel can't
access itself via the mapping of all memory. (Hotpluggable regions are only
discovered by suitably configured ACPI systems much later.)


Thanks,

James

[0]
| NUMA: NODE_DATA [mem 0x7fdf1780-0x7fdf3fff]
| Unable to handle kernel paging request at virtual address ffff00004230aff8
| Mem abort info:
|   ESR = 0x96000006
|   EC = 0x25: DABT (current EL), IL = 32 bits
|   SET = 0, FnV = 0
|   EA = 0, S1PTW = 0
| Data abort info:
|   ISV = 0, ISS = 0x00000006
|   CM = 0, WnR = 0
| swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000008181d000
| [ffff00004230aff8] pgd=000000007fff9003, pud=000000007fdf7003, pmd=0000000000000000
| Internal error: Oops: 96000006 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper Not tainted 5.6.0-rc3-00098-g3f6c690f5dfe #11618
| Hardware name: linux,dummy-virt (DT)
| pstate: 80400085 (Nzcv daIf +PAN -UAO BTYPE=--)
| pc : vmemmap_pud_populate+0x2c/0xa0
| lr : vmemmap_populate+0x78/0x154
| Call trace:
|  vmemmap_pud_populate+0x2c/0xa0
|  vmemmap_populate+0x78/0x154
|  __populate_section_memmap+0x3c/0x60
|  sparse_init_nid+0x29c/0x414
|  sparse_init+0x154/0x170
|  bootmem_init+0x78/0xdc
|  setup_arch+0x280/0x5d0
|  start_kernel+0x98/0x4f8
| Code: f9469a84 92748e73 8b010e61 cb040033 (f9400261)
| random: get_random_bytes called from print_oops_end_marker+0x34/0x60 with crng_init=0
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Attempted to kill the idle task!
| ---[ end Kernel panic - not syncing: Attempted to kill the idle task!