The function crash_prepare_elf64_headers() generates the elfcorehdr which describes the CPUs and memory in the system for the crash kernel. In particular, it writes out ELF PT_NOTEs for memory regions and the CPUs in the system. With respect to the CPUs, the current implementation utilizes for_each_present_cpu() which means that as CPUs are added and removed, the elfcorehdr must again be updated to reflect the new set of CPUs. The reasoning behind the move to use for_each_possible_cpu(), is: - At kernel boot time, all percpu crash_notes are allocated for all possible CPUs; that is, crash_notes are not allocated dynamically when CPUs are plugged/unplugged. Thus the crash_notes for each possible CPU are always available. - The crash_prepare_elf64_headers() creates an ELF PT_NOTE per CPU. Changing to for_each_possible_cpu() is valid as the crash_notes pointed to by each CPU PT_NOTE are present and always valid. Furthermore, examining a common crash processing path of: kernel panic -> crash kernel -> makedumpfile -> 'crash' analyzer elfcorehdr /proc/vmcore vmcore reveals how the ELF CPU PT_NOTEs are utilized: - Upon panic, each CPU is sent an IPI and shuts itself down, recording its state in its crash_notes. When all CPUs are shutdown, the crash kernel is launched with a pointer to the elfcorehdr. - The crash kernel via linux/fs/proc/vmcore.c does not examine or use the contents of the PT_NOTEs, it exposes them via /proc/vmcore. - The makedumpfile utility uses /proc/vmcore and reads the CPU PT_NOTEs to craft a nr_cpus variable, which is reported in a header but otherwise generally unused. Makedumpfile creates the vmcore. - The 'crash' dump analyzer does not appear to reference the CPU PT_NOTEs. Instead it looks-up the cpu_[possible|present|onlin]_mask symbols and directly examines those structure contents from vmcore memory. From that information it is able to determine which CPUs are present and online, and locate the corresponding crash_notes. Said differently, it appears that 'crash' analyzer does not rely on the ELF PT_NOTEs for CPUs; rather it obtains the information directly via kernel symbols and the memory within the vmcore. (There maybe other vmcore generating and analysis tools that do use these PT_NOTEs, but 'makedumpfile' and 'crash' seems to be the most common solution.) This results in the benefit of having all CPUs described in the elfcorehdr, and therefore reducing the need to re-generate the elfcorehdr on CPU changes, at the small expense of an additional 56 bytes per PT_NOTE for not-present-but-possible CPUs. On systems where kexec_file_load() syscall is utilized, all the above is valid. On systems where kexec_load() syscall is utilized, there may be the need for the elfcorehdr to be regenerated once. The reason being that some archs only populate the 'present' CPUs from the /sys/devices/system/cpus entries, which the userspace 'kexec' utility uses to generate the userspace-supplied elfcorehdr. In this situation, one memory or CPU change will rewrite the elfcorehdr via the crash_prepare_elf64_headers() function and now all possible CPUs will be described, just as with kexec_file_load() syscall. Suggested-by: Sourabh Jain <sourabhjain@xxxxxxxxxxxxx> Signed-off-by: Eric DeVolder <eric.devolder@xxxxxxxxxx> Reviewed-by: Sourabh Jain <sourabhjain@xxxxxxxxxxxxx> Acked-by: Hari Bathini <hbathini@xxxxxxxxxxxxx> Acked-by: Baoquan He <bhe@xxxxxxxxxx> --- kernel/crash_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/crash_core.c b/kernel/crash_core.c index fa918176d46d..7378b501fada 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -364,8 +364,8 @@ int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map, ehdr->e_ehsize = sizeof(Elf64_Ehdr); ehdr->e_phentsize = sizeof(Elf64_Phdr); - /* Prepare one phdr of type PT_NOTE for each present CPU */ - for_each_present_cpu(cpu) { + /* Prepare one phdr of type PT_NOTE for each possible CPU */ + for_each_possible_cpu(cpu) { phdr->p_type = PT_NOTE; notes_addr = per_cpu_ptr_to_phys(per_cpu_ptr(crash_notes, cpu)); phdr->p_offset = phdr->p_paddr = notes_addr; -- 2.31.1