The Problem:
============
After hotplug/DLPAR events, the capture kernel holds stale information
about the system. Collecting a dump with a stale capture kernel can result
in a dump capture failure or an inaccurate dump.

Existing solution:
==================
The existing solution keeps the capture kernel up to date by monitoring
hotplug events via udev rules and triggering a full capture kernel reload
for every hotplug event.

Shortcomings:
-------------
- Leaves a window where a kernel crash might not lead to a successful dump
  collection.
- Reloading all kexec components for each hotplug event is inefficient.
- udev rules are prone to races if hotplug events are frequent.

More details about the issues with the existing solution are posted here:
 - https://lkml.org/lkml/2020/12/14/532
 - https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-February/240254.html

Proposed Solution:
==================
Instead of reloading all kexec segments on a hotplug event, this patch
series updates only the relevant kexec segment. Once the kexec segments are
loaded in the kernel reserved area, an arch-specific hotplug handler updates
the relevant kexec segment based on the hotplug event type.

Series Dependencies
===================
This patch series implements the crash hotplug handler on PowerPC. The
generic infrastructure for crash hotplug updates is introduced by the
https://lkml.org/lkml/2023/1/18/1420 patch series.

Git tree for testing:
=====================
The git tree below has this patch series applied on top of the dependent
patch series.
https://github.com/sourabhjains/linux/commits/in-kernel-crash-update

Note: only the kexec_file_load syscall will work. For kexec_load, minor
changes are required in the kexec tool.

To exercise the feature, the kdump udev rules must be disabled for
CPU/memory hotplug events.
Comment out the lines below in the kdump udev rules file:

RHEL: /usr/lib/udev/rules.d/98-kexec.rules
#SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_reload_cpu"
#SUBSYSTEM=="memory", ACTION=="online", GOTO="kdump_reload_mem"
#SUBSYSTEM=="memory", ACTION=="offline", GOTO="kdump_reload_mem"

SLES: /usr/lib/kdump/70-kdump.rules
#SUBSYSTEM=="memory", ACTION=="add|remove", GOTO="kdump_try_restart"
#SUBSYSTEM=="cpu", ACTION=="online", GOTO="kdump_try_restart"

---
Changelog:

v7 -> v8:
  - Restrict fdt_index initialization to machine_kexec_post_load so that
    it works for both kexec_load and kexec_file_load. [3/8] Laurent Dufour
  - Updated the logic to find the number of offline cores. [6/8]
  - Changed the logic to find the elfcore program header to accommodate
    future memory ranges due to memory hotplug events. [8/8]

v6 -> v7:
  - Added a new config option to configure this feature.
  - Pass the hotplug action type to the arch-specific handler.

v5 -> v6:
  - Added crash memory hotplug support.

v4 -> v5:
  - Replace CONFIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
  - Move FDT segment identification for the kexec_load case to the load
    path instead of the crash hotplug handler.
  - Keep the new attribute defined under kimage_arch to track the FDT
    segment under the CONFIG_HOTPLUG_CPU config.

v3 -> v4:
  - Update the logic to find the additional space needed for hot-added
    CPUs post kexec load. Refer to the "[RFC v4 PATCH 4/5] powerpc/crash hp:
    add crash hotplug support for kexec_file_load" patch to know more
    about the change.
  - Fix a couple of typos.
  - Replace pr_err with pr_info_once to warn the user about memory hotplug
    support.
  - In the crash hotplug handler, exit the for loop once the FDT segment
    is found.

v2 -> v3:
  - Move fdt_index and fdt_index_vaild variables to the kimage_arch struct.
  - Rebase patches on top of https://lkml.org/lkml/2022/3/3/674 [v5].
  - Fixed warnings reported by the checkpatch script.

v1 -> v2:
  - Use the generic hotplug handler introduced by
    https://lkml.org/lkml/2022/2/9/1406, a significant change from v1.
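
For readers skimming the series, the approach described under Proposed
Solution (update one kexec segment per hotplug event instead of reloading
everything) can be sketched as a small userspace C model. This is only a
sketch under stated assumptions: the names hp_action, seg_kind, struct image
and crash_hotplug_update are hypothetical and are not the kernel's API, and
the mapping of CPU events to the FDT segment and memory events to the
elfcorehdr segment is inferred from the changelog rather than taken from the
patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hotplug action types passed to the handler. */
enum hp_action { HP_ADD_CPU, HP_REMOVE_CPU, HP_ADD_MEMORY, HP_REMOVE_MEMORY };

/* Hypothetical segment kinds of a loaded crash image. */
enum seg_kind { SEG_KERNEL, SEG_INITRD, SEG_FDT, SEG_ELFCOREHDR, SEG_KIND_COUNT };

struct segment {
	enum seg_kind kind;
	unsigned int updates;	/* how many times this segment was refreshed */
};

struct image {
	struct segment seg[SEG_KIND_COUNT];
	size_t nr_segments;
};

/* Return the index of the first segment of the given kind, or -1. */
static int find_segment(const struct image *img, enum seg_kind kind)
{
	for (size_t i = 0; i < img->nr_segments; i++) {
		if (img->seg[i].kind == kind)
			return (int)i;	/* stop at the first match */
	}
	return -1;
}

/*
 * Refresh only the segment affected by the event and leave the others
 * untouched. Assumed mapping: CPU events -> FDT, memory events -> elfcorehdr.
 * Returns the index of the updated segment, or -1 if nothing was updated.
 */
int crash_hotplug_update(struct image *img, enum hp_action action)
{
	enum seg_kind target;
	int idx;

	assert(img != NULL);

	switch (action) {
	case HP_ADD_CPU:
	case HP_REMOVE_CPU:
		target = SEG_FDT;
		break;
	case HP_ADD_MEMORY:
	case HP_REMOVE_MEMORY:
		target = SEG_ELFCOREHDR;
		break;
	default:
		return -1;
	}

	idx = find_segment(img, target);
	if (idx < 0)
		return -1;

	img->seg[idx].updates++;	/* stands in for rewriting the segment */
	return idx;
}
```

In this model a CPU event on an image holding kernel, initrd, FDT and
elfcorehdr segments touches only the FDT entry, which is the contrast with
the udev-driven full reload described above.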

Sourabh Jain (8):
  powerpc/kexec: turn some static helper functions public
  powerpc/crash hp: introduce a new config option CRASH_HOTPLUG
  powerpc/crash: update kimage_arch struct
  crash: add phdr for possible CPUs in elfcorehdr
  crash: pass hotplug action type to arch crash hotplug handler
  powerpc/crash: add crash CPU hotplug support
  crash: forward memory_notify args to arch crash hotplug handler
  powerpc/kexec: add crash memory hotplug support

 arch/powerpc/Kconfig                    |  12 +
 arch/powerpc/include/asm/kexec.h        |  18 ++
 arch/powerpc/include/asm/kexec_ranges.h |   1 +
 arch/powerpc/kexec/core_64.c            | 335 ++++++++++++++++++++++++
 arch/powerpc/kexec/elf_64.c             |  19 +-
 arch/powerpc/kexec/file_load_64.c       | 237 +++--------------
 arch/powerpc/kexec/ranges.c             |  60 +++++
 arch/x86/include/asm/kexec.h            |   3 +-
 arch/x86/kernel/crash.c                 |   5 +-
 include/linux/kexec.h                   |   6 +-
 kernel/crash_core.c                     |  23 +-
 11 files changed, 502 insertions(+), 217 deletions(-)

-- 
2.39.1