The Problem:
============
After CPU/memory hot plug/unplug or online/offline events, the kdump kernel
often retains outdated system information. Collecting a dump with such a
stale kdump kernel carries two risks: either the dump collection fails
entirely, or the collected dump is inaccurate, which compromises its
reliability for analysis and troubleshooting.

Existing solution:
==================
The existing solution keeps the kdump kernel up to date by monitoring
CPU/memory hotplug/online/offline events via a udev rule. Each hotplug event
triggers a full kdump kernel reload, so that the kdump kernel stays
synchronized with the latest system resource changes.

Shortcomings of existing solution:
==================================
- Leaves a window where a kernel crash might not lead to a successful dump
  collection.
- Reloading all kexec segments for each hotplug event is inefficient.
- udev rules are prone to races if hotplug events are frequent.

Further information regarding the problems with the current solution can be
found here:
- https://lore.kernel.org/lkml/b04ed259-dc5f-7f30-6661-c26f92d9096a@xxxxxxxxxx/
- https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-February/240254.html

Proposed Solution:
==================
Instead of performing a full reload of all kexec segments for every
CPU/memory hot plug/unplug and online/offline event, the proposed solution
updates only the relevant kexec segment. After the kexec segments are loaded
into the reserved area, a newly introduced hotplug handler updates the
specific kexec segment based on the type of hotplug event.
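
As a rough illustration of the difference between a full reload and a
selective update, here is a small user-space model. It is only a sketch and
not the kernel implementation: the names HotplugAction, SEGMENT_FOR_ACTION,
handle_hotplug and full_reload are hypothetical, and the mapping of CPU
events to the FDT segment and memory events to the elfcorehdr segment is an
assumption made for the example.

```python
from enum import Enum


class HotplugAction(Enum):
    """Hypothetical event types, named for this sketch only."""
    ADD_CPU = "add_cpu"
    REMOVE_CPU = "remove_cpu"
    ADD_MEMORY = "add_memory"
    REMOVE_MEMORY = "remove_memory"


# Assumption for illustration: CPU events affect the FDT segment and
# memory events affect the elfcorehdr segment.
SEGMENT_FOR_ACTION = {
    HotplugAction.ADD_CPU: "fdt",
    HotplugAction.REMOVE_CPU: "fdt",
    HotplugAction.ADD_MEMORY: "elfcorehdr",
    HotplugAction.REMOVE_MEMORY: "elfcorehdr",
}


def handle_hotplug(segments, action):
    """Selective update: bump the revision of the one affected segment.

    `segments` maps a segment name to a revision counter. A new dict is
    returned so the caller can compare before and after.
    """
    target = SEGMENT_FOR_ACTION[action]
    if target not in segments:
        raise KeyError(f"segment {target!r} is not loaded")
    updated = dict(segments)
    updated[target] += 1
    return updated


def full_reload(segments):
    """Model of the udev-driven approach: every segment is rebuilt."""
    return {name: rev + 1 for name, rev in segments.items()}
```

In this model, handle_hotplug(segments, HotplugAction.ADD_CPU) changes only
the "fdt" entry, while full_reload(segments) touches every entry, which is
the overhead the proposed solution is trying to avoid.
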
This selective update approach improves efficiency by avoiding unnecessary
reloads and significantly reduces the chance that a kernel crash leads to a
failed or inaccurate dump collection.

Series Dependencies:
====================
This patch series implements the crash hotplug handler on PowerPC. The
generic crash hotplug handler is introduced by the patch series available at
https://lore.kernel.org/all/20230612210712.683175-1-eric.devolder@xxxxxxxxxx/

Git tree for testing:
=====================
The following Git tree has this patch series applied on top of the dependent
patch series:
https://github.com/sourabhjains/linux/tree/e23-s11-with-kexec-config

To enable this feature, the udev rule responsible for reloading the kdump
service must be disabled. On RHEL, add the following two lines at the top of
"/usr/lib/udev/rules.d/98-kexec.rules":

SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"

These rules skip the kdump reload for CPU/memory hot plug/unplug events when
the path "/sys/devices/system/[cpu|memory]/crash_hotplug" exists.

Note: only the kexec_file_load syscall will work. For kexec_load, minor
changes are required in the kexec tool.

---
Changelog:

v11:
  - Rebase to v6.4-rc6
  - The patch that introduced CONFIG_CRASH_HOTPLUG for PowerPC has been
    removed. The config is now part of common configuration:
    https://lore.kernel.org/all/87ilbpflsk.fsf@mail.lhotse/

v10:
  - Drop the patch that adds fdt_index attribute to struct kimage_arch.
    Find the fdt segment index when needed.
  - Added more details into commit messages.
  - Rebased onto 6.3.0-rc5

v9:
  - Removed patch to prepare elfcorehdr crash notes for possible CPUs.
    The patch is moved to the generic patch series that introduces generic
    infrastructure for in-kernel crash update.
  - Removed patch to pass the hotplug action type to the arch crash hotplug
    handler function. The generic patch series has introduced the hotplug
    action type in kimage struct.
  - Add detailed commit messages for better understanding.

v8:
  - Restrict fdt_index initialization to machine_kexec_post_load so it works
    for both kexec_load and kexec_file_load. [3/8] Laurent Dufour
  - Updated the logic to find the number of offline cores. [6/8]
  - Changed the logic to find the elfcore program header to accommodate
    future memory ranges due to memory hotplug events. [8/8]

v7:
  - Added a new config to configure this feature
  - Pass hotplug action type to arch specific handler

v6:
  - Added crash memory hotplug support

v5:
  - Replace CONFIG_CRASH_HOTPLUG with CONFIG_HOTPLUG_CPU.
  - Move fdt segment identification for kexec_load case to load path
    instead of crash hotplug handler
  - Keep new attribute defined under kimage_arch to track FDT segment
    under CONFIG_HOTPLUG_CPU config.

v4:
  - Update the logic to find the additional space needed for hotadd CPUs
    post kexec load. Refer to the "[RFC v4 PATCH 4/5] powerpc/crash hp: add
    crash hotplug support for kexec_file_load" patch for details.
  - Fix a couple of typos.
  - Replace pr_err with pr_info_once to warn user about memory hotplug
    support.
  - In the crash hotplug handler, exit the for loop once the FDT segment is
    found.

v3:
  - Move fdt_index and fdt_index_valid variables to kimage_arch struct.
  - Rebase patches on top of
    https://lore.kernel.org/lkml/20220303162725.49640-1-eric.devolder@xxxxxxxxxx/
  - Fixed warning reported by checkpatch script

v2:
  - Use generic hotplug handler introduced by
    https://lore.kernel.org/lkml/20220209195706.51522-1-eric.devolder@xxxxxxxxxx/,
    a significant change from v1.

Sourabh Jain (4):
  powerpc/kexec: turn some static helper functions public
  powerpc/crash: add crash CPU hotplug support
  crash: forward memory_notify args to arch crash hotplug handler
  powerpc/crash: add crash memory hotplug support

 arch/powerpc/Kconfig                    |   3 +
 arch/powerpc/include/asm/kexec.h        |  22 ++
 arch/powerpc/include/asm/kexec_ranges.h |   1 +
 arch/powerpc/kexec/core_64.c            | 301 ++++++++++++++++++++++++
 arch/powerpc/kexec/elf_64.c             |  12 +-
 arch/powerpc/kexec/file_load_64.c       | 212 ++++-------------
 arch/powerpc/kexec/ranges.c             |  85 +++++++
 arch/x86/include/asm/kexec.h            |   2 +-
 arch/x86/kernel/crash.c                 |   5 +-
 include/linux/kexec.h                   |   2 +-
 kernel/crash_core.c                     |  14 +-
 11 files changed, 483 insertions(+), 176 deletions(-)

--
2.40.1