Re: [PATCH v3 0/6] crashdump: Kernel handling of CPU and memory hot un/plug

Simon Horman <horms@xxxxxxxxxx> · Wed, 4 Oct 2023 14:08:46 +0200

On Wed, Sep 27, 2023 at 02:11:30PM -0400, Eric DeVolder wrote:
> When the kdump service is loaded, if a CPU or memory is hot
> un/plugged, the crash elfcorehdr, which describes the CPUs and memory
> in the system, must also be updated, else the resulting vmcore is
> inaccurate (eg. missing either CPU context or memory regions).
> 
> The current solution utilizes udev (eg. RHEL /usr/lib/udev/rules.d/
> 98-kexec.rules) to initiate an unload-then-reload of the *entire* kdump
> image (eg. kernel, initrd, boot_params, purgatory and elfcorehdr) by
> the userspace kexec utility. This occurrs just so the elfcorehdr can
> be updated with the latest list of CPUs and memory regions. In a
> previous post I have outlined the significant performance problems
> related to offloading this activity to userspace.
> 
> With the Linux kernel 6.6 commit below, the kernel now has the ability
> to directly modify the elfcorehdr, eliminating the need to
> unload-then-reload the entire kdump image when CPU or memory is hot
> un/plugged or on/offlined.
> 
>  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d6
> 8b4b6f307d155475cce541f2aee938032ed22e
> 
> This kexec-tools patch series is for supporting hotplug with the
> kexec_load() syscall; the kernel directly supports hotplug for the
> kexec_file_load() syscall, requiring no userspace help.
> 
> There are two basic obstacles/requirements for the kexec-tools to
> overcome in order to support kernel hotplug rewriting of the
> elfcorehdr.
> 
> First, the buffer containing the elfcorehdr must be excluded from the
> purgatory checksum/digest, which is computed at load time. Otherwise
> kernel run-time changes to the elfcorehdr, as a result of hot un/plug,
> would result in the checksum failing (specifically in purgatory at
> panic kernel boot time), and kdump capture kernel failing to start.
> To let the kernel know it is okay to modify the elfcorehdr, kexec
> sets the KEXEC_UPDATE_ELFCOREHDR flag.
> 
> NOTE: The kernel specifically does *NOT* attempt to recompute the
> checksum/digest as that would ultimately require patching the in-
> memory purgatory image with the updated checksum. As that purgatory
> image is already fully linked, it is binary blob containing no ELF
> information which would allow it to be re-linked or patched. Thus
> excluding the elfcorehdr from the checksum/digests avoids all these
> problems.
> 
> Second, the size of the elfcorehdr buffer must be large enough
> to accomodate growth of the number of CPUs and/or memory regions.
> 
> To satisfy the first requirement, this patch series introduces the
> --hotplug option to indicate to kexec-tools that kexec should exclude
> the elfcorehdr buffer from the purgatory checksum/digest calculation
> and set the KEXEC_UPDATE_ELFCOREHDR flag.
> 
> To satisfy the second requirement, the size is obtained from the
> /sys/kernel/crash_elfcorehdr_size node (new with the kernel series
> cited above).
> 
> To use this feature with kexec_load() syscall, invoke kexec with:
> 
>  kexec -c --hotplug ...
> 
> Thanks!
> eric

Thanks Eric,

applied.

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec