Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

Tao Liu <ltao@xxxxxxxxxx> · Sat, 25 Nov 2023 09:51:54 +0800

Hi Jiri,

On Sat, Nov 25, 2023 at 3:55 AM Jiri Bohac <jbohac@xxxxxxx> wrote:
>
> Hi,
>
> this series implements a new way to reserve additional crash kernel
> memory using CMA.
>
> Currently, all the memory for the crash kernel is not usable by
> the 1st (production) kernel. It is also unmapped so that it can't
> be corrupted by the fault that will eventually trigger the crash.
> This makes sense for the memory actually used by the kexec-loaded
> crash kernel image and initrd and the data prepared during the
> load (vmcoreinfo, ...). However, the reserved space needs to be
> much larger than that to provide enough run-time memory for the
> crash kernel and the kdump userspace. Estimating the amount of
> memory to reserve is difficult. Reserving too little makes kdump
> likely to end in OOM; reserving too much takes even more memory
> from the production system. Also, the reservation only allows
> reserving a single contiguous block (or two with the "low"
> suffix). I've seen systems where this fails because the physical
> memory is fragmented.
>
> By reserving additional crashkernel memory from CMA, the main
> crashkernel reservation can be just small enough to fit the
> kernel and initrd image, minimizing the memory taken away from
> the production system. Most of the run-time memory for the crash
> kernel will be memory previously available to userspace in the
> production system. As this memory is no longer wasted, the
> reservation can be done with a generous margin, making kdump more
> reliable. Kernel memory that we need to preserve for dumping is
> never allocated from CMA. User data is typically not dumped by
> makedumpfile. When dumping of user data is intended, this new CMA
> reservation cannot be used.
>

Thanks for the idea of using CMA as part of the memory for the 2nd kernel.
However, I have a question:

What if there is ongoing DMA/RDMA access to the CMA range when the 1st
kernel crashes? Data could be corrupted if the 2nd kernel and the
DMA/RDMA transfer write to the same place. How would this be addressed?

Thanks,
Tao Liu

> There are four patches in this series:
>
> The first adds a new ",cma" suffix to the recently introduced generic
> crashkernel parsing code. parse_crashkernel() takes one more
> argument to store the cma reservation size.
>
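
To make the suffix handling concrete, here is a minimal user-space sketch in
Python. It is not the kernel's parse_crashkernel(); the helper names
parse_size() and parse_crashkernel_args() are hypothetical, and the sketch
assumes only the plain "size" and "size,cma" forms (no "@offset", ranges,
",high" or ",low").

```python
# Hypothetical model of the ",cma" suffix handling; NOT the kernel code.
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a size such as '100M' or '1G' to bytes (no suffix = bytes)."""
    text = text.strip()
    if not text:
        raise ValueError("empty size")
    unit = text[-1].upper()
    if unit in UNITS:
        return int(text[:-1]) * UNITS[unit]
    return int(text)


def parse_crashkernel_args(cmdline):
    """Return (regular_bytes, cma_bytes) found on a command line string."""
    regular = 0
    cma = 0
    prefix = "crashkernel="
    for token in cmdline.split():
        if not token.startswith(prefix):
            continue
        value = token[len(prefix):]
        if value.endswith(",cma"):
            # The extra output slot: size of the CMA-backed reservation.
            cma = parse_size(value[:-len(",cma")])
        else:
            regular = parse_size(value)
    return regular, cma


regular, cma = parse_crashkernel_args("quiet crashkernel=100M crashkernel=1G,cma")
print(regular, cma)
```

A command line without a ",cma" entry leaves the second value at 0, which
models the "everything works as before" case.
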
> The second patch implements reserve_crashkernel_cma() which
> performs the reservation. If the requested size is not available
> in a single range, multiple smaller ranges will be reserved.
>
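
One plausible way to fall back to multiple smaller ranges is to halve the
request whenever a contiguous allocation fails. The Python sketch below
models that idea only; it is an assumption, not necessarily what
reserve_crashkernel_cma() does. The function name reserve_in_ranges(), the
free-block list and the min_chunk cutoff are all hypothetical.

```python
def reserve_in_ranges(free_blocks, wanted, min_chunk):
    """Model a reservation that may be split into several ranges.

    free_blocks: sizes of the free contiguous blocks (hypothetical input).
    wanted:      total size requested.
    min_chunk:   smallest chunk size worth retrying with.
    Returns the list of chunk sizes reserved (may sum to less than wanted).
    """
    free = list(free_blocks)      # work on a copy
    reserved = []
    remaining = wanted
    chunk = wanted
    while remaining > 0:
        chunk = min(chunk, remaining)
        # First-fit search for a block that can hold the current chunk.
        idx = next((i for i, size in enumerate(free) if size >= chunk), None)
        if idx is None:
            if chunk <= min_chunk:
                break             # give up, keep the partial reservation
            chunk = max(chunk // 2, min_chunk)
            continue
        free[idx] -= chunk
        reserved.append(chunk)
        remaining -= chunk
    return reserved


# Sizes in MiB: no single 1024 MiB block exists, so the request is split.
print(reserve_in_ranges([600, 300, 200], 1024, 16))
```

In this model a fragmented system still ends up with the full amount, just
spread over several ranges, while a badly fragmented one gets a partial
reservation instead of none.
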
> The third patch enables the functionality for x86 as a proof of
> concept. There are just three things every arch needs to do:
> - call reserve_crashkernel_cma()
> - include the CMA-reserved ranges in the physical memory map
> - exclude the CMA-reserved ranges from the memory available
>   through /proc/vmcore by excluding them from the vmcoreinfo
>   PT_LOAD ranges.
> Adding other architectures is easy and I can do that as soon as
> this series is merged.
>
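
The third step above, keeping the CMA-reserved ranges out of the dumpable
PT_LOAD ranges, boils down to interval subtraction. A minimal Python sketch
under stated assumptions: ranges are half-open (start, end) tuples, and
exclude_ranges() is a hypothetical helper, not a kernel function.

```python
def exclude_ranges(ram, excluded):
    """Return the parts of `ram` not covered by any range in `excluded`.

    Both arguments are lists of half-open (start, end) tuples.
    """
    result = []
    for start, end in ram:
        pieces = [(start, end)]
        for ex_start, ex_end in excluded:
            next_pieces = []
            for p_start, p_end in pieces:
                if ex_end <= p_start or ex_start >= p_end:
                    # No overlap: keep the piece unchanged.
                    next_pieces.append((p_start, p_end))
                    continue
                if p_start < ex_start:
                    # Keep the part below the excluded range.
                    next_pieces.append((p_start, ex_start))
                if ex_end < p_end:
                    # Keep the part above the excluded range.
                    next_pieces.append((ex_end, p_end))
            pieces = next_pieces
        result.extend(pieces)
    return result


# One RAM range with two CMA-reserved holes punched out of it.
print(exclude_ranges([(0, 1000)], [(200, 300), (600, 700)]))
```

Each excluded range can split an existing range in two, so the number of
resulting ranges grows with the number of CMA ranges reserved.
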
> The fourth patch just updates Documentation/
>
> Now, specifying
>         crashkernel=100M crashkernel=1G,cma
> on the command line will make a standard crashkernel reservation
> of 100M, where kexec will load the kernel and initrd.
>
> An additional 1G will be reserved from CMA, still usable by the
> production system. The crash kernel will have 1.1G memory
> available. The 100M can be reliably predicted based on the size
> of the kernel and initrd.
>
> When no crashkernel=size,cma is specified, everything works as
> before.
>
> --
> Jiri Bohac <jbohac@xxxxxxx>
> SUSE Labs, Prague, Czechia
>
>
> _______________________________________________
> kexec mailing list
> kexec@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/kexec
>