Hi Jiri,

On Sat, Nov 25, 2023 at 3:55 AM Jiri Bohac <jbohac@xxxxxxx> wrote:
>
> Hi,
>
> this series implements a new way to reserve additional crash kernel
> memory using CMA.
>
> Currently, all the memory for the crash kernel is not usable by
> the 1st (production) kernel. It is also unmapped so that it can't
> be corrupted by the fault that will eventually trigger the crash.
> This makes sense for the memory actually used by the kexec-loaded
> crash kernel image and initrd and the data prepared during the
> load (vmcoreinfo, ...). However, the reserved space needs to be
> much larger than that to provide enough run-time memory for the
> crash kernel and the kdump userspace. Estimating the amount of
> memory to reserve is difficult. Being too careful makes kdump
> likely to end in OOM, being too generous takes even more memory
> from the production system. Also, the reservation only allows
> reserving a single contiguous block (or two with the "low"
> suffix). I've seen systems where this fails because the physical
> memory is fragmented.
>
> By reserving additional crashkernel memory from CMA, the main
> crashkernel reservation can be just small enough to fit the
> kernel and initrd image, minimizing the memory taken away from
> the production system. Most of the run-time memory for the crash
> kernel will be memory previously available to userspace in the
> production system. As this memory is no longer wasted, the
> reservation can be done with a generous margin, making kdump more
> reliable. Kernel memory that we need to preserve for dumping is
> never allocated from CMA. User data is typically not dumped by
> makedumpfile. When dumping of user data is intended this new CMA
> reservation cannot be used.
>

Thanks for the idea of using CMA as part of the memory for the 2nd
kernel. However, I have a question:

What if there is ongoing DMA/RDMA access to the CMA range when the 1st
kernel crashes?
There might be data corruption when the 2nd kernel and DMA/RDMA write
to the same place. How would such an issue be addressed?

Thanks,
Tao Liu

> There are four patches in this series:
>
> The first adds a new ",cma" suffix to the recently introduced generic
> crashkernel parsing code. parse_crashkernel() takes one more
> argument to store the cma reservation size.
>
> The second patch implements reserve_crashkernel_cma() which
> performs the reservation. If the requested size is not available
> in a single range, multiple smaller ranges will be reserved.
>
> The third patch enables the functionality for x86 as a proof of
> concept. There are just three things every arch needs to do:
> - call reserve_crashkernel_cma()
> - include the CMA-reserved ranges in the physical memory map
> - exclude the CMA-reserved ranges from the memory available
>   through /proc/vmcore by excluding them from the vmcoreinfo
>   PT_LOAD ranges.
> Adding other architectures is easy and I can do that as soon as
> this series is merged.
>
> The fourth patch just updates Documentation/
>
> Now, specifying
>         crashkernel=100M crashkernel=1G,cma
> on the command line will make a standard crashkernel reservation
> of 100M, where kexec will load the kernel and initrd.
>
> An additional 1G will be reserved from CMA, still usable by the
> production system. The crash kernel will have 1.1G memory
> available. The 100M can be reliably predicted based on the size
> of the kernel and initrd.
>
> When no crashkernel=size,cma is specified, everything works as
> before.
>
> --
> Jiri Bohac <jbohac@xxxxxxx>
> SUSE Labs, Prague, Czechia
>
>
> _______________________________________________
> kexec mailing list
> kexec@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/kexec