Re: [LSF/MM/BPF TOPIC] memory persistence over kexec

On 26.01.25 12:41, Pasha Tatashin wrote:
On Sun, Jan 26, 2025 at 3:04 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
On Sat, Jan 25, 2025 at 10:19:51AM -0500, Pasha Tatashin wrote:

One way to solve that is pre-reserving space for the KHO tree -
ideally a reasonable amount, perhaps 32-64 MB and allocating it at
kexec load time.
Why is there any weird limit?
Setting a limit for KHO trees is similar to the limit we set for the
scratch area; we can overrun both. It is just one simple way to ensure
serialization is possible after kexec load, but there are obviously
other ways to solve this problem.


The problem is not only with allocation. Kexec has two schemes: user-space loading and kernel-based file loading. In the latter, we can do whatever we like. In the former, the flow expects user space to have ultimate control over the placement of the future data blobs and their contents.

I like the flexibility this allows for. It means that user space can, for example, inject its own KHO data if it wants to, or modify it. That will come in very handy for debugging and testing later.


We are preserving hundreds of GB of pages
backing the VM and more. There is endless memory being preserved across?
There are other ways to do that, but even with this limit, I do not
see this as an issue. The gigabytes of pages backing VMs would not be
scattered as individual 4K pages; that's simply inefficient. The
number of physical ranges is going to be small. If the preserved data
is so large that it cannot fit into a reasonably sized tree, then I
claim that the data should not be saved directly in the tree. Instead,
it should have its own metadata that is pointed to from the tree.
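
To put rough numbers on the argument above, here is a minimal sketch. It is not KHO code: the 16-byte entry size (one start/length pair of 64-bit values) and the helper names are assumptions made only for illustration.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB base pages
ENTRY_BYTES = 16   # assumption: one (start, length) pair of 64-bit values

def coalesce(pfns):
    """Merge page frame numbers into (first_pfn, page_count) ranges."""
    ranges = []
    for pfn in sorted(set(pfns)):
        if ranges and ranges[-1][0] + ranges[-1][1] == pfn:
            # This page directly follows the last range: extend it.
            ranges[-1] = (ranges[-1][0], ranges[-1][1] + 1)
        else:
            ranges.append((pfn, 1))
    return ranges

def encoded_size(num_entries):
    """Bytes needed to describe num_entries ranges under the assumed encoding."""
    return num_entries * ENTRY_BYTES

GIB = 1 << 30
pages_in_100_gib = 100 * GIB // PAGE_SIZE

# One entry per 4 KiB page versus one entry per 1 GiB physical range.
per_page_bytes = encoded_size(pages_in_100_gib)
per_range_bytes = encoded_size(100)
```

Under these assumptions, describing 100 GiB page by page needs 400 MiB of entries, well beyond a 32-64 MB budget, while 100 ranges of 1 GiB each need 1600 bytes.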


Correct :). The way I think of the KHO DT is as a uniform way to implement setup_data across kexec that is identical across all architectures, enforces review and structure to ensure we keep compatibility, and generalizes memory reservation.

The alternatives we have today are hacks like IMA: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/include/uapi/asm/setup_data.h#n73


Alternatively, we could allow allocating the FDT during kernel shutdown
time. At that time there should be plenty of free memory as we already
finished with userland. However, we have to be careful to allocate
from memory that does not overlap the area where kernel segments and
initramfs are going to be relocated.
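
The overlap constraint above can be sketched as a first-fit search over half-open address ranges. This is only an illustration of the check, not how kexec or KHO allocates memory; the function names, the window, and the segment list are all hypothetical.

```python
def align_up(value, align):
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align

def find_non_overlapping(size, align, window, segments):
    """Return the lowest aligned start inside `window` such that
    [start, start + size) overlaps none of `segments`, or None.

    All ranges are half-open (start, end) pairs. `segments` stands in for
    the destinations of the kernel segments and the initramfs.
    """
    win_start, win_end = window
    candidate = align_up(win_start, align)
    for seg_start, seg_end in sorted(segments):
        if candidate + size <= seg_start:
            break  # fits in the gap before this segment
        if seg_end > candidate:
            # Candidate collides with this segment: retry just past it.
            candidate = align_up(seg_end, align)
    if candidate + size <= win_end:
        return candidate
    return None

# Hypothetical layout: two segment destinations inside a 64 KiB window.
segments = [(0x1000, 0x3000), (0x4000, 0x8000)]
start = find_non_overlapping(0x2000, 0x1000, (0x0, 0x10000), segments)
```

With this hypothetical layout, an 8 KiB allocation cannot fit in either gap below 0x8000, so the search lands after the last segment.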


Yes, this is easier said than done. In the user space driven kexec path, user space is in control of memory locations. At least after the first kexec iteration, these locations will overlap with the existing Linux runtime environment, because both lie in the scratch region. Only the purgatory moves everything to where it should be.

Maybe we could create a special kexec memory type that means "KHO DT"?


Alex




