On Sun, Jan 26, 2025 at 3:04 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Sat, Jan 25, 2025 at 10:19:51AM -0500, Pasha Tatashin wrote:
>
> > One way to solve that is pre-reserving space for the KHO tree -
> > ideally a reasonable amount, perhaps 32-64 MB and allocating it at
> > kexec load time.
>
> Why is there any weird limit?

Setting a limit for KHO trees is similar to the limit we set for the
scratch area; we can overrun both. It is just one simple way to ensure
serialization is possible after kexec load, but there are obviously
other ways to solve this problem.

> We are preserving hudreds of GB of pages
> backing the VM and more. There is endless memory being preserved across?

There are other ways to do that, but even with this limit, I do not
see this as an issue. The gigabytes of pages backing VMs would not be
scattered as individual 4K pages; that's simply inefficient. The
number of physical ranges is going to be small. If the preserved data
is so large that it cannot fit into a reasonably sized tree, then I
claim that the data should not be saved directly in the tree. Instead,
it should have its own metadata that is pointed to from the tree.

Alternatively, we could allow allocating the FDT tree at kernel
shutdown time. At that point there should be plenty of free memory, as
we have already finished with userland. However, we have to be careful
to allocate from memory that does not overlap the area where the
kernel segments and initramfs are going to be relocated.

> So why are we trying to shoehorn a bunch of KHO stuff into the DT?
> Shouldn't the DT just have a small KHO info pointing to the real KHO
> memory in normal pages?

Yes, for entities like file systems, there absolutely should be a
small KHO info entry pointing to metadata pages that preserve the
normal pages. However, for devices that are kept alive, most of the
data should be saved directly in the tree, unless there is a large
sparse soft state that must be carried for some reason (e.g. network
flows or something similar).

> Even if you want to re-use DT as some kind of serializing scheme in
> drivers the DT framework can let each driver build its own tree,
> serialize it to its own memory and then just link a pointer to that
> tree.
>
> Also, I'm not sure forcing using DT as a serializing scheme is a great
> idea. It is complicated and doesn't do that much to solve the complex
> versioning problem drivers face here..

The primary goal of the KHO device tree is to standardize the
live-update metadata that drivers preserve to maintain device
functionality across reboots. We will document this using the YAML
binding format, similar to our current approach for cold boot and
getting the device tree from firmware. Otherwise, we could just use
other methods such as PKRAM, where there is no inherent
standardization involved, but which allows devices to be serialized
during absolutely any phase of reboot.

Pasha
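
A minimal sketch of the range argument made above, not taken from any KHO patch: if preserved memory is recorded as physical ranges rather than as individual 4K pages, a large but mostly contiguous set of pages collapses into very few entries. The function name `coalesce_pfns`, the 4 KiB page size, and the example addresses are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Tuple

PAGE_SIZE = 4096  # assumed 4 KiB base pages


def coalesce_pfns(pfns: Iterable[int]) -> List[Tuple[int, int]]:
    """Merge page frame numbers into sorted (start_pfn, page_count) ranges."""
    ranges: List[Tuple[int, int]] = []
    for pfn in sorted(set(pfns)):
        if ranges and ranges[-1][0] + ranges[-1][1] == pfn:
            # pfn directly follows the last range, so extend that range
            start, count = ranges[-1]
            ranges[-1] = (start, count + 1)
        else:
            # gap in the physical address space, so start a new range
            ranges.append((pfn, 1))
    return ranges


# Hypothetical example: 1 GiB of guest memory backed by two contiguous
# 512 MiB regions at made-up physical locations.
pages_per_region = (512 * 1024 * 1024) // PAGE_SIZE
pfns = list(range(0x100000, 0x100000 + pages_per_region))
pfns += list(range(0x400000, 0x400000 + pages_per_region))

ranges = coalesce_pfns(pfns)
preserved_bytes = sum(count for _, count in ranges) * PAGE_SIZE
```

Here 262144 individual pages reduce to two range entries, which is the sense in which the tree only needs to carry a small number of physical ranges. Memory that really is scattered page by page would not shrink this way and would be the case for separate metadata pointed to from the tree.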