On Thu, Feb 06, 2025 at 03:27:45PM +0200, Mike Rapoport wrote:
> diff --git a/Documentation/ABI/testing/sysfs-kernel-kho b/Documentation/ABI/testing/sysfs-kernel-kho
> new file mode 100644
> index 000000000000..f13b252bc303
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-kernel-kho
> @@ -0,0 +1,53 @@
> +What:		/sys/kernel/kho/active
> +Date:		December 2023
> +Contact:	Alexander Graf <graf@xxxxxxxxxx>
> +Description:
> +		Kexec HandOver (KHO) allows Linux to transition the state of
> +		compatible drivers into the next kexec'ed kernel. To do so,
> +		device drivers will serialize their current state into a DT.
> +		While the state is serialized, they are unable to perform
> +		any modifications to state that was serialized, such as
> +		handed over memory allocations.
> +
> +		When this file contains "1", the system is in the transition
> +		state. When contains "0", it is not. To switch between the
> +		two states, echo the respective number into this file.

I don't think this is a great interface for the actual state machine..

> +What:		/sys/kernel/kho/dt_max
> +Date:		December 2023
> +Contact:	Alexander Graf <graf@xxxxxxxxxx>
> +Description:
> +		KHO needs to allocate a buffer for the DT that gets
> +		generated before it knows the final size. By default, it
> +		will allocate 10 MiB for it. You can write to this file
> +		to modify the size of that allocation.

Seems gross, why can't it use a non-contiguous page list to generate
the FDT? :\

See below for a suggestion..

> +static int kho_serialize(void)
> +{
> +	void *fdt = NULL;
> +	int err = -ENOMEM;
> +
> +	fdt = kvmalloc(kho_out.dt_max, GFP_KERNEL);
> +	if (!fdt)
> +		goto out;
> +
> +	if (fdt_create(fdt, kho_out.dt_max)) {
> +		err = -EINVAL;
> +		goto out;
> +	}
> +
> +	err = fdt_finish_reservemap(fdt);
> +	if (err)
> +		goto out;
> +
> +	err = fdt_begin_node(fdt, "");
> +	if (err)
> +		goto out;
> +
> +	err = fdt_property_string(fdt, "compatible", "kho-v1");
> +	if (err)
> +		goto out;
> +
> +	/* Loop through all kho dump functions */
> +	err = blocking_notifier_call_chain(&kho_out.chain_head, KEXEC_KHO_DUMP, fdt);
> +	err = notifier_to_errno(err);

I don't see this really working long term. I think we'd like each
component to be able to serialize at its own pace under userspace
control.

This design requires that the whole thing be wrapped in a notifier
callback just so we can make use of the fdt APIs.

It seems like a poor fit to me.

IMHO if you want to keep using FDT I suggest that each serializing
component (ie driver, ftrace, whatever) allocate its own FDT fragment
from scratch and the main KHO one just link to the memory that holds
those fragments.

Ie the driver experience would be more like

	kho = kho_start_storage("my_compatible_string,v1", some_kind_of_instance_key);

	fdt...(kho->fdt..)

	kho_finish_storage(kho);

Where this ends up creating a standalone FDT fragment:

	/dts-v1/;
	/ {
		compatible = "linux-kho,my_compatible_string,v1";
		instance = some_kind_of_instance_key;
		key-value-1 = <..>;
		key-value-2 = <..>;
	};

And then kho_finish_storage() would remember the phys/length until the
kexec fdt is produced as the very last step.

This way we could do things like fdbox an iommufd and create the above
FDT fragment completely separately from any notifier chain and,
crucially, disconnected from the fdt_create() for the kexec payload.

Further, if you split things like this (it will waste some small
amount of memory) you can probably get to a point where no single FDT
is more than 4k.

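To make the bookkeeping concrete, here is a rough userspace mock of that
flow. It is only a sketch: kho_start_storage() and kho_finish_storage()
are the names proposed above and do not exist today, and kho_put(),
kho_nr_fragments(), kho_fragment(), the 4k cap and the fixed-size table
are all made up for illustration. kho_put() stands in for the real libfdt
writer calls a component would make on kho->fdt, and a plain pointer
stands in for the physical address.

```c
/* Userspace mock only; none of these names are existing kernel or libfdt APIs. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define KHO_FRAG_SIZE 4096	/* assumed per-fragment cap */
#define KHO_MAX_FRAGS 16	/* assumed table size, illustration only */

/* Handle a component holds while it serializes at its own pace. */
struct kho_storage {
	void *fdt;		/* fragment buffer, would be passed to libfdt */
	size_t size;		/* capacity of the buffer */
	size_t used;		/* bytes written so far */
	char compatible[64];
	uint64_t instance;
};

/* What the core remembers until the kexec FDT is built. */
struct kho_frag_ref {
	uintptr_t addr;		/* stand-in for the physical address */
	size_t len;
	char compatible[64];
	uint64_t instance;
};

static struct kho_frag_ref kho_frags[KHO_MAX_FRAGS];
static size_t kho_nr_frags;

struct kho_storage *kho_start_storage(const char *compatible, uint64_t instance)
{
	struct kho_storage *kho = calloc(1, sizeof(*kho));

	if (!kho)
		return NULL;
	kho->fdt = calloc(1, KHO_FRAG_SIZE);
	if (!kho->fdt) {
		free(kho);
		return NULL;
	}
	kho->size = KHO_FRAG_SIZE;
	kho->instance = instance;
	snprintf(kho->compatible, sizeof(kho->compatible), "linux-kho,%s", compatible);
	return kho;
}

/* Stand-in for the fdt_property*() calls; returns -1 when the fragment is full. */
int kho_put(struct kho_storage *kho, const void *data, size_t len)
{
	if (len > kho->size - kho->used)
		return -1;
	memcpy((char *)kho->fdt + kho->used, data, len);
	kho->used += len;
	return 0;
}

/*
 * Record address/length for the final kexec FDT. The buffer itself stays
 * allocated because it has to survive until kexec. On failure the caller
 * still owns the handle.
 */
int kho_finish_storage(struct kho_storage *kho)
{
	struct kho_frag_ref *ref;

	assert(kho && kho->fdt);
	if (kho_nr_frags == KHO_MAX_FRAGS)
		return -1;
	ref = &kho_frags[kho_nr_frags++];
	ref->addr = (uintptr_t)kho->fdt;
	ref->len = kho->used;
	memcpy(ref->compatible, kho->compatible, sizeof(ref->compatible));
	ref->instance = kho->instance;
	free(kho);
	return 0;
}

/* The last step would walk these entries and emit one link per fragment. */
size_t kho_nr_fragments(void)
{
	return kho_nr_frags;
}

const struct kho_frag_ref *kho_fragment(size_t i)
{
	return i < kho_nr_frags ? &kho_frags[i] : NULL;
}
```

The point of the split is visible in the mock: nothing here depends on a
notifier chain or on the top-level fdt_create(), and the only shared state
is the small table of (address, length) pairs that the final step would
consume.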
That looks like it would simplify/robustify a lot of stuff?

Jason