On Mon, Feb 10, 2025 at 3:22 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Thu, Feb 06, 2025 at 03:27:45PM +0200, Mike Rapoport wrote:
> > diff --git a/Documentation/ABI/testing/sysfs-kernel-kho b/Documentation/ABI/testing/sysfs-kernel-kho
> > new file mode 100644
> > index 000000000000..f13b252bc303
> > --- /dev/null
> > +++ b/Documentation/ABI/testing/sysfs-kernel-kho
> > @@ -0,0 +1,53 @@
> > +What:		/sys/kernel/kho/active
> > +Date:		December 2023
> > +Contact:	Alexander Graf <graf@xxxxxxxxxx>
> > +Description:
> > +		Kexec HandOver (KHO) allows Linux to transition the state of
> > +		compatible drivers into the next kexec'ed kernel. To do so,
> > +		device drivers will serialize their current state into a DT.
> > +		While the state is serialized, they are unable to perform
> > +		any modifications to state that was serialized, such as
> > +		handed over memory allocations.
> > +
> > +		When this file contains "1", the system is in the transition
> > +		state. When contains "0", it is not. To switch between the
> > +		two states, echo the respective number into this file.
>
> I don't think this is a great interface for the actual state machine..

In our next proposal we are going to remove this "activate" phase.

>
> > +What:		/sys/kernel/kho/dt_max
> > +Date:		December 2023
> > +Contact:	Alexander Graf <graf@xxxxxxxxxx>
> > +Description:
> > +		KHO needs to allocate a buffer for the DT that gets
> > +		generated before it knows the final size. By default, it
> > +		will allocate 10 MiB for it. You can write to this file
> > +		to modify the size of that allocation.
>
> Seems gross, why can't it use a non-contiguous page list to generate
> the FDT? :\

We will consider some of these ideas in a future version. I like the
idea of using preserved memory to carry a sparse KHO tree, i.e. an FDT
over sparse memory. Maybe use the anchor page to describe how it should
be vmapped into a virtually contiguous tree in the next kernel?

>
> See below for a suggestion..
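To make the anchor-page idea a bit more concrete, here is a rough
userspace sketch. It is not kernel code and nothing in it comes from
the patch: struct sk_anchor, sk_scatter(), sk_gather() and sk_release()
are all made-up names, the 4 KiB page size is an assumption, and
sk_gather() copies into one buffer only as a stand-in for what vmap()
would do in the next kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SK_PAGE_SIZE 4096u	/* assumed page size */
#define SK_MAX_PAGES ((SK_PAGE_SIZE - 2 * sizeof(uint32_t)) / sizeof(uintptr_t))

/* Hypothetical anchor page: it fits in one page and lists the data pages. */
struct sk_anchor {
	uint32_t nr_pages;		/* number of valid entries in pages[] */
	uint32_t total_len;		/* bytes of valid tree data */
	uintptr_t pages[SK_MAX_PAGES];	/* "physical" addresses of the pages */
};
_Static_assert(sizeof(struct sk_anchor) == SK_PAGE_SIZE, "anchor is one page");

/* Free every page recorded in the anchor and reset it. */
static void sk_release(struct sk_anchor *a)
{
	for (uint32_t i = 0; i < a->nr_pages; i++)
		free((void *)a->pages[i]);
	memset(a, 0, sizeof(*a));
}

/* Spread a blob over separately allocated pages, recording each in the anchor. */
static int sk_scatter(struct sk_anchor *a, const void *blob, size_t len)
{
	size_t nr = (len + SK_PAGE_SIZE - 1) / SK_PAGE_SIZE;

	memset(a, 0, sizeof(*a));
	if (nr > SK_MAX_PAGES)
		return -1;
	for (size_t i = 0; i < nr; i++) {
		size_t off = i * SK_PAGE_SIZE;
		size_t chunk = len - off < SK_PAGE_SIZE ? len - off : SK_PAGE_SIZE;
		void *page = calloc(1, SK_PAGE_SIZE);

		if (!page) {
			sk_release(a);
			return -1;
		}
		memcpy(page, (const char *)blob + off, chunk);
		a->pages[a->nr_pages++] = (uintptr_t)page;
	}
	a->total_len = (uint32_t)len;
	return 0;
}

/* Stand-in for vmap(): rebuild one contiguous view by copying page by page. */
static void *sk_gather(const struct sk_anchor *a)
{
	char *flat = malloc(a->total_len ? a->total_len : 1);
	size_t left = a->total_len;

	if (!flat)
		return NULL;
	for (uint32_t i = 0; i < a->nr_pages && left; i++) {
		size_t chunk = left < SK_PAGE_SIZE ? left : SK_PAGE_SIZE;

		memcpy(flat + (size_t)i * SK_PAGE_SIZE, (const void *)a->pages[i], chunk);
		left -= chunk;
	}
	return flat;
}
```

A single anchor page caps the tree at SK_MAX_PAGES pages in this
sketch; a real design would presumably need chained anchors or some
other way to grow.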
>
> > +static int kho_serialize(void)
> > +{
> > +	void *fdt = NULL;
> > +	int err = -ENOMEM;
> > +
> > +	fdt = kvmalloc(kho_out.dt_max, GFP_KERNEL);
> > +	if (!fdt)
> > +		goto out;
> > +
> > +	if (fdt_create(fdt, kho_out.dt_max)) {
> > +		err = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	err = fdt_finish_reservemap(fdt);
> > +	if (err)
> > +		goto out;
> > +
> > +	err = fdt_begin_node(fdt, "");
> > +	if (err)
> > +		goto out;
> > +
> > +	err = fdt_property_string(fdt, "compatible", "kho-v1");
> > +	if (err)
> > +		goto out;
> > +
> > +	/* Loop through all kho dump functions */
> > +	err = blocking_notifier_call_chain(&kho_out.chain_head, KEXEC_KHO_DUMP, fdt);
> > +	err = notifier_to_errno(err);
>
> I don't see this really working long term. I think we'd like each
> component to be able to serialize at its own pace under userspace
> control.
>
> This design requires that the whole thing be wrapped in a notifier
> callback just so we can make use of the fdt APIs.
>
> It seems like a poor fit to me.
>
> IMHO if you want to keep using FDT I suggest that each serializing
> component (ie driver, ftrace whatever) allocate its own FDT fragment
> from scratch and the main KHO one just link to the memories that holds
> those fragments.
>
> Ie the driver experience would be more like
>
>   kho = kho_start_storage("my_compatible_string,v1", some_kind_of_instance_key);
>
>   fdt...(kho->fdt..)
>
>   kho_finish_storage(kho);
>
> Where this ends up creating a stand alone FDT fragment:
>
> /dts-v1/;
> / {
>   compatible = "linux-kho,my_compatible_string,v1";
>   instance = some_kind_of_instance_key;
>   key-value-1 = <..>;
>   key-value-1 = <..>;
> };
>
> And then kho_finish_storage() would remember the phys/length until the
> kexec fdt is produced as the very last step.
>
> This way we could do things like fdbox an iommufd and create the above
> FDT fragment completely separately from any notifier chain and,
> crucially, disconnected from the fdt_create() for the kexec payload.
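For discussion, the start/finish flow above could be modelled roughly
as follows. This is a userspace sketch under stated assumptions, not a
proposed implementation: the sketch_* functions, struct kho_frag and
struct kho_registry are hypothetical, the 4 KiB cap and the 64-entry
table are arbitrary, and sketch_append() merely stands in for the
fdt_*() calls a component would make on its own fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FRAG_MAX_SIZE 4096u	/* assumed cap: one page per fragment */
#define FRAG_MAX_COUNT 64u	/* arbitrary table size for the sketch */

/* Hypothetical handle a component holds while it serializes. */
struct kho_frag {
	char compatible[64];
	uint64_t instance;
	void *buf;		/* the component's private fragment buffer */
	size_t used;
};

/* What the core remembers until the final kexec FDT is produced. */
struct kho_frag_ref {
	uintptr_t addr;		/* stands in for the physical address */
	size_t len;
};

struct kho_registry {
	struct kho_frag_ref refs[FRAG_MAX_COUNT];
	size_t nr;
};

/* Allocate a fresh fragment, independent of any notifier chain. */
static struct kho_frag *sketch_start_storage(const char *compat, uint64_t instance)
{
	struct kho_frag *f = calloc(1, sizeof(*f));
	int n;

	if (!f)
		return NULL;
	n = snprintf(f->compatible, sizeof(f->compatible), "linux-kho,%s", compat);
	if (n < 0 || (size_t)n >= sizeof(f->compatible)) {
		free(f);
		return NULL;
	}
	f->buf = calloc(1, FRAG_MAX_SIZE);
	if (!f->buf) {
		free(f);
		return NULL;
	}
	f->instance = instance;
	return f;
}

/* Placeholder for the fdt_*() calls that would fill the fragment. */
static int sketch_append(struct kho_frag *f, const void *data, size_t len)
{
	if (len > FRAG_MAX_SIZE - f->used)
		return -1;
	memcpy((char *)f->buf + f->used, data, len);
	f->used += len;
	return 0;
}

/* Record address/length; the buffer itself stays alive for the handover. */
static int sketch_finish_storage(struct kho_registry *reg, struct kho_frag *f)
{
	if (reg->nr >= FRAG_MAX_COUNT)
		return -1;	/* caller still owns f and f->buf */
	reg->refs[reg->nr].addr = (uintptr_t)f->buf;
	reg->refs[reg->nr].len = f->used;
	reg->nr++;
	free(f);
	return 0;
}
```

In this model the final step would only walk reg->refs[] and emit one
link per fragment, so no component ever touches the kexec FDT itself.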
>
> Further, if you split things like this (it will waste some small
> amount of memory) you can probably get to a point where no single FDT
> is more than 4k. That looks like it would simplify/robustify a lot of
> stuff?
>
> Jason
>