On Thu, Mar 20, 2025 at 02:40:09AM +0000, Pasha Tatashin wrote:
> Introduces the Live Update Orchestrator (LUO), a new kernel subsystem
> designed to facilitate live updates. Live update is a method to reboot
> the kernel while attempting to keep selected devices alive across the
> reboot boundary, minimizing downtime.
>
> The primary use case is cloud environments, allowing hypervisor updates
> without fully disrupting running virtual machines. VMs can be suspended
> while the hypervisor kernel reboots, and devices attached to these VM
> are kept operational by the LUO.
>
> Features introduced:
>
> - Core orchestration logic for managing the live update process.
> - A state machine (NORMAL, PREPARED, UPDATED, *_FAILED) to track
>   the progress of live updates.
> - Notifier chains for subsystems (device layer, interrupts, KVM, IOMMU,
>   etc.) to register callbacks for different live update events:
>     - LIVEUPDATE_PREPARE: Prepare for reboot (before blackout).
>     - LIVEUPDATE_REBOOT: Final serialization before kexec (blackout).
>     - LIVEUPDATE_FINISH: Cleanup after update (after blackout).
>     - LIVEUPDATE_CANCEL: Rollback actions on failure or user request.

I still don't think notifier chains are the right way to go about a lot
of this; most of it should be driven off of the file descriptors and
fdbox, not through notification.

At the very least we should not be adding notifier chains without a
clear user of them, and I'm not convinced that the iommu driver or vfio
are those users at the moment. I feel more like the iommu can be brought
into the serialization indirectly by putting an iommufd into a fdbox.

> - A sysfs interface (/sys/kernel/liveupdate/) for user-space control:
>     - `prepare`: Initiate preparation (write 1) or reset (write 0).
>     - `finish`: Finalize update in new kernel (write 1).
>     - `cancel`: Abort ongoing preparation or reboot (write 1).
>     - `reset`: Force state back to normal (write 1).
>     - `state`: Read-only view of the current LUO state.
>     - `enabled`: Read-only view of whether live update is enabled.

I also think we should give up on the sysfs. If fdbox is going forward
in a char dev direction then I think we should have two char devs,
/dev/kho/serialize and /dev/kho/deserialize, and run the whole thing
through that.

The concepts shown in the fdbox patches should be merged into the
kho/serialize char dev as just a general architecture: open the char
dev, put stuff into it, then finalize and do the kexec.

It gives you more options to avoid things like notifiers and a very
clear "session" linked to a FD lifetime that encloses the serialization
effort. I think that will make error case cleanup easier and the whole
thing more maintainable. IMHO sysfs is not a great API choice for
something so complicated.

Also agree with Greg, I think this needs more thoughtful patch staging
with actual complete solutions. I think focusing on a progression of
demonstrable kexec preservation would help:

 - A simple KVM and the VM's backing memory in a memfd is preserved
 - A simple vfio-noiommu doing DMA to a preserved memfd, including not
   resetting the device (but with no iommu driver)
 - iommufd

This all builds on each other and introduces API along with concrete and
meaningful use cases.

I see a lot of confusion in the various review comments in KHO work that
I think misunderstands the scope of what would be brought into this. It
is not hundreds of FDs or hundreds of devices, but a very very narrow
and selective set that can work like this. Showing each step along the
way would help narrow the thinking.

Jason
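
As a rough illustration of the FD-scoped "session" idea above (open the
char dev, put stuff into it, finalize, then kexec), here is a toy
user-space model in Python. It does not touch /dev/kho/serialize or any
real kernel interface. The SerializeSession class, its put/finalize/close
methods and the state names are all hypothetical, invented only for this
sketch. It is meant to show why tying cleanup to the lifetime of one
handle keeps the error path simple: closing before finalize just drops
everything that was put in.

```python
from enum import Enum, auto


class State(Enum):
    OPEN = auto()       # session is open and accepting objects
    FINALIZED = auto()  # contents sealed; kexec could proceed
    ABORTED = auto()    # closed before finalize; contents dropped


class SerializeSession:
    """Toy model of a serialization session (hypothetical, not a kernel API)."""

    def __init__(self):
        self.state = State.OPEN
        self.preserved = []  # names of objects put into the session

    def put(self, name):
        # Objects can only be added while the session is still open.
        if self.state is not State.OPEN:
            raise RuntimeError("session is not open")
        self.preserved.append(name)

    def finalize(self):
        # Seal the session; nothing more can be added afterwards.
        if self.state is not State.OPEN:
            raise RuntimeError("session is not open")
        self.state = State.FINALIZED

    def close(self):
        # Closing an unfinalized session is the cancel/error path:
        # everything that was put in is released in one place.
        if self.state is State.OPEN:
            self.preserved.clear()
            self.state = State.ABORTED

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


# Successful flow: put a memfd and an iommufd, then finalize.
with SerializeSession() as ok:
    ok.put("memfd")
    ok.put("iommufd")
    ok.finalize()

# Error flow: leaving the block without finalize aborts the session.
with SerializeSession() as cancelled:
    cancelled.put("memfd")
```

In this model there is no separate cancel or reset entry point; the
rollback that the sysfs `cancel` and `reset` files provide in the quoted
proposal falls out of closing the handle.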