On Thu, Mar 20, 2025 at 9:36 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Mar 20, 2025 at 02:40:10AM +0000, Pasha Tatashin wrote:
> > Introduce a new subsystem within the driver core to enable keeping
> > devices alive during kernel live update. This infrastructure is
> > designed to be registered with and driven by a separate Live Update
> > Orchestrator, allowing the LUO's state machine to manage the save and
> > restore process of device state during a kernel transition.
> >
> > The goal is to allow drivers and buses to participate in a coordinated
> > save and restore process orchestrated by a live update mechanism. By
> > saving device state before the kernel switch and restoring it
> > immediately after, the device can appear to remain continuously
> > operational from the perspective of the system and userspace.
> >
> > Components introduced:
> >
> > - `struct dev_liveupdate`: Embedded in `struct device` to track the
> >   device's participation and state during a live update, including
> >   request status, preservation status, and dependency depth.
> >
> > - `liveupdate()` callback: Added to `struct bus_type` and
> >   `struct device_driver`. This callback receives an enum
> >   `liveupdate_event` to manage device state at different stages of the
> >   live update process:
> >   - LIVEUPDATE_PREPARE: Save device state before the kernel switch.
> >   - LIVEUPDATE_REBOOT: Final actions just before the kernel jump.
> >   - LIVEUPDATE_FINISH: Clean-up after live update.
> >   - LIVEUPDATE_CANCEL: Clean up any saved state if the update is
> >     aborted.
> >
> > - Sysfs attribute "liveupdate/requested": Added under each device
> >   directory, allowing the user to request that a specific device
> >   participate in the live update, i.e. that its state be preserved
> >   across the update.
>
> As you can imagine, I have "thoughts" about all of this being added to
> the driver core. But, before I go off on that, I want to see some real,
> actual, working, patches for at least 3 bus subsystems that correctly
> implement this before I even consider reviewing this.
>
> Show us real users please, otherwise any attempt at reviewing this is
> going to just be a waste of our time as I have doubts that this actually
> even works :)
>
> Also, as you are adding a new user/kernel api, please also point at the
> userspace tools that are written to handle all of this. As you are
> going to be handling potentially tens of thousands of devices from
> userspace this way, in a single system, real code is needed to even
> consider that this is an acceptable solution.

Hi Greg,

Thanks for the feedback on this RFC. I understand your hesitation about
adding this to the driver core without seeing concrete implementations.

The primary goal of posting this RFC now is to get early feedback on the
overall state machine and rules concept. We have a bi-weekly meeting [1]
where the "Live Update Orchestrator" is scheduled for presentation, and
I wanted to give people a chance to look at the framework ahead of those
discussions.

Regarding your request for real, working patches, we are actively
working on that. Our current efforts are focused on adding LUO live
update support for these subsystems: KVM, interrupts, IOMMU, and
devices. Within the devices subsystem, we are targeting generic PCI,
VFIO, and a few other device types (real and emulated) to demonstrate
the implementation.

I absolutely agree that demonstrating a real use case is important.
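To make the shape of the proposed driver-side interface concrete for this
discussion, here is a rough, illustrative sketch of how a driver might
handle the four events. Only the liveupdate() hook and the
liveupdate_event values come from the patch description above; the exact
callback signature, struct my_dev_state, and the my_*() helpers are
stand-ins invented for this sketch, not part of the posted series:

#include <linux/device.h>

/* Hypothetical per-device state, used only for this sketch. */
struct my_dev_state;

int my_save_state(struct my_dev_state *state);            /* stand-in */
int my_quiesce(struct my_dev_state *state);               /* stand-in */
void my_discard_saved_state(struct my_dev_state *state);  /* stand-in */

static int my_driver_liveupdate(struct device *dev,
                                enum liveupdate_event event)
{
        struct my_dev_state *state = dev_get_drvdata(dev);

        switch (event) {
        case LIVEUPDATE_PREPARE:
                /* Save device state before the kernel switch. */
                return my_save_state(state);
        case LIVEUPDATE_REBOOT:
                /* Final actions just before the kernel jump. */
                return my_quiesce(state);
        case LIVEUPDATE_FINISH:
                /* Clean up once the live update has completed. */
                my_discard_saved_state(state);
                return 0;
        case LIVEUPDATE_CANCEL:
                /* The update was aborted; drop any saved state. */
                my_discard_saved_state(state);
                return 0;
        default:
                return -EOPNOTSUPP;
        }
}

static struct device_driver my_driver = {
        .name       = "my_driver",
        .liveupdate = my_driver_liveupdate,
};

The intent is only to show where each event would be handled from a
driver's point of view; in the real series the callback is driven by the
LUO state machine rather than called directly.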
However, this is a complicated project that involves changes in many
parts of the kernel, and we can't deliver everything in one large
patchset; it has to be divided and addressed incrementally.

So far, we have the following pieces of the Live Update puzzle: KHO
(for preserving kernel memory), LUO (for driving the live update
process), Dev_Liveupdate (for managing device participation in live
update), IOMMU preservation [2], and guest memory preservation [3]. We
are planning to add support for interrupts, PCIe, VFIO, some drivers,
and other components. On the userspace side, we are planning to propose
the necessary changes to VMMs such as CloudHypervisor and QEMU.

Thanks,
Pasha

[1] https://lore.kernel.org/all/a350f3e5-e764-4ba6-f871-da7252f314da@xxxxxxxxxx
[2] https://lpc.events/event/18/contributions/1686
[3] https://lore.kernel.org/all/20240805093245.889357-1-jgowans@xxxxxxxxxx