Re: [RFC v1 1/3] luo: Live Update Orchestrator

Pasha Tatashin <pasha.tatashin@xxxxxxxxxx> · Thu, 20 Mar 2025 15:00:31 -0400

Hi Jason,

Thank you for your feedback.

> > Features introduced:
> >
> > - Core orchestration logic for managing the live update process.
> > - A state machine (NORMAL, PREPARED, UPDATED, *_FAILED) to track
> >   the progress of live updates.
> > - Notifier chains for subsystems (device layer, interrupts, KVM, IOMMU,
> >   etc.) to register callbacks for different live update events:
> >     - LIVEUPDATE_PREPARE: Prepare for reboot (before blackout).
> >     - LIVEUPDATE_REBOOT: Final serialization before kexec (blackout).
> >     - LIVEUPDATE_FINISH: Cleanup after update (after blackout).
> >     - LIVEUPDATE_CANCEL: Rollback actions on failure or user request.
>
> I still don't think notifier chains are the right way to go about a lot
> of this, most of it should be driven off of the file descriptors and
> fdbox, not through notification.
>
> At the very least we should not be adding notifier chains without a
> clear user of them, and I'm not convinced that the iommu driver or
> vfio are those users at the moment.
>
> I feel more like the iommu can be brought into the serialization
> indirectly by putting an iommufd into a fdbox.

We have identified the subsystems that need to participate in Live
Update: KVM, IOMMU, Devices, and Interrupts. We are planning to
present how each of them will integrate with the LUO.

> > - A sysfs interface (/sys/kernel/liveupdate/) for user-space control:
> >     - `prepare`: Initiate preparation (write 1) or reset (write 0).
> >     - `finish`: Finalize update in new kernel (write 1).
> >     - `cancel`: Abort ongoing preparation or reboot (write 1).
> >     - `reset`: Force state back to normal (write 1).
> >     - `state`: Read-only view of the current LUO state.
> >     - `enabled`: Read-only view of whether live update is enabled.

I forgot to update the commit message: there are no `enabled`, `reset`,
or `cancel` files. LUO has only three files: `prepare`, `finish`, and
`state`.
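
For illustration only, here is a minimal user-space sketch of driving
these files. The directory and file names come from the commit message
quoted above; the helper names luo_write_attr() and luo_read_attr() are
made up for this example and are not part of the patch. The strings
that `state` returns are not specified here, so the sketch only reads
them back without interpreting them.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Directory name as given in the quoted commit message. */
#define LUO_SYSFS_DIR "/sys/kernel/liveupdate"

/*
 * Hypothetical helper: write a short string to <dir>/<attr>.
 * Returns 0 on success or a negative errno value on failure.
 */
static int luo_write_attr(const char *dir, const char *attr, const char *val)
{
	char path[256];
	size_t len;
	ssize_t n;
	int fd, ret = 0;

	assert(dir && attr && val);
	len = strlen(val);

	if (snprintf(path, sizeof(path), "%s/%s", dir, attr) >= (int)sizeof(path))
		return -ENAMETOOLONG;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, val, len);
	if (n < 0)
		ret = -errno;		/* save errno before close() can clobber it */
	else if ((size_t)n != len)
		ret = -EIO;		/* short write */

	close(fd);
	return ret;
}

/*
 * Hypothetical helper: read <dir>/<attr> into buf as a NUL-terminated
 * string, dropping one trailing newline if present.
 * Returns 0 on success or a negative errno value on failure.
 */
static int luo_read_attr(const char *dir, const char *attr, char *buf, size_t size)
{
	char path[256];
	ssize_t n;
	int fd, err;

	assert(dir && attr && buf && size > 0);

	if (snprintf(path, sizeof(path), "%s/%s", dir, attr) >= (int)sizeof(path))
		return -ENAMETOOLONG;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	n = read(fd, buf, size - 1);
	err = errno;
	close(fd);
	if (n < 0)
		return -err;

	buf[n] = '\0';
	if (n > 0 && buf[n - 1] == '\n')
		buf[n - 1] = '\0';
	return 0;
}
```

With these, starting preparation would be
luo_write_attr(LUO_SYSFS_DIR, "prepare", "1"), followed by
luo_read_attr(LUO_SYSFS_DIR, "state", buf, sizeof(buf)) to see where
LUO ended up.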

>
> I also think we should give up on the sysfs. If fdbox is going forward
> in a char dev direction then I think we should have two char devs
> /dev/kho/serialize and /dev/kho/deserialize and run the whole thing

KHO is a mechanism to preserve kernel memory across reboots. It can be
used independently of live update, for example, to preserve kexec
reboot telemetry, traces, and for other purposes. LUO uses KHO for
memory preservation, but it also orchestrates the live update process
itself: it provides a generic way for subsystems and devices to
participate, and handles error recovery, unclaimed devices, and other
live update-specific steps.

That said, I can transition the LUO interface from sysfs to a character device.

> through that. The concepts shown in the fdbox patches should be merged
> into the kho/serialize char dev as just a general architecture of open
> the char dev, put stuff into it, then finalize and do the kexec.

Some participating subsystems, such as interrupts, do not have a way
to export a file descriptor. It is unclear why we would require one
for kernel-internal state that needs to be preserved for live update;
such subsystems should instead register with LUO internally.
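
To make "register internally" concrete, here is a small user-space
model of the idea, not the kernel code from the patch. All names
(luo_subscriber, luo_register(), luo_notify(), demo_notify()) are
hypothetical, and the rollback rule is an assumption based on the
LIVEUPDATE_CANCEL description above: if a subscriber fails PREPARE,
the ones that already prepared get CANCEL.

```c
#include <assert.h>
#include <stddef.h>

/* Events modelled on the LIVEUPDATE_* list in the commit message. */
enum luo_event { LUO_PREPARE, LUO_REBOOT, LUO_FINISH, LUO_CANCEL };

/* Hypothetical subscriber: a subsystem with no file descriptor to export. */
struct luo_subscriber {
	const char *name;
	int (*notify)(enum luo_event ev, void *priv);
	void *priv;
	struct luo_subscriber *next;
};

static struct luo_subscriber *luo_chain;

/* Append to the tail so subscribers are called in registration order. */
static void luo_register(struct luo_subscriber *sub)
{
	struct luo_subscriber **p = &luo_chain;

	assert(sub && sub->notify);
	while (*p)
		p = &(*p)->next;
	sub->next = NULL;
	*p = sub;
}

/*
 * Deliver an event to every subscriber. Assumed policy: a failed
 * PREPARE sends CANCEL to the subscribers that already succeeded,
 * then returns the error to the caller.
 */
static int luo_notify(enum luo_event ev)
{
	struct luo_subscriber *sub, *undo;
	int ret;

	for (sub = luo_chain; sub; sub = sub->next) {
		ret = sub->notify(ev, sub->priv);
		if (!ret)
			continue;
		if (ev == LUO_PREPARE)
			for (undo = luo_chain; undo != sub; undo = undo->next)
				undo->notify(LUO_CANCEL, undo->priv);
		return ret;
	}
	return 0;
}

/* Demo subscriber state, for illustration and testing only. */
struct demo_state {
	int prepared;
	int cancelled;
	int fail_prepare;
};

static int demo_notify(enum luo_event ev, void *priv)
{
	struct demo_state *s = priv;

	switch (ev) {
	case LUO_PREPARE:
		if (s->fail_prepare)
			return -1;
		s->prepared++;
		return 0;
	case LUO_CANCEL:
		s->cancelled++;
		return 0;
	default:
		return 0;
	}
}
```

The point of the model is only that a subsystem such as interrupts can
take part in PREPARE/CANCEL without user space ever holding a handle
to it.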

> It gives you more options to avoid things like notifiers and a very
> clear "session" linked to a FD lifetime that encloses the
> serialization effort. I think that will make error case cleanup easier
> and the whole thing more maintainable. IMHO sysfs is not a great API
> choice for something so complicated.

IMO, the current API and state machine are quite simple (I plan to
present and go through them at one of the Hypervisor Live Update
meetings). However, I am open to changing to a different API, and we
can expose it through a character device.

> Also agree with Greg, I think this needs more thoughtful patch staging
> with actual complete solutions. I think focusing on a progression of
> demonstrable kexec preservation:
>  - A simple KVM and the VM's backing memory in a memfd is preserved
>  - A simple vfio-noiommu doing DMA to a preserved memfd, including not
>    resetting the device (but with no iommu driver)
>  - iommufd

We are working on this. However, each component builds upon the
previous one, so it makes sense to discuss the lower layers now and
get feedback on them early.

Pasha