[Hypervisor Live Update] Notes from February 10, 2025

Hi everybody,

Here are the notes from the last Hypervisor Live Update call that happened
on Monday, February 10.  Thanks to everybody who was involved!

These notes are intended to bring people who could not attend the call up
to speed, as well as to keep the conversation going between meetings.

----->o-----
James mentioned guest memory persistence and the future of guestmemfs,
including feedback to allow for more prototyping.  We didn't get into
this topic during the call, so we'll touch on it in the next call.

----->o-----
Mike brought up the sysfs interface, whether or not we want an activate
step, and how to align with the devicetree feedback that was received on
the last upstream posting of KHO.  Every new binding would need to go
through devicetree code review, and Jason noted the scalability and
flexibility concerns with this.

Jason noted that older kernels can ignore newer devicetree components and
everything still works, which is not the case for live update.  He
suggested a much stronger compatibility guarantee for live update
purposes, defined between specific pairs of kernel versions.  Andrey
agreed that we don't really need devicetree here and pointed to a patch
series he had developed that doesn't rely on it.

Pasha agreed that for the short/medium term it may make sense to decouple
this from devicetree.

Alexander noted that FDT was chosen deliberately: we want a generic
key/value store and the ability to add attributes without breaking
compatibility.  Andrey noted this can be done without devicetree.  There
was discussion about treating this as a "KHO tree" rather than strictly a
devicetree.  Schema validation was another attractive characteristic of
FDT.
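To make the compatibility point concrete, here is a minimal userspace
sketch of a generic key/value blob in which a reader skips any key it does
not recognize, so a newer writer can add attributes without breaking an
older reader.  The tag-length-value layout and the names kv_put() and
kv_get() are hypothetical illustrations; this is not the FDT binary format
and not a KHO interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout: 4-byte key, 4-byte length, then the value. */
struct kv_hdr {
    uint32_t key;
    uint32_t len;
};

/* Append one record; returns the new used size, or 0 if it does not fit. */
size_t kv_put(uint8_t *buf, size_t cap, size_t used,
              uint32_t key, const void *val, uint32_t len)
{
    struct kv_hdr h = { key, len };

    assert(used <= cap);
    if (cap - used < sizeof(h) + len)
        return 0;
    memcpy(buf + used, &h, sizeof(h));
    memcpy(buf + used + sizeof(h), val, len);
    return used + sizeof(h) + len;
}

/* Find `key`.  Records with unknown keys are skipped by their length, which
 * is what lets an older reader tolerate attributes a newer writer added. */
const void *kv_get(const uint8_t *buf, size_t used, uint32_t key,
                   uint32_t *len_out)
{
    size_t off = 0;

    while (used - off >= sizeof(struct kv_hdr)) {
        struct kv_hdr h;

        memcpy(&h, buf + off, sizeof(h));
        if (used - off - sizeof(h) < h.len)
            break;  /* truncated record: stop instead of reading past the end */
        if (h.key == key) {
            *len_out = h.len;
            return buf + off + sizeof(h);
        }
        off += sizeof(h) + h.len;
    }
    return NULL;
}
```

Because every record carries its own length, a reader built before a new
key existed can still walk past it and find the keys it does understand.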

Alexander noted the current discussion has been focused on nodes and
sub-nodes as a structure based on runtime data: a decision had to be made
between structured data (including debuggability) and data that is always
in memory and gets translated from one kernel to the next.  Jason thought
we needed both.

Pasha noted Intel in 2021 had preserved VFIO passthrough devices using
PKRAM.  The PKRAM patches and their interfaces turned out to be very
difficult to maintain, given that PKRAM did not maintain ABIs between
kernels.  It relied heavily on developer insight to specify, in YAML
files, what needed to be preserved.  Mike suggested that we'd need to give
drivers the flexibility to point to an area of memory, a struct, or a
scalar.

James discussed serializing all inodes in guestmemfs with KHO in previous
work, and this turned out to be very useful.  Newer kernels were able to
add new fields and move things around, but this approach did not support
the downgrade path.  Jason noted there were similarities between the
stable ABIs provided to userspace and filesystems.  He stressed that the
complexity here may become too burdensome for driver maintainers.

Jason suggested starting simple: structs pointing to structs pointing to
structs.  Drivers would have versions 1, 2, etc., and this could become
more complex when needed.  There was a general desire not to preserve the
kernel direct map; the virtual address space would then end up scrambled
in the new kernel, and that must be supported.
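The "start simple" approach could look roughly like the sketch below: each
preserved struct starts with a version header, links between structs are
stored as offsets into the preserved region rather than virtual pointers
(since virtual addresses will differ in the new kernel), and the reader
upgrades older layouts to the newest one.  The struct names, fields, and
drv_restore() are hypothetical, written as userspace C for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Every preserved struct starts with a header naming its layout version. */
struct drv_hdr {
    uint32_t version;   /* 1, 2, ...: bumped whenever the layout changes */
    uint32_t size;      /* size in bytes of the struct that follows */
};

/* Links are offsets into the preserved region, not virtual pointers,
 * because the new kernel may map that memory at a different address. */
struct drv_state_v1 {
    uint64_t ring_off;  /* offset of another preserved struct (the ring) */
    uint32_t ring_len;
};

struct drv_state_v2 {
    uint64_t ring_off;
    uint32_t ring_len;
    uint32_t flags;     /* new in v2 */
};

/* Decode whichever version was written into the newest in-memory form.
 * Returns 0 on success, -1 if the version or size is not understood. */
int drv_restore(const uint8_t *region, uint64_t hdr_off,
                struct drv_state_v2 *out)
{
    struct drv_hdr h;
    const uint8_t *body;

    assert(region != NULL && out != NULL);
    memcpy(&h, region + hdr_off, sizeof(h));
    body = region + hdr_off + sizeof(h);
    memset(out, 0, sizeof(*out));   /* defaults for fields an old writer lacked */

    switch (h.version) {
    case 1: {
        struct drv_state_v1 v1;

        if (h.size < sizeof(v1))
            return -1;
        memcpy(&v1, body, sizeof(v1));
        /* Copy field by field: v1's trailing padding must not land in flags. */
        out->ring_off = v1.ring_off;
        out->ring_len = v1.ring_len;
        return 0;
    }
    case 2:
        if (h.size < sizeof(*out))
            return -1;
        memcpy(out, body, sizeof(*out));
        return 0;
    default:
        return -1;  /* unknown version, e.g. from a newer kernel: no downgrade */
    }
}
```

As in the guestmemfs experience described above, this makes upgrades easy
while the downgrade path still fails for any version the reader predates.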

Alexander noted that KHO's strategy so far has been to use FDT as the
standard for compatibility, and whether to use it versus other solutions
depends on the specific use case.  We'd need to extend tooling to do
validation in the future.  Alexander noted that after writing 1 to the
activate file, you can grab a snapshot of the devicetree from sysfs and
use it for validation.

----->o-----
We shifted to discussion of KHO v5.  Mike noted that a sysfs interface
enables KHO, and the KHO data (devicetree and scratch description) then
gets appended to kexec images.  Only the scratch space would be touched by
the new kernel; not all memory is preserved, although it is in the kernel
direct map.

There was a discussion about using a bitmap (or an idr) to indicate what
memory should be removed from the buddy allocator during early boot.
Alexander noted you'd still need to be able to associate that bitmap or
data structure with the specific driver that needs to find its memory.
Jason noted this would be the driver's responsibility.
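A minimal userspace sketch of the bitmap idea: the old kernel sets one bit
per preserved page frame, and early boot in the new kernel checks that bit
before releasing a page to the allocator.  Per the discussion, associating
preserved pages with their owning driver is left to the driver, so the
sketch only answers "is this page preserved?".  The page count and the
names mark_preserved(), is_preserved(), and count_free_pages() are
hypothetical; this is not the kernel's memblock or buddy allocator code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PFNS        4096UL   /* hypothetical: 16 MiB of 4 KiB pages */
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS   ((NR_PFNS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per page frame; set means preserved, keep it out of the allocator. */
static unsigned long preserved[BITMAP_WORDS];

/* Old kernel: mark page frames [pfn, pfn + nr) as preserved before kexec. */
void mark_preserved(unsigned long pfn, unsigned long nr)
{
    assert(pfn + nr <= NR_PFNS);
    for (unsigned long p = pfn; p < pfn + nr; p++)
        preserved[p / BITS_PER_WORD] |= 1UL << (p % BITS_PER_WORD);
}

/* New kernel, early boot: must this page be withheld from the allocator? */
bool is_preserved(unsigned long pfn)
{
    assert(pfn < NR_PFNS);
    return preserved[pfn / BITS_PER_WORD] & (1UL << (pfn % BITS_PER_WORD));
}

/* Count the pages that would be released to the allocator. */
unsigned long count_free_pages(void)
{
    unsigned long n = 0;

    for (unsigned long p = 0; p < NR_PFNS; p++)
        if (!is_preserved(p))
            n++;
    return n;
}
```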

Jason stressed how this would be used to establish the ABI: for example,
if a driver calls alloc_pages(), stores data in that memory, and then
calls to_kho(), this is a nice, clean interface for preserving driver
memory.  Something like GFP_KHO would be more invasive.

----->o-----
Pasha led a discussion on the next KHO series to be sent upstream and
alignment between people in the call.

Pasha suggested that kexec file load should not be part of the KHO
process; rather, the two should be completely decoupled from each other.
We want to minimize the blackout window as much as possible.  If the VM is
still running during KHO activate, we'd need to prevent any operation from
changing the serialized state, which limits VM functionality.

Alexander noted that the point of the activate phase is to speed up the
kexec by serializing state ahead of time, with the goal of keeping 99% of
VM operations possible.  Pasha noted that some devices need to be
preserved across kexec but others do not.  Jason suggested not coupling
this with a global activate state in that case, and Alexander agreed on
allowing certain drivers to participate rather than necessarily all.
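Avoiding a single global activate state could look roughly like the
sketch below: each driver declares whether it participates, and
activation calls a serialize callback only on participants, so devices
that need not survive kexec are unaffected.  The registry, the callback
signature, and the names lu_register(), lu_activate(), and
example_serialize() are hypothetical, and this is not the state machine
the group still needs to agree on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-driver registration for live update. */
struct lu_driver {
    const char *name;
    bool participates;                    /* opt in only if state must survive kexec */
    int (*serialize)(struct lu_driver *); /* returns 0 on success */
    bool serialized;
};

#define LU_MAX_DRIVERS 8
static struct lu_driver *lu_drivers[LU_MAX_DRIVERS];
static size_t lu_nr_drivers;

void lu_register(struct lu_driver *drv)
{
    assert(lu_nr_drivers < LU_MAX_DRIVERS);
    lu_drivers[lu_nr_drivers++] = drv;
}

/* Serialize participating drivers only; the rest keep running untouched.
 * Stops at the first error and returns it (no rollback in this sketch). */
int lu_activate(void)
{
    for (size_t i = 0; i < lu_nr_drivers; i++) {
        struct lu_driver *d = lu_drivers[i];
        int err;

        if (!d->participates)
            continue;
        err = d->serialize(d);
        if (err)
            return err;
        d->serialized = true;
    }
    return 0;
}

/* Example callback for a driver with nothing extra to record. */
int example_serialize(struct lu_driver *drv)
{
    (void)drv;
    return 0;
}
```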

Jason stressed that we all need to agree on the state machine, given the
discussion two weeks ago.

----->o-----
Next meeting is scheduled for Monday, February 24 at 8am PST (UTC-8).
I'll send a reminder on this mailing list.

Topics I think we should cover in the next meeting:

 - the future of guestmemfs and what it becomes, including alignment so
   prototyping can be done
 - Andrey's patch series that didn't rely on devicetree
 - alignment on not preserving the kernel direct map and using different
   virtual addresses in the new kernel
 - v5 of the KHO patch series with minor fixes
 - establishing an FSM for all of the various states that are agreed upon
   with common language (when memory mappings can happen, what is
   disallowed at certain stages)
 - extending the above topic to a separate FSM for the entire live update
   process (what happens in brownout, blackout, shutdown, etc.)
 - iommufd patch series (as well as QEMU) from James
 - establishing an API for callbacks into drivers to serialize state
   during brownout
 - topics proposed by Pasha: reducing blackout window, relaxed
   serialization, KHO activation requirements, and decoupling KHO from
   kexec
 - implications of preserving vIOMMU state

Please let me know if you'd like to propose additional topics for
discussion, thank you!



