Hi everybody,

Here are the notes from the inaugural Hypervisor Live Update call that
happened on Monday, January 27. Thanks to everybody who was involved!

----->o-----
I talked about the logistics and goals of the biweekly. If you would
like to be added to the calendar invite, please email me privately. I
can also share our cover letter and shared drive with you that contains
recordings and slides (if any) if you provide an email address
associated with a Google account.

We also discussed the scope of the biweekly series to include:

 - KHO
   + Including potential early adopters (hugetlbfs, tmpfs)
 - Persistence of PCIe devices
   + IOMMU(fd) persistence
 - Guest memory
   + Including Confidential Computing use cases
 - Reboot optimizations

----->o-----
Mike is planning on sending out another KHO patch series, likely this
week (v4).

----->o-----
James discussed work last year on iommufd persistence and hooking iommu
drivers into KHO to persist their state across kexec. Feedback
suggested setting up new page tables and then transferring over after
the kexec is completed. James will start implementing this on top of
his existing patch series and include qemu changes as well. To minimize
downtime, the plan was to resume the VM with the old page tables. He
suggested userspace would initiate the switch to the new page tables.

Jason noted KHO has been too focused on preserving memory and needs to
preserve file descriptors: we need to take iommufd, freeze it, give it
to KHO, and then pick it back up after kexec. When you're done with it
after the handover, like an atomic attach, then it gets destroyed.

Jason also noted we'll have to consider preserving vIOMMU state to
support latest NV hardware, which is highly complex.

Alexander Graf previously developed a concept called fdbox that turned
out to be very intrusive in the kernel. Jason noted that all of this
work will be invasive, but we should prefer to compartmentalize it as
much as possible (like for iommufd stuff, a kho.c).
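
One way to picture the iommufd handover Jason described (freeze, give
to KHO, pick back up after kexec, destroy after an atomic attach) is as
a small state machine. The sketch below is only an illustration under
assumptions: the state and event names are hypothetical, were not
agreed on the call, and do not correspond to any existing kernel or
iommufd interface.

```python
from enum import Enum, auto


class State(Enum):
    # All state names are hypothetical, chosen only for illustration.
    ACTIVE = auto()      # iommufd in normal use
    FROZEN = auto()      # frozen ahead of the handover
    PRESERVED = auto()   # handed to KHO, waiting for kexec
    RESTORED = auto()    # picked back up after kexec
    DESTROYED = auto()   # preserved state released after the atomic attach


# Allowed transitions, in the order described in the notes.
# Event names are likewise assumptions, not an agreed vocabulary.
TRANSITIONS = {
    (State.ACTIVE, "freeze"): State.FROZEN,
    (State.FROZEN, "preserve"): State.PRESERVED,
    (State.PRESERVED, "restore_after_kexec"): State.RESTORED,
    (State.RESTORED, "atomic_attach"): State.DESTROYED,
}


def step(state, event):
    """Return the next state, or raise ValueError on an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"illegal transition: {event!r} in state {state.name}"
        ) from None


def run(events, state=State.ACTIVE):
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = step(state, event)
    return state
```

Keeping the transitions in a single table makes it easy to reject
out-of-order steps, for example trying to preserve an iommufd that has
not been frozen first.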

----->o-----
Pasha suggested KHO should be kept as a mechanism to preserve kernel
memory across kexec; the serialization requires different mechanisms.
He plans to propose a separate API for callbacks into drivers.

Jason noted it was going to be critical to provide a state machine that
we all agree on, including for definitions. One aspect he would like to
align on is whether you could put a guest_memfd into an fdbox or even a
tmpfs into an fdbox.

Mike Rapoport noted there are multiple layers here, where KHO is very
low level and fdbox is built on top of it. Mike emphasized it will be
critical to establish a standardized format between multiple kernel
versions.

----->o-----
There was lots of discussion on stable ABIs for allowing continuous
upgrading of kernels without requiring a reboot. Jason suggested
upstream can provide a mechanism for upgrading from 6.12 -> 6.13 but
not 6.16 as an example. Doing any version -> any version is much harder
and likely cannot be supported, at least in the short term, because
it's so invasive. A good example would be for the mlx driver.

James and Alexander noted that we must be able to roll back and this
can be enabled by the downstream customer; it may not be a burden on
the upstream kernel.

There was a general acknowledgment that upstream pairs must be
supported, but much of this could become the responsibility of the
downstream user. (Alexander noted some users may care about mlx, others
may not, for example.)

David Woodhouse inquired about rollback functionality and how we would
support a VM that has deserialized after kexec using a new feature and
then still support a downgrade afterwards. Alexander said it was
important that the user of KHO supports very controlled A->B
environments for this to work properly, and, if provided, they can
control downgrade paths as well.

Dave Hansen noted this was similar to discussions about checkpoint
restart and CRIU.
The burden in this case may be very similar: it is taken on by those
who care about upgrading from one version to another and it is not a
general upstream requirement. It was acknowledged that this will be a
ton of work to maintain reliably, however. Dave noted it will be
important to socialize the work that needs to be done with upstream
developers, but that the work will be taken on by those who care to use
KHO.

It was agreed that once you roll out, you enable new features only when
you are confident there will not be a rollback, and then once the
feature is enabled you've passed the point of no return.

----->o-----
Jason provided a nice early milestone for KHO work: demonstrate a kexec
while the VM survives and the VFIO attached to it survives. Pasha noted
this has been done before with PKRAM, but needs to now be done in a way
that KHO would support.

----->o-----
Next meeting is scheduled for Monday, February 10 at 8am PST (UTC-8).
I'll send a reminder on this mailing list.

Topics I think we should cover in the next meeting:

 - LSF/MM/BPF topics of interest for the group
 - v4 of the KHO patch series sent out by Mike Rapoport
 - iommufd patch series (as well as qemu) sent out (hopefully) by James
   Gowans this week, otherwise a week or two from now
 - establishing an API for callbacks into drivers to serialize state
 - establishing an FSM for all of the various states that are agreed
   upon with common language
 - finalizing the decision on upstream support for minor version
   upgrades across KHO and the burden of downstream users to define
   what versions can be upgraded
 - topics proposed by Pasha: reducing blackout window, relaxed
   serialization, KHO activation requirements, and decoupling KHO from
   kexec
 - implications of preserving vIOMMU state

Please let me know if you'd like to propose additional topics for
discussion. Thank you!