On Fri, 2022-12-30 at 08:25 +0000, scalingtree wrote:
> Hi lists,
>
> (Re-sending as plain text.)
>
> We are in the process of using an external tool (CRIU) to
> checkpoint/restore a KVM-enabled virtual machine. Initially we are
> targeting the hypervisor kvmtool, but the extension, if done well,
> should allow checkpointing any hypervisor: Qemu, firecracker, etc.
>
> CRIU can checkpoint and restore most of the application's (or, in our
> case, the VMM's) state, except the state held in the kernel module
> KVM. To overcome this limitation, we need more getters in the KVM API
> to extract the state of the VM.
>
> One example of a missing getter is the one for guest memory. There is
> a KVM_SET_MEMORY API call, but there is no equivalent getter:
> KVM_GET_MEMORY.
>
> Can we add such getters to the KVM API? Any idea of the difficulty? I
> think one of the difficulties will be extracting the
> architecture-specific state of KVM: for now, we are targeting the
> Intel x86_64 architecture (VT-x).

I'm not really sure I understand the use case here. Can't the VMM be
restarted and restore this?

Live update is a barely-special case of live migration. You kexec the
underlying kernel and start a *new* VMM (which may have its own fixes
too) from the preserved "migration" state. Any VMM which supports live
migration surely doesn't need the kernel to help it with
checkpoint/restore? (There's a sketch of what I mean at the end of
this mail.)

Now... if you wanted to talk about leaving some of the physical CPUs
in guest mode *while* the kernel uses one of them to actually do the
kexec, *that* would be interesting.

It starts with virtual address space isolation, putting that kvm_run
loop into its own address space separate from the kernel. And then why
*can't* we leave it running? If it ever needs to take a vmexit (and
with interrupt posting and all the stuff that we now accelerate in
hardware, how often is that anyway?), then it might need to wait for a
Linux kernel to come back before it thunks back into it.

That's the naïve starting point... lots of fun with reconstituting
state and reconciling "newly-created" KVMs in the new kernel with the
vCPUs which are already actually running. But there's an amazing win
to be had there, letting VMs continue to actually *run* while the
whole hypervisor restarts with basically zero downtime.
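For the record, here is a minimal sketch of what I mean by "the VMM
doesn't need the kernel's help". The ioctls named here (KVM_GET_REGS,
KVM_GET_SREGS, KVM_GET_FPU, KVM_GET_MP_STATE,
KVM_SET_USER_MEMORY_REGION) are the existing KVM API; the helper
names, the output handling, and the assumption that the VMM already
has its vcpu fd and guest RAM mapping to hand are purely illustrative:

/*
 * Sketch only: how a VMM that already runs a guest can checkpoint it
 * with the *existing* KVM API.  vcpu_fd, guest_ram and guest_size are
 * assumed to be whatever the VMM set up when it created the VM and
 * registered memory with KVM_SET_USER_MEMORY_REGION; the FILE* stands
 * in for however the VMM (or CRIU) wants to persist the bytes.
 */
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <stdio.h>
#include <stdlib.h>

static void save_blob(FILE *out, const void *buf, size_t len)
{
	if (fwrite(buf, 1, len, out) != len) {
		perror("fwrite");
		exit(1);
	}
}

/* Dump one vCPU's architectural state via the getters KVM already has. */
static void checkpoint_vcpu(int vcpu_fd, FILE *out)
{
	struct kvm_regs regs;
	struct kvm_sregs sregs;
	struct kvm_fpu fpu;
	struct kvm_mp_state mp;

	if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) ||
	    ioctl(vcpu_fd, KVM_GET_SREGS, &sregs) ||
	    ioctl(vcpu_fd, KVM_GET_FPU, &fpu) ||
	    ioctl(vcpu_fd, KVM_GET_MP_STATE, &mp)) {
		perror("KVM_GET_*");
		exit(1);
	}
	/* A real VMM also wants MSRs, XSAVE, LAPIC, vcpu events, ... */

	save_blob(out, &regs, sizeof(regs));
	save_blob(out, &sregs, sizeof(sregs));
	save_blob(out, &fpu, sizeof(fpu));
	save_blob(out, &mp, sizeof(mp));
}

/*
 * Guest RAM needs no KVM_GET_MEMORY at all: it is ordinary userspace
 * memory that the VMM itself mapped and handed to KVM, so the VMM (or
 * CRIU, which already dumps the process's mappings) can simply read it.
 */
static void checkpoint_ram(void *guest_ram, size_t guest_size, FILE *out)
{
	save_blob(out, guest_ram, guest_size);
}

Restore is just the mirror image with KVM_SET_REGS, KVM_SET_SREGS and
friends, which is exactly the path a live-migration-capable VMM
already has.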