On Fri, 2022-12-30 at 08:25 +0000, scalingtree wrote:
> Hi lists,
>
> (Re-sending as plain text.)
>
> We are in the process of using an external tool (CRIU) to
> checkpoint/restore a KVM-enabled virtual machine. Initially we are
> targeting the hypervisor kvmtool, but the extension, if done well,
> should allow checkpointing any hypervisor: Qemu, firecracker, etc.
>
> CRIU can checkpoint and restore most of the application's (or, in our
> case, the VMM's) state, except the state held in the kernel module
> KVM. To overcome this limitation, we need more getters in the KVM API
> to extract the state of the VM.
>
> One example of a missing getter is the one for guest memory. There is
> a KVM_SET_MEMORY API call, but there is no equivalent getter:
> KVM_GET_MEMORY.
>
> Can we add such getters to the KVM API? Any idea of the difficulty? I
> think one of the difficulties will be extracting the
> architecture-specific state of KVM: for now, we are targeting the
> Intel x86_64 architecture (VT-x).

I'm not really sure I understand the use case here. Can't the VMM be
restarted and restore this?

Live update is a barely-special case of live migration. You kexec the
underlying kernel and start a *new* VMM (which may have its own fixes
too) from the preserved "migration" state. Any VMM which supports live
migration surely doesn't need the kernel to help it with
checkpoint/restore? (There's a sketch of what I mean at the end of
this mail.)

Now... if you wanted to talk about leaving some of the physical CPUs
in guest mode *while* the kernel uses one of them to actually do the
kexec, *that* would be interesting.

It starts with virtual address space isolation, putting that kvm_run
loop into its own address space separate from the kernel. And then why
*can't* we leave it running? If it ever needs to take a vmexit (and
with interrupt posting and all the stuff that we now accelerate in
hardware, how often is that anyway?), then it might need to wait for a
Linux kernel to come back before it thunks back into it.

That's the naïve starting point... lots of fun with reconstituting
state and reconciling "newly-created" KVMs in the new kernel with the
vCPUs which are already actually running. But there's an amazing win
to be had there, letting VMs continue to actually *run* while the
whole hypervisor restarts with basically zero downtime.
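For the record, here is a minimal sketch of what I mean by "the VMM
doesn't need the kernel's help". The ioctls named here (KVM_GET_REGS,
KVM_GET_SREGS, KVM_GET_FPU, KVM_GET_MP_STATE,
KVM_SET_USER_MEMORY_REGION) are the existing KVM API; the helper
names, the output handling, and the assumption that the VMM already
has its vcpu fd and guest RAM mapping to hand are purely illustrative:

/*
 * Sketch only: how a VMM that already runs a guest can checkpoint it
 * with the *existing* KVM API.  vcpu_fd, guest_ram and guest_size are
 * assumed to be whatever the VMM set up when it created the VM and
 * registered memory with KVM_SET_USER_MEMORY_REGION; the FILE* stands
 * in for however the VMM (or CRIU) wants to persist the bytes.
 */
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <stdio.h>
#include <stdlib.h>

static void save_blob(FILE *out, const void *buf, size_t len)
{
	if (fwrite(buf, 1, len, out) != len) {
		perror("fwrite");
		exit(1);
	}
}

/* Dump one vCPU's architectural state via the getters KVM already has. */
static void checkpoint_vcpu(int vcpu_fd, FILE *out)
{
	struct kvm_regs regs;
	struct kvm_sregs sregs;
	struct kvm_fpu fpu;
	struct kvm_mp_state mp;

	if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) ||
	    ioctl(vcpu_fd, KVM_GET_SREGS, &sregs) ||
	    ioctl(vcpu_fd, KVM_GET_FPU, &fpu) ||
	    ioctl(vcpu_fd, KVM_GET_MP_STATE, &mp)) {
		perror("KVM_GET_*");
		exit(1);
	}
	/* A real VMM also wants MSRs, XSAVE, LAPIC, vcpu events, ... */

	save_blob(out, &regs, sizeof(regs));
	save_blob(out, &sregs, sizeof(sregs));
	save_blob(out, &fpu, sizeof(fpu));
	save_blob(out, &mp, sizeof(mp));
}

/*
 * Guest RAM needs no KVM_GET_MEMORY at all: it is ordinary userspace
 * memory that the VMM itself mapped and handed to KVM, so the VMM (or
 * CRIU, which already dumps the process's mappings) can simply read it.
 */
static void checkpoint_ram(void *guest_ram, size_t guest_size, FILE *out)
{
	save_blob(out, guest_ram, guest_size);
}

Restore is just the mirror image with KVM_SET_REGS, KVM_SET_SREGS and
friends, which is exactly the path a live-migration-capable VMM
already has.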