Re: KVM/arm64: Guest ABI changes do not appear rollback-safe

Oliver Upton <oupton@xxxxxxxxxx> · Tue, 25 Jan 2022 09:29:13 -0800

Hi Marc,

On Tue, Jan 25, 2022 at 12:46 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> > If I understand correctly, the original motivation for going with
> > pseudo-registers was to comply with QEMU, which uses KVM_GET_REG_LIST
> > and KVM_[GET|SET]_ONE_REG interface, but I'm guessing the VMMs doing
> > save/restore across migration might write the same values for every
> > vCPU.
>
> KVM currently restricts the vcpu features to be unified across vcpus,
> but that's only an implementation choice.

But that implementation choice has become ABI, no? How could support
for asymmetry be added without requiring userspace opt-in or breaking
existing VMMs that depend on feature unification?

> The ARM architecture doesn't
> mandate that these registers are all the same, and it isn't impossible
> that we'd allow for the feature set to become per-vcpu at some point
> in time. So this argument doesn't really hold.

Accessing per-VM state N times is bound to increase VM blackout time
during migrations ~linearly as the number of vCPUs in a VM increases,
since a VM-scoped lock is necessary to serialize guest accesses. That
may be tolerable at present scale, but it seems like it could become a
real problem in the future.

> Furthermore, compatibility with QEMU's save/restore model is
> essential, and AFAICT, there is no open source alternative.

Agree fundamentally, but I believe it is entirely reasonable to
require a userspace change to adopt a new KVM feature. Otherwise, we
may be trying to shoehorn new features into existing UAPI that may not
be a precise fit.

In order to cure the serialization mentioned above, two options are
top of mind: accessing the VM state with the VM FD or informing
userspace that a set of registers need only be written once for an
entire VM. If we add support for asymmetry later down the road, that
would become an opt-in, after which userspace does the access
per-vCPU.
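
To make the second option a bit more concrete, here is a rough
userspace-side model of what I have in mind. It is only a sketch in
Python, not real KVM code: plan_restore(), serialized_writes() and the
notion of a "vm_scoped_regs" hint from the kernel are all hypothetical
names invented for illustration. It just counts how many register
writes would end up taking the VM-scoped lock under each approach.

```python
from dataclasses import dataclass, field


@dataclass
class RestorePlan:
    """Hypothetical restore plan: the register writes a VMM would issue."""
    writes: list = field(default_factory=list)  # (vcpu_index, reg_name) pairs


def plan_restore(num_vcpus, vcpu_regs, vm_scoped_regs, write_once=True):
    """Build the list of register writes issued on restore.

    vcpu_regs:      names of registers that are genuinely per-vCPU
    vm_scoped_regs: names of registers backed by VM-wide state (assumes a
                    hypothetical kernel hint that one write covers the VM)
    write_once:     if False, model writing every register on every vCPU
    """
    plan = RestorePlan()
    for vcpu in range(num_vcpus):
        for reg in vcpu_regs:
            plan.writes.append((vcpu, reg))
        # VM-scoped registers go through vCPU 0 only when write_once is set.
        if not write_once or vcpu == 0:
            for reg in vm_scoped_regs:
                plan.writes.append((vcpu, reg))
    return plan


def serialized_writes(plan, vm_scoped_regs):
    """Count the writes that would have to take the VM-scoped lock."""
    scoped = set(vm_scoped_regs)
    return sum(1 for _, reg in plan.writes if reg in scoped)
```

With 4 vCPUs and 2 VM-scoped registers, the write-everywhere plan
issues 8 serialized writes and the write-once plan issues 2, so the
serialized portion stops scaling with the vCPU count.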

> A device means yet another configuration and migration API. Don't you
> think we have enough of those? The complexity of KVM/arm64 userspace
> API is already insane, and extremely fragile. Adding to it will be a
> validation nightmare (it already is, and I don't see anyone actively
> helping with it).

It seems equally fragile to introduce VM-wide serialization to vCPU
UAPI that we know is in the live migration critical path for _any_
VMM. Without requiring userspace changes for all the new widgets under
discussion, we're effectively forcing VMMs to do something suboptimal.

--
Thanks,
Oliver