On 10/06/21 22:39, Jim Mattson wrote:
But, even worse, it can modify guest memory,
even while all vCPU threads are stopped!
To some extent this is a userspace issue---they could declare vCPU
threads stopped only after KVM_GET_MPSTATE is done, and only start the
downtime phase of migration after that. But it is nevertheless a pretty
bad excuse.
Was this simply to avoid serializing the two new bits in
apic->pending_events?
Yes, and more precisely to allow some interoperability between old and
new kernels.
Would anyone object to serializing KVM_APIC_SIPI in the
kvm_vcpu_events struct as well? Or would it be better to resurrect
KVM_MP_STATE_SIPI_RECEIVED?
Reusing KVM_VCPUEVENT_VALID_SIPI_VECTOR to mean KVM_APIC_SIPI is set
would be nice, but it would break new->old migration (because old
kernels only set KVM_APIC_SIPI if they see KVM_MP_STATE_SIPI_RECEIVED).
Can we decide this migration corner case is not something we care about?
Using KVM_MP_STATE_SIPI_RECEIVED solves interoperability issues because
we never deleted the pre-2013 code, on the other hand
KVM_MP_STATE_SIPI_RECEIVED assumes the existing mpstate is
KVM_MP_STATE_INIT_RECEIVED; it does not account very well for the case
of INIT+SIPI both being pending. Unlike real hardware, KVM will queue a
SIPI if it comes before the INIT has been processed, so that even in
overcommit scenarios it is not possible to fail the INIT-SIPI; dropping
kvm_apic_accept_events altogether would break this "tweak" across
migration, which might cause failure to bring up APs.
If we start with just removing guest memory writes, there is an easy way
out: the tweak does not work in guest mode, where INIT causes a vmexit
and a pre-queued SIPI will never be delivered. So, if in guest mode, we
can ignore the case of pending INIT+SIPI, and only do a minimal version
of kvm_apic_accept_events() that delays the larger side effects to the
next KVM_RUN.
For a first improvement, the logic becomes the following:
* if in guest mode and both INIT and SIPI are set, clear KVM_APIC_SIPI
and exit. KVM_GET_VCPU_EVENTS will set latched_init.
* if in guest mode and SIPI is set, KVM_APIC_SIPI is transmitted to
userspace via KVM_MP_STATE_SIPI_RECEIVED and
KVM_VCPUEVENT_VALID_SIPI_VECTOR.
* if not in guest mode, run kvm_apic_accept_events. Later this can be
changed to drop KVM_APIC_SIPI if it's deemed preferrable.
I'll post a few RFC patches. Selftests would be needed too.
Paolo