On 31/10/2023 11:42, David Woodhouse wrote:
On 31 October 2023 10:42:42 GMT, Paul Durrant <xadimgnik@xxxxxxxxx> wrote:
There is no documented ordering requirement on setting KVM_XEN_VCPU_ATTR_TYPE_TIMER versus KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO or KVM_XEN_ATTR_TYPE_SHARED_INFO but kvm_xen_start_timer() now needs the vCPU's pvclock to be valid. Should actually starting the timer not be deferred until then? (Or simply add a check here and have the attribute setting fail if the pvclock is not valid).
There are no such dependencies and I don't want there to be. That would be the *epitome* of what my "if it needs documenting, fix it first" mantra is intended to correct.
The fact that this broke on migration because the hv_clock isn't set up yet, as we saw in our overnight testing, is just a bug. In my tree I've fixed it thus:
index 63531173dad1..e3d2d63eef34 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -182,7 +182,7 @@ static void kvm_xen_start_timer(st
ruct kvm_vcpu *vcpu, u64 guest_abs,
* the absolute CLOCK_MONOTONIC time at which
the timer should
* fire.
*/
- if (vcpu->kvm->arch.use_master_clock &&
+ if (vcpu->arch.hv_clock.version && vcpu->kvm->
arch.use_master_clock &&
static_cpu_has(X86_FEATURE_CONSTANT_TSC))
{
uint64_t host_tsc, guest_tsc;
@@ -206,9 +206,23 @@ static void kvm_xen_start_timer(s
truct kvm_vcpu *vcpu, u64 guest_abs,
/* Calculate the guest kvmclock as the
guest would do it. */
guest_tsc = kvm_read_l1_tsc(vcpu, host
_tsc);
- guest_now = __pvclock_read_cycles(&vcp
u->arch.hv_clock, guest_tsc);
+ guest_now = __pvclock_read_cycles(&vcp
u->arch.hv_clock,
+ gues
t_tsc);
} else {
- /* Without CONSTANT_TSC, get_kvmclock_
ns() is the only option */
+ /*
+ * Without CONSTANT_TSC, get_kvmclock_
ns() is the only option.
+ *
+ * Also if the guest PV clock hasn't b
een set up yet, as is
+ * likely to be the case during migrat
ion when the vCPU has
+ * not been run yet. It would be possi
ble to calculate the
+ * scaling factors properly in that ca
se but there's not much
+ * point in doing so. The get_kvmclock
_ns() drift accumulates
+ * over time, so it's OK to use it at
startup. Besides, on
+ * migration there's going to be a lit
tle bit of skew in the
+ * precise moment at which timers fire
anyway. Often they'll
+ * be in the "past" by the time the VM
is running again after
+ * migration.
+ */
guest_now = get_kvmclock_ns(vcpu->kvm)
;
kernel_now = ktime_get();
}
--
2.41.0
We *could* reset the timer when the vCPU starts to run and handles the KVM_REQ_CLOCK_UPDATE event, but I don't want to for two reasons.
Firstly, we just don't need that complexity. This approach is OK, as the newly-added comment says. And we do need to fix get_kvmclock_ns() anyway, so it should work fine. Most of this patch will still be useful as it uses a single TSC read and we *do* need to do that part even after all the kvmclock brokenness is fixed. But the complexity on KVM_REQ_CLOCK_UPDATE isn't needed in the long term.
Secondly, it's also wrong thing to do in the general case. Let's say KVM does its thing and snaps the kvmclock backwards in time on a KVM_REQ_CLOCK_UPDATE... do we really want to reinterpret existing timers against the new kvmclock? They were best left alone, I think.
Do we not want to do exactly that? If the master clock is changed, why
would we not want to re-interpret the guest's idea of time? That update
will be visible to the guest when it re-reads the PV clock source.
Paul