On 31 October 2023 10:42:42 GMT, Paul Durrant <xadimgnik@xxxxxxxxx> wrote: >There is no documented ordering requirement on setting KVM_XEN_VCPU_ATTR_TYPE_TIMER versus KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO or KVM_XEN_ATTR_TYPE_SHARED_INFO but kvm_xen_start_timer() now needs the vCPU's pvclock to be valid. Should actually starting the timer not be deferred until then? (Or simply add a check here and have the attribute setting fail if the pvclock is not valid). There are no such dependencies and I don't want there to be. That would be the *epitome* of what my "if it needs documenting, fix it first" mantra is intended to correct. The fact that this broke on migration because the hv_clock isn't set up yet, as we saw in our overnight testing, is just a bug. In my tree I've fixed it thus: index 63531173dad1..e3d2d63eef34 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -182,7 +182,7 @@ static void kvm_xen_start_timer(st ruct kvm_vcpu *vcpu, u64 guest_abs, * the absolute CLOCK_MONOTONIC time at which the timer should * fire. */ - if (vcpu->kvm->arch.use_master_clock && + if (vcpu->arch.hv_clock.version && vcpu->kvm-> arch.use_master_clock && static_cpu_has(X86_FEATURE_CONSTANT_TSC)) { uint64_t host_tsc, guest_tsc; @@ -206,9 +206,23 @@ static void kvm_xen_start_timer(s truct kvm_vcpu *vcpu, u64 guest_abs, /* Calculate the guest kvmclock as the guest would do it. */ guest_tsc = kvm_read_l1_tsc(vcpu, host _tsc); - guest_now = __pvclock_read_cycles(&vcp u->arch.hv_clock, guest_tsc); + guest_now = __pvclock_read_cycles(&vcp u->arch.hv_clock, + gues t_tsc); } else { - /* Without CONSTANT_TSC, get_kvmclock_ ns() is the only option */ + /* + * Without CONSTANT_TSC, get_kvmclock_ ns() is the only option. + * + * Also if the guest PV clock hasn't b een set up yet, as is + * likely to be the case during migrat ion when the vCPU has + * not been run yet. It would be possi ble to calculate the + * scaling factors properly in that ca se but there's not much + * point in doing so. The get_kvmclock _ns() drift accumulates + * over time, so it's OK to use it at startup. Besides, on + * migration there's going to be a lit tle bit of skew in the + * precise moment at which timers fire anyway. Often they'll + * be in the "past" by the time the VM is running again after + * migration. + */ guest_now = get_kvmclock_ns(vcpu->kvm) ; kernel_now = ktime_get(); } -- 2.41.0 We *could* reset the timer when the vCPU starts to run and handles the KVM_REQ_CLOCK_UPDATE event, but I don't want to for two reasons. Firstly, we just don't need that complexity. This approach is OK, as the newly-added comment says. And we do need to fix get_kvmclock_ns() anyway, so it should work fine. Most of this patch will still be useful as it uses a single TSC read and we *do* need to do that part even after all the kvmclock brokenness is fixed. But the complexity on KVM_REQ_CLOCK_UPDATE isn't needed in the long term. Secondly, it's also wrong thing to do in the general case. Let's say KVM does its thing and snaps the kvmclock backwards in time on a KVM_REQ_CLOCK_UPDATE... do we really want to reinterpret existing timers against the new kvmclock? They were best left alone, I think.