+Oliver

On Fri, Jun 30, 2023, Like Xu wrote:
> From: Like Xu <likexu@xxxxxxxxxxx>
>
> When a new vcpu is created and subsequently restored by vcpu snapshot,
> apply kvm_vcpu_write_tsc_offset() before vcpu runs for the first time.
>
> Before a vcpu runs for the first time, the user space (VMM) sets the guest
> tsc as it wants, which may triggers the time synchronization mechanism with
> other vcpus (if any). In a scenario where a vcpu snapshot is used to
> restore, like the bugzilla report [*], the newly target guest tsc (e.g.
> at the time of vcpu restoration) is synchronized with its the most
> primitive guest timestamp initialized at the time of vcpu creation.
>
> Furthermore, the VMM can actually update the target guest tsc multiple
> times before the vcpu actually gets running, which requires the tsc_offset
> to be updated every time it is set. In this scenario, it can be considered
> as unstable tsc (even this vcpu has not yet even started ticking to follow
> the intended logic of KVM timer emulation).
>
> It is only necessary to delay this step until kvm_arch_vcpu_load() to
> catch up with guest expectation with the help of kvm_vcpu_has_run(),
> and the change is expected to not break any of the cumbersome existing
> virt timer features.

"expected to not break" and "does not break" are two different statements.

> Reported-by: Yong He <alexyonghe@xxxxxxxxxxx>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217423 [*]
> Tested-by: Jinrong Liang <cloudliang@xxxxxxxxxxx>
> Signed-off-by: Like Xu <likexu@xxxxxxxxxxx>
> ---
>  arch/x86/kvm/x86.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 439312e04384..616940fc3791 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4818,7 +4818,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  		if (tsc_delta < 0)
>  			mark_tsc_unstable("KVM discovered backwards TSC");
>
> -		if (kvm_check_tsc_unstable()) {
> +		if (kvm_check_tsc_unstable() || !kvm_vcpu_has_run(vcpu)) {
>  			u64 offset = kvm_compute_l1_tsc_offset(vcpu,
>  						vcpu->arch.last_guest_tsc);
>  			kvm_vcpu_write_tsc_offset(vcpu, offset);

Doing this on every vCPU load feels all kinds of wrong, e.g. it will override
the value set by userspace via KVM_VCPU_TSC_OFFSET.  One could argue that KVM
is "helping" userspace by providing a more up-to-date offset for the guest, but
"helping" userspace by silently overriding userspace rarely ends well.

Can't we instead just fix the heuristic that tries to detect synchronization?

---
 arch/x86/kvm/x86.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c30364152fe6..43d40f058a41 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2721,14 +2721,14 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 			 * kvm_clock stable after CPU hotplug
 			 */
 			synchronizing = true;
-		} else {
+		} else if (kvm_vcpu_has_run(vcpu)) {
 			u64 tsc_exp = kvm->arch.last_tsc_write +
 						nsec_to_cycles(vcpu, elapsed);
 			u64 tsc_hz = vcpu->arch.virtual_tsc_khz * 1000LL;
 			/*
 			 * Special case: TSC write with a small delta (1 second)
-			 * of virtual cycle time against real time is
-			 * interpreted as an attempt to synchronize the CPU.
+			 * of virtual cycle time against real time on a running
+			 * vCPU is interpreted as an attempt to synchronize.
 			 */
 			synchronizing = data < tsc_exp + tsc_hz &&
 					data + tsc_hz > tsc_exp;

base-commit: 2d6f036579d4ef5a09b0b45f66e34406290dfa1e
--
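
For anyone following along, the adjusted heuristic can be sketched in
userspace roughly as below.  This is only an illustration under stated
assumptions, not the kernel code: tsc_write_looks_like_sync() and
ns_to_cycles() are hypothetical names, the helper ignores TSC scaling, and
only the "else" branch touched by the diff above is modelled (the separate
hotplug special case is left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for nsec_to_cycles(): converts nanoseconds to
 * guest TSC cycles at tsc_khz, with no scaling.  The multiplication can
 * overflow for very large inputs; that is acceptable for a sketch.
 */
static uint64_t ns_to_cycles(uint64_t tsc_khz, uint64_t ns)
{
	assert(tsc_khz > 0);
	return ns * tsc_khz / 1000000ULL;
}

/*
 * Models the one-second window check: a TSC write counts as a
 * synchronization attempt only if the vCPU has already run and the
 * written value is within one second (in cycles) of the expected TSC.
 */
static bool tsc_write_looks_like_sync(uint64_t data, uint64_t last_tsc_write,
				      uint64_t elapsed_ns, uint64_t tsc_khz,
				      bool vcpu_has_run)
{
	uint64_t tsc_exp, tsc_hz;

	/* A vCPU that has never run is never treated as synchronizing. */
	if (!vcpu_has_run)
		return false;

	tsc_exp = last_tsc_write + ns_to_cycles(tsc_khz, elapsed_ns);
	tsc_hz = tsc_khz * 1000ULL;

	return data < tsc_exp + tsc_hz && data + tsc_hz > tsc_exp;
}
```

With a 1 GHz guest TSC and 5 seconds elapsed since the last write, a write
of 5.5e9 cycles falls inside the window on a vCPU that has run, while the
same write on a vCPU that has not yet run is taken at face value.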