To minimize the latency of timer interrupts as observed by the guest, KVM
adjusts the values it programs into the host timers to account for the
host's overhead of programming and handling the timer event.  In the event
that the adjustments are too aggressive, i.e. the timer fires earlier than
the guest expects, KVM busy waits immediately prior to entering the guest.

Currently, KVM manually converts the delay from nanoseconds to clock
cycles.  But, the conversion is done in the guest's time domain, while the
delay occurs in the host's time domain, i.e. the delay may not be accurate
and could wait too little or too long.

Sidestep the headache of shifting time domains by delaying 1ns at a time
and breaking the loop when the guest's desired timer delay has been
satisfied.  Because the advancement, which caps the delay to avoid
unbounded busy waiting, is stored in nanoseconds, the current advancement
time can simply be used as a loop terminator since we're delaying 1ns at
a time (plus the few cycles of overhead for running the code).
Cc: Liran Alon <liran.alon@xxxxxxxxxx>
Cc: Wanpeng Li <wanpengli@xxxxxxxxxxx>
Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
---
 arch/x86/kvm/lapic.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 92446cba9b24..e797e3145a8b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1486,7 +1486,8 @@ static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
 void wait_lapic_expire(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
-	u64 guest_tsc, tsc_deadline, ns;
+	u32 timer_advance_ns = lapic_timer_advance_ns;
+	u64 guest_tsc, tmp_tsc, tsc_deadline, ns;

 	if (!lapic_in_kernel(vcpu))
 		return;
@@ -1499,13 +1500,13 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
 	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
 	apic->lapic_timer.expired_tscdeadline = 0;

-	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+	tmp_tsc = guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
 	trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline);

-	/* __delay is delay_tsc whenever the hardware has TSC, thus always. */
-	if (guest_tsc < tsc_deadline)
-		__delay(min(tsc_deadline - guest_tsc,
-			nsec_to_cycles(vcpu, lapic_timer_advance_ns)));
+	for (ns = 0; tmp_tsc < tsc_deadline && ns < timer_advance_ns; ns++) {
+		ndelay(1);
+		tmp_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+	}

 	if (!lapic_timer_advance_adjust_done) {
 		/* too early */
--
2.21.0