To minimize the latency of timer interrupts as observed by the guest, KVM
adjusts the values it programs into the host timers to account for the
host's overhead of programming and handling the timer event.  In the event
that the adjustments are too aggressive, i.e. the timer fires earlier than
the guest expects, KVM busy waits immediately prior to entering the guest.

Currently, KVM manually converts the delay from nanoseconds to clock
cycles.  But, the conversion is done in the guest's time domain, while the
delay occurs in the host's time domain, i.e. the delay may not be accurate
and could wait too little or too long.

Sidestep the headache of shifting time domains by delaying 1ns at a time
and breaking the loop when the guest's desired timer delay has been
satisfied.  Because the advancement, which caps the delay to avoid
unbounded busy waiting, is stored in nanoseconds, the current advancement
time can simply be used as a loop terminator since we're delaying 1ns at
a time (plus the few cycles of overhead for running the code).
Cc: Liran Alon <liran.alon@xxxxxxxxxx>
Cc: Wanpeng Li <wanpengli@xxxxxxxxxxx>
Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
---
 arch/x86/kvm/lapic.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 92446cba9b24..e797e3145a8b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1486,7 +1486,8 @@ static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
 void wait_lapic_expire(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
-	u64 guest_tsc, tsc_deadline, ns;
+	u32 timer_advance_ns = lapic_timer_advance_ns;
+	u64 guest_tsc, tmp_tsc, tsc_deadline, ns;

 	if (!lapic_in_kernel(vcpu))
 		return;
@@ -1499,13 +1500,13 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
 	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
 	apic->lapic_timer.expired_tscdeadline = 0;

-	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+	tmp_tsc = guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
 	trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline);

-	/* __delay is delay_tsc whenever the hardware has TSC, thus always. */
-	if (guest_tsc < tsc_deadline)
-		__delay(min(tsc_deadline - guest_tsc,
-			nsec_to_cycles(vcpu, lapic_timer_advance_ns)));
+	for (ns = 0; tmp_tsc < tsc_deadline && ns < timer_advance_ns; ns++) {
+		ndelay(1);
+		tmp_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+	}

 	if (!lapic_timer_advance_adjust_done) {
 		/* too early */
--
2.21.0