> On 16 Apr 2019, at 14:02, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
> On 12/04/19 22:18, Sean Christopherson wrote:
>> To minimize the latency of timer interrupts as observed by the guest,
>> KVM adjusts the values it programs into the host timers to account for
>> the host's overhead of programming and handling the timer event. In
>> the event that the adjustments are too aggressive, i.e. the timer fires
>> earlier than the guest expects, KVM busy waits immediately prior to
>> entering the guest.
>>
>> Currently, KVM manually converts the delay from nanoseconds to clock
>> cycles. But, the conversion is done in the guest's time domain, while
>> the delay occurs in the host's time domain, i.e. the delay may not be
>> accurate and could wait too little or too long. Sidestep the headache
>> of shifting time domains by delaying 1ns at a time and breaking the loop
>> when the guest's desired timer delay has been satisfied. Because the
>> advancement, which caps the delay to avoid unbounded busy waiting, is
>> stored in nanoseconds, the current advancement time can simply be used
>> as a loop terminator since we're delaying 1ns at a time (plus the few
>> cycles of overhead for running the code).
>
> A 1 nanosecond advance (3-5 clock cycles) is even shorter than the time
> taken to execute kvm_read_l1_tsc. I would just replace it with
> cpu_relax(); I can do the change when applying.
>
> Paolo

Paolo, there is also another approach Sean and I were discussing.
I think it is more elegant. See previous comments in this thread.
-Liran

>
>> Cc: Liran Alon <liran.alon@xxxxxxxxxx>
>> Cc: Wanpeng Li <wanpengli@xxxxxxxxxxx>
>> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
>> ---
>>  arch/x86/kvm/lapic.c | 13 +++++++------
>>  1 file changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 92446cba9b24..e797e3145a8b 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -1486,7 +1486,8 @@ static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
>>  void wait_lapic_expire(struct kvm_vcpu *vcpu)
>>  {
>>  	struct kvm_lapic *apic = vcpu->arch.apic;
>> -	u64 guest_tsc, tsc_deadline, ns;
>> +	u32 timer_advance_ns = lapic_timer_advance_ns;
>> +	u64 guest_tsc, tmp_tsc, tsc_deadline, ns;
>>
>>  	if (!lapic_in_kernel(vcpu))
>>  		return;
>> @@ -1499,13 +1500,13 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
>>
>>  	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
>>  	apic->lapic_timer.expired_tscdeadline = 0;
>> -	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>> +	tmp_tsc = guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>>  	trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline);
>>
>> -	/* __delay is delay_tsc whenever the hardware has TSC, thus always.  */
>> -	if (guest_tsc < tsc_deadline)
>> -		__delay(min(tsc_deadline - guest_tsc,
>> -			nsec_to_cycles(vcpu, lapic_timer_advance_ns)));
>> +	for (ns = 0; tmp_tsc < tsc_deadline && ns < timer_advance_ns; ns++) {
>> +		ndelay(1);
>> +		tmp_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>> +	}
>>
>>  	if (!lapic_timer_advance_adjust_done) {
>>  		/* too early */
>>