> On 12 Apr 2019, at 23:18, Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote: > > To minimize the latency of timer interrupts as observed by the guest, > KVM adjusts the values it programs into the host timers to account for > the host's overhead of programming and handling the timer event. In > the event that the adjustments are too aggressive, i.e. the timer fires > earlier than the guest expects, KVM busy waits immediately prior to > entering the guest. > > Currently, KVM manually converts the delay from nanoseconds to clock > cycles. But, the conversion is done in the guest's time domain, while > the delay occurs in the host's time domain, i.e. the delay may not be > accurate and could wait too little or too long. Just to make sure I understand the issue you see here: The __delay() is called with a parameter which the number of guest cycles required to get from current guest cycle (guest_tsc) to deadline cycle (tsc_deadline). But in case guest vCPU frequency is different than physical CPU frequency, this parameter should further be converted from guest cycles to host cycles. Do I understand correctly? If yes, I think you should elaborate this part a bit more in commit message. As this is quite confusing. :) > Sidestep the headache > of shifting time domains by delaying 1ns at a time and breaking the loop > when the guest's desired timer delay has been satisfied. Because the > advancement, which caps the delay to avoid unbounded busy waiting, is > stored in nanoseconds, the current advancement time can simply be used > as a loop terminator since we're delaying 1ns at a time (plus the few > cycles of overhead for running the code). Looking at start_sw_tscdeadline(), we have the same issue where we require to convert guest cycles to host ns time in order to set hrtimer expiration time. This is simply done by: host_ns = guest_cycles * 1000000ULL; do_div(host_ns, vcpu->arch.virtual_tsc_khz); Why shouldn’t we use the same approach for constructing __delay() parameter from guest_cycles currently pass to it as parameter? i.e. something such as: delay_guest_cycles = min(tsc_deadline - guest_tsc, nsec_to_cycles(vcpu, lapic_timer_advance_ns)); delay_host_ns = delay_guest_cycles * 1000000ULL; do_div(delay_host_ns, vcpu->arch.virtual_tsc_khz); __delay(delay_host_ns); -Liran > > Cc: Liran Alon <liran.alon@xxxxxxxxxx> > Cc: Wanpeng Li <wanpengli@xxxxxxxxxxx> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx> > --- > arch/x86/kvm/lapic.c | 13 +++++++------ > 1 file changed, 7 insertions(+), 6 deletions(-) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 92446cba9b24..e797e3145a8b 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -1486,7 +1486,8 @@ static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu) > void wait_lapic_expire(struct kvm_vcpu *vcpu) > { > struct kvm_lapic *apic = vcpu->arch.apic; > - u64 guest_tsc, tsc_deadline, ns; > + u32 timer_advance_ns = lapic_timer_advance_ns; > + u64 guest_tsc, tmp_tsc, tsc_deadline, ns; Nit: I would rename guest_tsc and tmp_tsc to guest_tsc_on_exit and current_guest_tsc respectively. > > if (!lapic_in_kernel(vcpu)) > return; > @@ -1499,13 +1500,13 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu) > > tsc_deadline = apic->lapic_timer.expired_tscdeadline; > apic->lapic_timer.expired_tscdeadline = 0; > - guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc()); > + tmp_tsc = guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc()); > trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline); > > - /* __delay is delay_tsc whenever the hardware has TSC, thus always. */ > - if (guest_tsc < tsc_deadline) > - __delay(min(tsc_deadline - guest_tsc, > - nsec_to_cycles(vcpu, lapic_timer_advance_ns))); > + for (ns = 0; tmp_tsc < tsc_deadline && ns < timer_advance_ns; ns++) { > + ndelay(1); > + tmp_tsc = kvm_read_l1_tsc(vcpu, rdtsc()); > + } > > if (!lapic_timer_advance_adjust_done) { > /* too early */ > -- > 2.21.0 >