On Wed, Dec 17, 2014 at 03:58:13PM +0100, Radim Krcmar wrote:
> 2014-12-16 09:08-0500, Marcelo Tosatti:
> > For the hrtimer which emulates the tscdeadline timer in the guest,
> > add an option to advance expiration, and busy spin on VM-entry waiting
> > for the actual expiration time to elapse.
> >
> > This allows achieving low latencies in cyclictest (or any scenario
> > which requires strict timing regarding timer expiration).
> >
> > Reduces average cyclictest latency from 12us to 8us
> > on Core i5 desktop.
> >
> > Note: this option requires tuning to find the appropriate value
> > for a particular hardware/guest combination. One method is to measure the
> > average delay between apic_timer_fn and VM-entry.
> > Another method is to start with 1000ns, and increase the value
> > in say 500ns increments until avg cyclictest numbers stop decreasing.
> >
> > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
>
> Reviewed-by: Radim Krčmář <rkrcmar@xxxxxxxxxx>
>
> > +++ kvm/arch/x86/kvm/lapic.c
> > @@ -1087,11 +1089,64 @@ static void apic_timer_expired(struct kv
> [...]
> > +/*
> > + * On APICv, this test will cause a busy wait
> > + * during a higher-priority task.
> > + */
>
> (A bit confusing ... this test doesn't busy wait.)
>
> > +
> > +static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
> [...]
> > +void wait_lapic_expire(struct kvm_vcpu *vcpu)
> > +{
> [...]
> > +	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
> > +	apic->lapic_timer.expired_tscdeadline = 0;
> > +	guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
> > +
> > +	while (guest_tsc < tsc_deadline) {
> > +		int delay = min(tsc_deadline - guest_tsc, 1000ULL);
>
> Why break the __delay() loop into smaller parts?

So that you can handle interrupts, in case this code ever moves
outside IRQ protected region.

> > +		__delay(delay);
>
> (Does not have to call delay_tsc, but I guess it won't change.)
>
> > +		guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
> > +	}
> >  }
> >
>
> Btw. simple automatic delta tuning had worse results?

Haven't tried automatic tuning.

So what happens in a realtime environment is this: you execute a fixed
number of instructions from interrupt handling all the way to VM-entry.

Well, almost fixed. What is fixed is the number of apic_timer_fn plus KVM
instructions. You can also execute host scheduler and timekeeping
processing.

In practice, the time to execute that instruction sequence follows a
bell-shaped normal distribution around the average (the right side is
slightly higher due to host scheduler and timekeeping processing).

You want to advance the timer by the rightmost bucket; that way you
guarantee the lowest possible latencies (which is the interest here).

That said, I don't see an advantage in automatic tuning for the use case
which this targets.
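
To make the "rightmost bucket" idea concrete, here is a minimal user-space
sketch. It is not part of the patch. It assumes you have already collected
samples of the delay (in ns) between apic_timer_fn and VM-entry by some
means such as tracing, and it picks the advance value from the right edge
of that distribution using a nearest-rank percentile. The names
pick_advance_ns and cmp_u64 are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the patch): ascending order for qsort(). */
static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper (not from the patch).
 *
 * delays: measured apic_timer_fn -> VM-entry delays, in nanoseconds
 * n:      number of samples, must be > 0
 * pct:    percentile to pick, 1..100 (100 selects the largest sample,
 *         i.e. the rightmost bucket)
 *
 * Returns the nearest-rank percentile of the samples, or 0 if the
 * temporary buffer cannot be allocated.
 */
uint64_t pick_advance_ns(const uint64_t *delays, size_t n, unsigned int pct)
{
	uint64_t *sorted;
	uint64_t result;
	size_t rank;

	assert(n > 0 && pct >= 1 && pct <= 100);

	/* Sort a copy so the caller's sample array is left untouched. */
	sorted = malloc(n * sizeof(*sorted));
	if (!sorted)
		return 0;
	memcpy(sorted, delays, n * sizeof(*sorted));
	qsort(sorted, n, sizeof(*sorted), cmp_u64);

	/* Nearest-rank method: rank = ceil(n * pct / 100), 1-based. */
	rank = (n * pct + 99) / 100;
	result = sorted[rank - 1];

	free(sorted);
	return result;
}
```

With pct = 100 this returns the worst delay observed; a slightly lower
percentile would trade a few late wakeups for less time spent busy
spinning before VM-entry. Which point on the curve is right still depends
on the hardware/guest combination.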