On Thu, Dec 11, 2014 at 1:10 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
>
>
> On 11/12/2014 21:48, Andy Lutomirski wrote:
>> On 12/10/2014 07:07 PM, Marcelo Tosatti wrote:
>>> On Thu, Dec 11, 2014 at 12:37:57AM +0100, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 10/12/2014 21:57, Marcelo Tosatti wrote:
>>>>> For the hrtimer which emulates the tscdeadline timer in the guest,
>>>>> add an option to advance expiration, and busy spin on VM-entry waiting
>>>>> for the actual expiration time to elapse.
>>>>>
>>>>> This allows achieving low latencies in cyclictest (or any scenario
>>>>> which requires strict timing regarding timer expiration).
>>>>>
>>>>> Reduces cyclictest avg latency by 50%.
>>>>>
>>>>> Note: this option requires tuning to find the appropriate value
>>>>> for a particular hardware/guest combination. One method is to measure the
>>>>> average delay between apic_timer_fn and VM-entry.
>>>>> Another method is to start with 1000ns, and increase the value
>>>>> in say 500ns increments until avg cyclictest numbers stop decreasing.
>>>>
>>>> What values are you using in practice for the parameter?
>>>
>>> 7us.
>>
>> It takes 7us to get from TSC deadline expiration to the *start* of
>> vmresume? That seems rather extreme.
>
> No, to the end. 7us is 21000 clock cycles, and the vmexit+vmentry alone
> costs about 1300.
>

I suspect that something's massively wrong with context switching,
then -- it deserves to be considerably faster than that. The
architecturally expensive bits are vmresume, interrupt delivery, and
iret, but iret is only ~300 cycles and interrupt delivery should be
under 1k cycles.

Throw in a few hundred more cycles for whatever wrmsr idiocy is going
on somewhere in the process, and we're still nowhere near 21k cycles.

>> Is it possible that almost all of that latency is from deadline
>> expiration to C-state exit?
>
> No, I don't think so.
> Marcelo confirmed that C-states are disabled, but
> anyway none of the C-state latency matches Marcelo's data: C1 is really
> small (1 us), C1e is too large (~10 us).
>
> To see the effect of C-state exit, go to the plots I made on a normal
> laptop and see latency jumping up to 200000 or 400000 cycles
> (respectively 70 and 140 us, corresponding to C3 and C6 latencies of 60
> and 80 us).
>
>> If so, can we teach the timer code to wake
>> up early to account for that?
>
> What, it doesn't already do that?

No clue. My one machine that actually cares about this has C states
rather heavily tuned, so I wouldn't notice.

--Andy

>
> Paolo

-- 
Andy Lutomirski
AMA Capital Management, LLC
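
For reference, the cycle arithmetic in this thread can be tallied as a
minimal sketch. The 3 GHz clock is an assumption inferred from "7us is
21000 clock cycles"; the 500-cycle wrmsr figure is an assumed stand-in
for "a few hundred"; the other entries are the rough bounds quoted
above, not measurements. All names are hypothetical.

```python
# Assumption: 3 GHz clock, inferred from "7us is 21000 clock cycles".
CLOCK_HZ = 3_000_000_000


def us_to_cycles(us, hz=CLOCK_HZ):
    """Convert microseconds to clock cycles at the assumed frequency."""
    return round(us * hz / 1_000_000)


def cycles_to_us(cycles, hz=CLOCK_HZ):
    """Convert clock cycles to microseconds at the assumed frequency."""
    return cycles * 1_000_000 / hz


# Rough per-step costs quoted in the thread, in cycles.
budget = {
    "vmexit + vmentry": 1300,    # "costs about 1300"
    "iret": 300,                 # "only ~300 cycles"
    "interrupt delivery": 1000,  # "under 1k cycles", taken at the bound
    "wrmsr overhead": 500,       # assumption for "a few hundred"
}

observed = us_to_cycles(7)           # the 7us advance value, in cycles
accounted = sum(budget.values())     # cycles explained by the list above
unexplained = observed - accounted   # cycles the list does not explain

if __name__ == "__main__":
    print(f"observed:    {observed} cycles")
    print(f"accounted:   {accounted} cycles ({cycles_to_us(accounted):.2f} us)")
    print(f"unexplained: {unexplained} cycles ({cycles_to_us(unexplained):.2f} us)")
```

Under these assumptions the listed steps cover roughly 3k of the 21k
cycles, which is the gap being pointed at above.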