On Thu, Dec 11, 2014 at 01:16:52PM -0800, Andy Lutomirski wrote:
> On Thu, Dec 11, 2014 at 1:10 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> >
> >
> > On 11/12/2014 21:48, Andy Lutomirski wrote:
> >> On 12/10/2014 07:07 PM, Marcelo Tosatti wrote:
> >>> On Thu, Dec 11, 2014 at 12:37:57AM +0100, Paolo Bonzini wrote:
> >>>>
> >>>>
> >>>> On 10/12/2014 21:57, Marcelo Tosatti wrote:
> >>>>> For the hrtimer which emulates the tscdeadline timer in the guest,
> >>>>> add an option to advance expiration, and busy spin on VM-entry waiting
> >>>>> for the actual expiration time to elapse.
> >>>>>
> >>>>> This allows achieving low latencies in cyclictest (or any scenario
> >>>>> which requires strict timing regarding timer expiration).
> >>>>>
> >>>>> Reduces cyclictest avg latency by 50%.
> >>>>>
> >>>>> Note: this option requires tuning to find the appropriate value
> >>>>> for a particular hardware/guest combination. One method is to measure the
> >>>>> average delay between apic_timer_fn and VM-entry.
> >>>>> Another method is to start with 1000ns, and increase the value
> >>>>> in say 500ns increments until avg cyclictest numbers stop decreasing.
> >>>>
> >>>> What values are you using in practice for the parameter?
> >>>
> >>> 7us.
> >>
> >> It takes 7us to get from TSC deadline expiration to the *start* of
> >> vmresume? That seems rather extreme.
> >
> > No, to the end. 7us is 21000 clock cycles, and the vmexit+vmentry alone
> > costs about 1300.
> >
>
> I suspect that something's massively wrong with context switching,
> then -- it deserves to be considerably faster than that. The
> architecturally expensive bits are vmresume, interrupt delivery, and
> iret, but iret is only ~300 cycles and interrupt delivery should be
> under 1k cycles.
>
> Throw in a few hundred more cycles for whatever wrmsr idiocy is going
> on somewhere in the process, and we're still nowhere near 21k cycles.

               <idle>-0     [003] d..h2.. 1991756745496752: apic_timer_fn <-__run_hrtimer
               <idle>-0     [003] dN.h2.. 1991756745498732: tick_program_event <-hrtimer_interrupt
               <idle>-0     [003] d...3.. 1991756745502112: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=qemu-system-x86 next_pid=20114 next_prio=98
               <idle>-0     [003] d...2.. 1991756745502592: __context_tracking_task_switch <-__schedule
 qemu-system-x86-20114 [003] ....1.. 1991756745503916: kvm_arch_vcpu_load <-kvm_sched_in
 qemu-system-x86-20114 [003] ....... 1991756745505320: kvm_cpu_has_pending_timer <-kvm_vcpu_block
 qemu-system-x86-20114 [003] ....... 1991756745506260: kvm_cpu_has_pending_timer <-kvm_arch_vcpu_ioctl_run
 qemu-system-x86-20114 [003] ....... 1991756745507812: kvm_apic_accept_events <-kvm_arch_vcpu_ioctl_run
 qemu-system-x86-20114 [003] ....... 1991756745508100: kvm_cpu_has_pending_timer <-kvm_arch_vcpu_ioctl_run
 qemu-system-x86-20114 [003] ....... 1991756745508872: kvm_apic_accept_events <-vcpu_enter_guest
 qemu-system-x86-20114 [003] ....1.. 1991756745510040: vmx_save_host_state <-vcpu_enter_guest
 qemu-system-x86-20114 [003] d...2.. 1991756745511876: kvm_entry: vcpu 1

1991756745511876 - 1991756745496752 = 15124

The timestamps are TSC reads. This is patched to run without ksoftirqd.

Consider:

The LAPIC is programmed to the next earliest event by hrtimer_interrupt.
VM-entry is processing KVM_REQ_DEACTIVATE_FPU, KVM_REQ_EVENT.
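
For anyone who wants to see where the 15124 cycles go, here is a rough
sketch that parses the trace above and prints the per-step and
cumulative TSC deltas. It is not part of the patch. The helper names
(parse_trace, step_deltas) are made up for illustration, and the
cycles-to-microseconds factor is an assumption taken from the "7us is
21000 clock cycles" figure quoted above (about 3000 cycles per us); the
real TSC frequency of the test machine is not stated in this thread.

```python
import re

# Trace lines copied from the message; timestamps are raw TSC reads.
TRACE = """\
 <idle>-0 [003] d..h2.. 1991756745496752: apic_timer_fn <-__run_hrtimer
 <idle>-0 [003] dN.h2.. 1991756745498732: tick_program_event <-hrtimer_interrupt
 <idle>-0 [003] d...3.. 1991756745502112: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=qemu-system-x86 next_pid=20114 next_prio=98
 <idle>-0 [003] d...2.. 1991756745502592: __context_tracking_task_switch <-__schedule
 qemu-system-x86-20114 [003] ....1.. 1991756745503916: kvm_arch_vcpu_load <-kvm_sched_in
 qemu-system-x86-20114 [003] ....... 1991756745505320: kvm_cpu_has_pending_timer <-kvm_vcpu_block
 qemu-system-x86-20114 [003] ....... 1991756745506260: kvm_cpu_has_pending_timer <-kvm_arch_vcpu_ioctl_run
 qemu-system-x86-20114 [003] ....... 1991756745507812: kvm_apic_accept_events <-kvm_arch_vcpu_ioctl_run
 qemu-system-x86-20114 [003] ....... 1991756745508100: kvm_cpu_has_pending_timer <-kvm_arch_vcpu_ioctl_run
 qemu-system-x86-20114 [003] ....... 1991756745508872: kvm_apic_accept_events <-vcpu_enter_guest
 qemu-system-x86-20114 [003] ....1.. 1991756745510040: vmx_save_host_state <-vcpu_enter_guest
 qemu-system-x86-20114 [003] d...2.. 1991756745511876: kvm_entry: vcpu 1
"""

# A long run of digits followed by ':' is the timestamp; the next word
# is the function or event name. Ten or more digits avoids matching pids.
EVENT_RE = re.compile(r"\s(\d{10,}):\s+(\w+)")

# ASSUMPTION: ~3000 cycles per microsecond, inferred from
# "7us is 21000 clock cycles" in the quoted text. Not a measured value.
ASSUMED_CYCLES_PER_US = 3000


def parse_trace(text):
    """Return a list of (tsc_timestamp, event_name) in trace order."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            events.append((int(match.group(1)), match.group(2)))
    return events


def step_deltas(events):
    """Return (name, cycles since previous event, cycles since first event)."""
    first = events[0][0]
    prev = first
    rows = []
    for timestamp, name in events:
        rows.append((name, timestamp - prev, timestamp - first))
        prev = timestamp
    return rows


events = parse_trace(TRACE)
rows = step_deltas(events)
total_cycles = rows[-1][2]
total_us = total_cycles / ASSUMED_CYCLES_PER_US

for name, step, cumulative in rows:
    print(f"{name:32s} +{step:6d} {cumulative:7d}")
print(f"apic_timer_fn -> kvm_entry: {total_cycles} cycles "
      f"(~{total_us:.2f} us at the assumed rate)")
```

The total matches the subtraction above (15124 cycles), which at the
assumed rate is about 5 us. The largest single step in this trace is
the one ending at sched_switch (3380 cycles).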