On 20/05/19 10:18, Wanpeng Li wrote:
> Advance lapic timer tries to hide the hypervisor overhead between the
> host emulated timer firing and the guest becoming aware that the timer
> has fired. However, it only hides the time between
> apic_timer_fn/handle_preemption_timer -> wait_lapic_expire, instead of
> the real position of vmentry which is mentioned in the original commit
> d0659d946be0 ("KVM: x86: add option to advance tscdeadline hrtimer
> expiration"). There are 700+ cpu cycles between the end of
> wait_lapic_expire and the world switch on my haswell desktop.
>
> This patchset tries to narrow the last gap (wait_lapic_expire -> world
> switch): it takes the real overhead time between
> apic_timer_fn/handle_preemption_timer and the world switch into
> consideration when adaptively tuning timer advancement. The patchset
> can reduce latency by about 40% (~1600+ cycles to ~1000+ cycles on a
> haswell desktop) for kvm-unit-tests/tscdeadline_latency when testing
> busy waits.
>
> v3 -> v4:
>  * create timer_advance_ns debugfs entry iff lapic_in_kernel()
>  * keep if (guest_tsc < tsc_deadline) before the call to __wait_lapic_expire()
>
> v2 -> v3:
>  * expose 'kvm_timer.timer_advance_ns' to userspace
>  * move the tracepoint below guest_exit_irqoff()
>  * move wait_lapic_expire() before flushing the L1
>
> v1 -> v2:
>  * fix indent in patch 1/4
>  * remove the wait_lapic_expire() tracepoint and expose by debugfs
>  * move the call to wait_lapic_expire() into vmx.c and svm.c
>
> Wanpeng Li (5):
>   KVM: LAPIC: Extract adaptive tune timer advancement logic
>   KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow
>   KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace
>   KVM: LAPIC: Delay trace advance expire delta
>   KVM: LAPIC: Optimize timer latency further
>
>  arch/x86/kvm/debugfs.c | 18 +++++++++++++++
>  arch/x86/kvm/lapic.c   | 60 +++++++++++++++++++++++++++++--------------------
>  arch/x86/kvm/lapic.h   |  3 ++-
>  arch/x86/kvm/svm.c     |  4 ++++
>  arch/x86/kvm/vmx/vmx.c |  4 ++++
>  arch/x86/kvm/x86.c     |  9 ++++----
>  6 files changed, 68 insertions(+), 30 deletions(-)
>

Queued, thanks (2-3 for 5.2, the rest for 5.3).

Paolo
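
For readers following the thread, the adaptive tuning described in the cover letter can be sketched roughly as below. This is a standalone userspace illustration and not the code from the series: the function names, the 1/8 step, and the 5000 ns / 1000 ns bounds are assumptions chosen for the example. The idea it shows is that the delta between the guest TSC and the programmed deadline is sampled as late as possible before the world switch, converted to nanoseconds, and used to nudge the per-vCPU advance value by a bounded step.

```c
#include <stdint.h>

/* Hypothetical tuning constants, for illustration only. */
#define ADVANCE_ADJUST_STEP 8u     /* move by at most 1/8 of the current value */
#define ADVANCE_MAX_NS      5000u  /* treat anything above this as runaway */
#define ADVANCE_INIT_NS     1000u  /* value to fall back to on runaway */

/* Convert TSC cycles to nanoseconds for a TSC running at tsc_khz. */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t tsc_khz)
{
    return cycles * 1000000ULL / tsc_khz;
}

/*
 * delta_cycles = guest_tsc - tsc_deadline, measured just before VM entry.
 *   negative: the timer fired too early, so shrink the advance
 *   positive: the timer fired too late, so grow the advance
 * Returns the new advance value in nanoseconds.
 */
static uint32_t adjust_timer_advance(uint32_t advance_ns, int64_t delta_cycles,
                                     uint32_t tsc_khz)
{
    uint32_t step = advance_ns / ADVANCE_ADJUST_STEP;
    uint64_t ns;

    if (delta_cycles < 0) {
        /* Unsigned negation avoids overflow on the most negative value. */
        ns = cycles_to_ns(0 - (uint64_t)delta_cycles, tsc_khz);
        advance_ns -= (uint32_t)(ns < step ? ns : step);
    } else {
        ns = cycles_to_ns((uint64_t)delta_cycles, tsc_khz);
        advance_ns += (uint32_t)(ns < step ? ns : step);
    }

    /* Assumed policy: reset instead of letting the value grow unbounded. */
    if (advance_ns > ADVANCE_MAX_NS)
        advance_ns = ADVANCE_INIT_NS;

    return advance_ns;
}
```

With a 1 GHz TSC (tsc_khz = 1000000), a measurement that is 700 cycles late moves an advance of 1000 ns to 1125 ns, because the change is capped at one eighth of the current value. Bounding the step keeps a single noisy sample from swinging the advance value.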