The VMX-preemption timer is a VMX feature. It counts down, from the
value loaded at VM entry, while in VMX non-root operation. When the
timer reaches zero, it stops counting and a VM exit occurs.

This patchset uses the VMX preemption timer for TSC deadline timer
virtualization. The VMX preemption timer is armed before VM entry if
the TSC deadline timer is enabled. A VM exit happens when the virtual
TSC deadline timer expires.

When the vCPU thread is scheduled out, TSC deadline timer
virtualization switches to the current solution, i.e. the hrtimer
emulation. It switches back to the VMX preemption timer when the vCPU
thread is scheduled in.

This solution replaces the cost of the OS's complex hrtimer system,
and the host timer interrupt handling cost, with a preemption timer
VM exit. It fits well for some NFV usage scenarios, where the vCPU is
bound to a pCPU and the pCPU is isolated, or similar scenarios.

However, it may have a negative impact if the vCPU thread is scheduled
in/out very frequently, because it then switches to/from the hrtimer
emulation a lot.

A module parameter is provided to turn it on or off.

Signed-off-by: Yunhong Jiang <yunhong.jiang@xxxxxxxxx>

Performance Evaluation:

Host:
[nfv@otcnfv02 ~]$ cat /proc/cpuinfo
....
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

Guest:
Two vCPUs, pinned to isolated pCPUs, idle=poll on guest kernel.
When the vCPU is not pinned, the benefit is smaller than in the pinned
case.

Test tools:
cyclictest [1] running 10 minutes with 1ms interval, i.e. 600000 loops
in total.

1. enable_hv_timer=Y.

# Histogram
......
000003 000000
000004 000029
000005 023017
000006 357485
000007 192723
000008 026141
000009 000106
000010 000067
......
# Min Latencies: 00004
# Avg Latencies: 00006

2. enable_hv_timer=N.

# Histogram
......
000004 000000
000005 000074
000006 001943
000007 005820
000008 164729
000009 424401
000010 001964
000011 000252
000012 000190
......
# Min Latencies: 00005
# Avg Latencies: 00010

Changes since v1 [2]:
* Remove the vmx_sched_out and no changes to kvm_x86_ops for it.
* Remove the two expired timer checkings on each vm-entry.
* Rename the hwemul_timer to hv_timer
* Clear vmx_x86_ops's membership if preemption timer is not usable.
* Cache cpu_preemption_timer_multi.
* Keep the tracepoint with the function patch.
* Other minor changes based on Paolo's review.

[1] https://rt.wiki.kernel.org/index.php/Cyclictest
[2] http://www.spinics.net/lists/kvm/msg132895.html

Yunhong Jiang (4):
  Add the kvm sched_out hook
  Utilize the vmx preemption timer
  Separate the start_sw_tscdeadline
  Utilize the vmx preemption timer for tsc deadline timer

 arch/arm/include/asm/kvm_host.h     |   1 +
 arch/mips/include/asm/kvm_host.h    |   1 +
 arch/powerpc/include/asm/kvm_host.h |   1 +
 arch/s390/include/asm/kvm_host.h    |   1 +
 arch/x86/include/asm/kvm_host.h     |   4 +
 arch/x86/kvm/lapic.c                | 144 ++++++++++++++++++++++++++++++------
 arch/x86/kvm/lapic.h                |  11 +++
 arch/x86/kvm/trace.h                |  22 ++++++
 arch/x86/kvm/vmx.c                  |  51 ++++++++++++-
 arch/x86/kvm/x86.c                  |   8 ++
 include/linux/kvm_host.h            |   1 +
 virt/kvm/kvm_main.c                 |   1 +
 12 files changed, 221 insertions(+), 25 deletions(-)

TODO: Find out the CPUs with VMX preemption timer broken.

--
1.8.3.1
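
For readers unfamiliar with how a TSC deadline maps onto the VMX
preemption timer, here is a minimal sketch of the conversion done when
arming the timer. It is not taken from the patches. The function name
tsc_deadline_to_preempt_ticks and its parameters are hypothetical, and
it assumes that both TSC values are in the same TSC domain (no TSC
offset or scaling is applied). It relies on two architectural facts:
the timer counts down once every 2^X TSC cycles, where X is reported in
bits 4:0 of IA32_VMX_MISC (presumably what cpu_preemption_timer_multi
caches), and the timer value is a 32-bit field.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * now_tsc, deadline_tsc: TSC values, assumed to be in the same domain.
 * rate_shift: assumed to be IA32_VMX_MISC[4:0]; the preemption timer
 *             ticks once every 2^rate_shift TSC cycles.
 * ticks: on success, the value to load into the preemption timer.
 *
 * Returns false when the deadline is too far away to fit in the 32-bit
 * timer value, in which case a caller would have to fall back to
 * another mechanism (for example the hrtimer emulation).
 */
static bool tsc_deadline_to_preempt_ticks(uint64_t now_tsc,
                                          uint64_t deadline_tsc,
                                          unsigned int rate_shift,
                                          uint32_t *ticks)
{
    uint64_t delta;

    /* Deadline already reached: a zero value expires immediately. */
    if (deadline_tsc <= now_tsc) {
        *ticks = 0;
        return true;
    }

    /* Convert TSC cycles to timer ticks; the shift rounds down. */
    delta = (deadline_tsc - now_tsc) >> rate_shift;

    if (delta > UINT32_MAX)
        return false;

    *ticks = (uint32_t)delta;
    return true;
}
```

Because the shift rounds down, the VM exit in this sketch can arrive
slightly before the deadline, so the exit handler would need to
tolerate an early expiry.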