2016-06-04 8:42 GMT+08:00 Yunhong Jiang <yunhong.jiang@xxxxxxxxxxxxxxx>:
> The VMX-preemption timer is a VMX feature: it counts down, from the
> value loaded at VM entry, while in VMX non-root operation. When the
> timer counts down to zero, it stops counting and a VM exit occurs.
>
> This patchset utilizes the VMX preemption timer for TSC deadline
> timer virtualization. The VMX preemption timer is armed before VM
> entry if the TSC deadline timer is enabled, and a VM exit happens
> when the virtual TSC deadline timer expires.
>
> When the vCPU thread is blocked because of the HLT instruction, TSC
> deadline timer virtualization is switched back to the current
> solution, i.e. using an hrtimer for it. It is switched back to the
> VMX preemption timer when the vCPU thread is unblocked.
>
> This solution replaces the overhead of the OS's complex hrtimer
> system, and the host timer interrupt handling cost, with a single
> preemption-timer VM exit. It fits well for NFV usage scenarios where
> the vCPU is bound to a pCPU and the pCPU is isolated, and for
> similar scenarios.
>
> It adds a little latency to each VM entry, because we need to set up
> the preemption timer each time.
>
> Signed-off-by: Yunhong Jiang <yunhong.jiang@xxxxxxxxx>
>
> Performance Evaluation:
>
> Host:
> [nfv@otcnfv02 ~]$ cat /proc/cpuinfo
> ....
> cpu family : 6
> model      : 63
> model name : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
>
> Guest:
> Two vCPUs, pinned to isolated pCPUs, with idle=poll on the guest
> kernel. When the vCPUs are not pinned, the benefit is smaller than in
> the pinned case.
>
> Test tool:
> cyclictest [1] running for 10 minutes with a 1 ms interval, i.e.
> 600000 loops in total.
>
> 1. enable_hv_timer=Y.
>
> # Histogram
> ......
> 000003 000000
> 000004 002174
> 000005 042961
> 000006 479383
> 000007 071123
> 000008 003720
> 000009 000467
> 000010 000078
> 000011 000011
> 000012 000009
> ......
> # Min Latencies: 00004
> # Avg Latencies: 00007
>
> 2. enable_hv_timer=N.
>
> # Histogram
> ......
> 000003 000000
> 000004 000000
> 000005 000042
> 000006 000772
> 000007 008262
> 000008 200759
> 000009 381126
> 000010 008056
> 000011 000234
> 000012 000367
> ......
> # Min Latencies: 00005
> # Avg Latencies: 00010

I sometimes observe the cyclictest Avg value overflowing in the guest:

policy: other/other: loadavg: 0.79 0.19 0.06 2/355 1872   999847623940096
policy: other/other: loadavg: 0.79 0.19 0.06 1/349 1883   629164130618368
T: 0 ( 1838) P: 0 I:1000 C: 5092 Min: 8 Act: -750 Avg: 8495211086576766976
T: 0 ( 1838) P: 0 I:1000 C: 6934 Min: 8 Act: -878 Avg: -9223372036854775808 Max: -3

Host:
grep HZ /boot/config-`uname -r`
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_MACHZ_WDT=m

Guest (3.5; other kernel versions can also reproduce it):
grep HZ /boot/config-`uname -r`
CONFIG_NO_HZ=y
CONFIG_RCU_FAST_NO_HZ=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_MACHZ_WDT=m

Has anyone seen this? Any tips on how to solve it?

Regards,
Wanpeng Li
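
For context on the arming step the cover letter describes: the VMX
preemption timer counts down at a rate of TSC >> N, where N is bits 4:0
of the IA32_VMX_MISC MSR, so arming amounts to scaling the distance to
the guest's TSC deadline into preemption-timer ticks. Below is a
minimal user-space sketch of that conversion only; the function name,
the shift value of 5, and the clamping policy are illustrative
assumptions, not the actual patch.

    /*
     * Sketch of the TSC-deadline -> preemption-timer-ticks conversion.
     * All names and the shift value are assumptions for illustration.
     */
    #include <stdint.h>
    #include <stdio.h>

    /* The preemption timer ticks at TSC >> shift; the real shift is
     * read from IA32_VMX_MISC bits 4:0 (assumed to be 5 for this demo). */
    #define PREEMPTION_TIMER_SHIFT 5u

    static uint32_t tsc_deadline_to_timer_ticks(uint64_t guest_tsc,
                                                uint64_t tsc_deadline)
    {
            /* An already-expired deadline arms the timer with 0,
             * forcing an immediate VM exit after VM entry. */
            uint64_t delta = tsc_deadline > guest_tsc ?
                             tsc_deadline - guest_tsc : 0;
            uint64_t ticks = delta >> PREEMPTION_TIMER_SHIFT;

            /* The VMX_PREEMPTION_TIMER_VALUE VMCS field is 32 bits;
             * clamp rather than silently truncate a distant deadline. */
            return ticks > UINT32_MAX ? UINT32_MAX : (uint32_t)ticks;
    }

    int main(void)
    {
            /* ~1 ms at the 2.30 GHz host above: 2.3e6 TSC ticks. */
            printf("%u\n", (unsigned)tsc_deadline_to_timer_ticks(0, 2300000));
            return 0;
    }

On the blocking side, the scheme in the cover letter would cancel this
and fall back to the hrtimer-based emulation while the vCPU sits
outside VMX non-root operation, then re-arm the preemption timer when
the vCPU thread is unblocked.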
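
As for the overflow question: a cyclictest-style loop boils down to
sleeping until an absolute deadline and recording how late the wakeup
was, as in the simplified, self-contained sketch below (this is not
cyclictest's actual source; names and structure are assumptions). A
negative sample like Act: -750 means the wakeup timestamp landed before
the deadline, i.e. the guest clock appeared to step backwards, and the
printed Avg of -9223372036854775808 is exactly the minimum signed
64-bit value, the classic signature of a wrapped or out-of-range 64-bit
conversion; whether that is the actual cause in the report above is
unconfirmed.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>

    #define INTERVAL_NS 1000000L   /* 1 ms interval, as in the test above */
    #define LOOPS       10000      /* the runs above used 600000 */

    static long long ts_diff_ns(struct timespec a, struct timespec b)
    {
            return (long long)(a.tv_sec - b.tv_sec) * 1000000000LL +
                   (a.tv_nsec - b.tv_nsec);
    }

    int main(void)
    {
            struct timespec next, now;
            long long sum = 0, min = 0, lat;

            clock_gettime(CLOCK_MONOTONIC, &next);
            for (int i = 0; i < LOOPS; i++) {
                    /* Advance the absolute deadline by one interval. */
                    next.tv_nsec += INTERVAL_NS;
                    while (next.tv_nsec >= 1000000000L) {
                            next.tv_nsec -= 1000000000L;
                            next.tv_sec++;
                    }
                    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                                    &next, NULL);
                    clock_gettime(CLOCK_MONOTONIC, &now);

                    /* Wakeup latency in microseconds; this can only go
                     * negative if the clock appears to step backwards,
                     * as in the Act: -750 samples above. */
                    lat = ts_diff_ns(now, next) / 1000;

                    /* A wrapped 64-bit running sum here would corrupt
                     * Avg in exactly the way shown above. */
                    sum += lat;
                    if (i == 0 || lat < min)
                            min = lat;
            }
            printf("Min: %lld us, Avg: %lld us\n", min, sum / LOOPS);
            return 0;
    }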