On Wed, 8 Jun 2016 12:18:57 +0800 Wanpeng Li <kernellwp@xxxxxxxxx> wrote:

> 2016-06-04 8:42 GMT+08:00 Yunhong Jiang <yunhong.jiang@xxxxxxxxxxxxxxx>:
> > The VMX preemption timer is a VMX feature: it counts down, from the
> > value loaded at VM entry, while the CPU is in VMX non-root operation.
> > When the timer counts down to zero, it stops counting and a VM exit
> > occurs.
> >
> > This patchset utilizes the VMX preemption timer for TSC deadline
> > timer virtualization. The preemption timer is armed before VM entry
> > if the TSC deadline timer is enabled, and a VM exit happens when the
> > virtual TSC deadline timer expires.
> >
> > When the vCPU thread blocks on a HLT instruction, TSC deadline timer
> > virtualization switches back to the current solution, i.e. a host
> > hrtimer. It switches to the VMX preemption timer again when the vCPU
> > thread is unblocked.
> >
> > This solution replaces the OS's complex hrtimer machinery, and the
> > cost of handling the host timer interrupt, with a single
> > preemption-timer VM exit. It fits well in NFV usage scenarios where
> > the vCPU is bound to an isolated pCPU, and in similar setups.
> >
> > It adds a small amount of latency to each VM entry, because the
> > preemption timer has to be set up every time.
> >
> > Signed-off-by: Yunhong Jiang <yunhong.jiang@xxxxxxxxx>
> >
> > Performance Evaluation:
> > Host:
> > [nfv@otcnfv02 ~]$ cat /proc/cpuinfo
> > ....
> > cpu family : 6
> > model      : 63
> > model name : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
> >
> > Guest:
> > Two vCPUs, each pinned to an isolated pCPU, with idle=poll on the
> > guest kernel. When the vCPUs are not pinned, the benefit is smaller
> > than in the pinned case.
> >
> > Test tool:
> > cyclictest [1] running for 10 minutes with a 1 ms interval, i.e.
> > 600000 loops in total (latencies in microseconds).
> >
> > 1. enable_hv_timer=Y
> >
> > # Histogram
> > ......
> > 000003 000000
> > 000004 002174
> > 000005 042961
> > 000006 479383
> > 000007 071123
> > 000008 003720
> > 000009 000467
> > 000010 000078
> > 000011 000011
> > 000012 000009
> > ......
> > # Min Latencies: 00004
> > # Avg Latencies: 00007
> >
> > 2. enable_hv_timer=N
> >
> > # Histogram
> > ......
> > 000003 000000
> > 000004 000000
> > 000005 000042
> > 000006 000772
> > 000007 008262
> > 000008 200759
> > 000009 381126
> > 000010 008056
> > 000011 000234
> > 000012 000367
> > ......
> > # Min Latencies: 00005
> > # Avg Latencies: 00010
>
> I sometimes observe that the cyclictest average overflows in the guest:
>
> policy: other/other: loadavg: 0.79 0.19 0.06 2/355 1872
> 999847623940096
> policy: other/other: loadavg: 0.79 0.19 0.06 1/349 1883
> 629164130618368
> T: 0 ( 1838) P: 0 I:1000 C:   5092 Min:      8 Act: -750 Avg:8495211086576766976
> T: 0 ( 1838) P: 0 I:1000 C:   6934 Min:      8 Act: -878 Avg:-9223372036854775808 Max:     -3
>
> Host:
>
> grep HZ /boot/config-`uname -r`
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_HZ_PERIODIC is not set
> CONFIG_NO_HZ_IDLE=y
> # CONFIG_NO_HZ_FULL is not set
> # CONFIG_NO_HZ is not set
> # CONFIG_HZ_100 is not set
> # CONFIG_HZ_250 is not set
> # CONFIG_HZ_300 is not set
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
> CONFIG_MACHZ_WDT=m
>
> Guest (3.5; other kernel versions can also reproduce it):
>
> grep HZ /boot/config-`uname -r`
> CONFIG_NO_HZ=y
> CONFIG_RCU_FAST_NO_HZ=y
> # CONFIG_HZ_100 is not set
> CONFIG_HZ_250=y
> # CONFIG_HZ_300 is not set
> # CONFIG_HZ_1000 is not set
> CONFIG_HZ=250
> CONFIG_MACHZ_WDT=m
>
> Anyone meet such things? Any tips to solve it?

Sorry for the slow response. Did you try this with the patchset applied,
or without it? Does the overflow also happen without this patch? An
overflow like this usually means the guest timer is not accurate. I can
try to look for hints if you provide your host information and your qemu
parameters.
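
For reference, the extra per-VM-entry work the cover letter mentions
boils down to converting the remaining guest TSC-deadline delta into
preemption-timer ticks, using the rate reported in the low 5 bits of
IA32_VMX_MISC (the timer counts down once every 2^N TSC cycles). The
sketch below is only illustrative, with made-up helper names; it is not
the code in the series:

        /*
         * Illustrative sketch: arm the VMX preemption timer so a VM exit
         * fires when the guest's TSC deadline is reached.  Helper names
         * are hypothetical; error handling is omitted.
         */
        static u32 preemption_timer_shift;

        static void cache_preemption_timer_rate(void)
        {
                u64 vmx_misc;

                rdmsrl(MSR_IA32_VMX_MISC, vmx_misc);
                /* Bits 4:0 give N: one timer tick per 2^N TSC cycles. */
                preemption_timer_shift = vmx_misc & 0x1f;
        }

        static void arm_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
        {
                /* Guest-visible TSC value right now. */
                u64 guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
                u64 delta_tsc = 0;

                if (guest_deadline_tsc > guest_tsc)
                        delta_tsc = guest_deadline_tsc - guest_tsc;

                /* Scale the TSC delta down to preemption-timer ticks. */
                vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
                             (u32)(delta_tsc >> preemption_timer_shift));

                /* Request a VM exit when the timer counts down to zero. */
                vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
                              PIN_BASED_VMX_PREEMPTION_TIMER);
        }

Note the sketch ignores the case where the shifted delta does not fit in
the 32-bit VMCS field, and it ignores TSC scaling; a real implementation
has to handle both.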
Thanks
--jyh

> Regards,
> Wanpeng Li