On Wed, 8 Jun 2016 12:18:57 +0800 Wanpeng Li <kernellwp@xxxxxxxxx> wrote:

> 2016-06-04 8:42 GMT+08:00 Yunhong Jiang <yunhong.jiang@xxxxxxxxxxxxxxx>:
> > The VMX preemption timer is a VMX feature: it counts down, from the
> > value loaded at VM entry, while the CPU is in VMX non-root operation.
> > When the timer counts down to zero, it stops counting and a VM exit
> > occurs.
> >
> > This patchset utilizes the VMX preemption timer for TSC deadline
> > timer virtualization. The preemption timer is armed before VM entry
> > if the TSC deadline timer is enabled, and a VM exit happens when the
> > virtual TSC deadline timer expires.
> >
> > When the vCPU thread blocks on a HLT instruction, TSC deadline timer
> > virtualization switches back to the current solution, i.e. a host
> > hrtimer. It switches to the VMX preemption timer again when the vCPU
> > thread is unblocked.
> >
> > This solution replaces the OS's complex hrtimer machinery, and the
> > cost of handling the host timer interrupt, with a single
> > preemption-timer VM exit. It fits well in NFV usage scenarios where
> > the vCPU is bound to an isolated pCPU, and in similar setups.
> >
> > It adds a small amount of latency to each VM entry, because the
> > preemption timer has to be set up every time.
> >
> > Signed-off-by: Yunhong Jiang <yunhong.jiang@xxxxxxxxx>
> >
> > Performance Evaluation:
> > Host:
> > [nfv@otcnfv02 ~]$ cat /proc/cpuinfo
> > ....
> > cpu family : 6
> > model      : 63
> > model name : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
> >
> > Guest:
> > Two vCPUs, each pinned to an isolated pCPU, with idle=poll on the
> > guest kernel. When the vCPUs are not pinned, the benefit is smaller
> > than in the pinned case.
> >
> > Test tool:
> > cyclictest [1] running for 10 minutes with a 1 ms interval, i.e.
> > 600000 loops in total (latencies in microseconds).
> >
> > 1. enable_hv_timer=Y
> >
> > # Histogram
> > ......
> > 000003 000000
> > 000004 002174
> > 000005 042961
> > 000006 479383
> > 000007 071123
> > 000008 003720
> > 000009 000467
> > 000010 000078
> > 000011 000011
> > 000012 000009
> > ......
> > # Min Latencies: 00004
> > # Avg Latencies: 00007
> >
> > 2. enable_hv_timer=N
> >
> > # Histogram
> > ......
> > 000003 000000
> > 000004 000000
> > 000005 000042
> > 000006 000772
> > 000007 008262
> > 000008 200759
> > 000009 381126
> > 000010 008056
> > 000011 000234
> > 000012 000367
> > ......
> > # Min Latencies: 00005
> > # Avg Latencies: 00010
>
> I sometimes observe that the cyclictest average overflows in the guest:
>
> policy: other/other: loadavg: 0.79 0.19 0.06 2/355 1872
> 999847623940096
> policy: other/other: loadavg: 0.79 0.19 0.06 1/349 1883
> 629164130618368
> T: 0 ( 1838) P: 0 I:1000 C:   5092 Min:      8 Act: -750 Avg:8495211086576766976
> T: 0 ( 1838) P: 0 I:1000 C:   6934 Min:      8 Act: -878 Avg:-9223372036854775808 Max:     -3
>
> Host:
>
> grep HZ /boot/config-`uname -r`
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_HZ_PERIODIC is not set
> CONFIG_NO_HZ_IDLE=y
> # CONFIG_NO_HZ_FULL is not set
> # CONFIG_NO_HZ is not set
> # CONFIG_HZ_100 is not set
> # CONFIG_HZ_250 is not set
> # CONFIG_HZ_300 is not set
> CONFIG_HZ_1000=y
> CONFIG_HZ=1000
> CONFIG_MACHZ_WDT=m
>
> Guest (3.5; other kernel versions can also reproduce it):
>
> grep HZ /boot/config-`uname -r`
> CONFIG_NO_HZ=y
> CONFIG_RCU_FAST_NO_HZ=y
> # CONFIG_HZ_100 is not set
> CONFIG_HZ_250=y
> # CONFIG_HZ_300 is not set
> # CONFIG_HZ_1000 is not set
> CONFIG_HZ=250
> CONFIG_MACHZ_WDT=m
>
> Anyone meet such things? Any tips to solve it?

Sorry for the slow response. Did you try this with the patchset applied,
or without it? Does the overflow also happen without this patch? An
overflow like this usually means the guest timer is not accurate. I can
try to look for hints if you provide your host information and your qemu
parameters.
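
For reference, the extra per-VM-entry work the cover letter mentions
boils down to converting the remaining guest TSC-deadline delta into
preemption-timer ticks, using the rate reported in the low 5 bits of
IA32_VMX_MISC (the timer counts down once every 2^N TSC cycles). The
sketch below is only illustrative, with made-up helper names; it is not
the code in the series:

        /*
         * Illustrative sketch: arm the VMX preemption timer so a VM exit
         * fires when the guest's TSC deadline is reached.  Helper names
         * are hypothetical; error handling is omitted.
         */
        static u32 preemption_timer_shift;

        static void cache_preemption_timer_rate(void)
        {
                u64 vmx_misc;

                rdmsrl(MSR_IA32_VMX_MISC, vmx_misc);
                /* Bits 4:0 give N: one timer tick per 2^N TSC cycles. */
                preemption_timer_shift = vmx_misc & 0x1f;
        }

        static void arm_hv_timer(struct kvm_vcpu *vcpu, u64 guest_deadline_tsc)
        {
                /* Guest-visible TSC value right now. */
                u64 guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
                u64 delta_tsc = 0;

                if (guest_deadline_tsc > guest_tsc)
                        delta_tsc = guest_deadline_tsc - guest_tsc;

                /* Scale the TSC delta down to preemption-timer ticks. */
                vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
                             (u32)(delta_tsc >> preemption_timer_shift));

                /* Request a VM exit when the timer counts down to zero. */
                vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
                              PIN_BASED_VMX_PREEMPTION_TIMER);
        }

Note the sketch ignores the case where the shifted delta does not fit in
the 32-bit VMCS field, and it ignores TSC scaling; a real implementation
has to handle both.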
Thanks
--jyh

> Regards,
> Wanpeng Li