Re: [RFC PATCH V3 0/5] Utilizing VMX preemption for timer virtualization

2016-06-04 8:42 GMT+08:00 Yunhong Jiang <yunhong.jiang@xxxxxxxxxxxxxxx>:
> The VMX preemption timer is a VMX feature: it counts down, from the value
> loaded at VM entry, while in VMX non-root operation. When the timer counts
> down to zero, it stops counting and a VM exit occurs.
>
> This patchset utilizes the VMX preemption timer for TSC deadline timer
> virtualization. The VMX preemption timer is armed before VM entry if the
> TSC deadline timer is enabled, and a VM exit occurs when the virtual TSC
> deadline timer expires.
>
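
As background, a minimal sketch of what arming the preemption timer from a
guest TSC deadline might look like, assuming KVM-style vmcs_write32() /
vmcs_set_bits() helpers; the function and variable names below are
illustrative, not the patchset's actual code:

/*
 * Illustrative sketch only.  Convert the remaining guest TSC deadline
 * into preemption timer ticks and program it into the VMCS before
 * VM entry.
 */
static void sketch_arm_hv_timer(u64 guest_deadline_tsc, u64 guest_tsc)
{
        u64 delta_tsc = 0;
        /*
         * The preemption timer ticks at the TSC rate divided by 2^N,
         * where N is reported in MSR_IA32_VMX_MISC[4:0]; assume it was
         * cached at setup time (the value 5 here is only an example).
         */
        unsigned int preemption_timer_shift = 5;

        if (guest_deadline_tsc > guest_tsc)
                delta_tsc = guest_deadline_tsc - guest_tsc;

        /*
         * VMX_PREEMPTION_TIMER_VALUE is the 32-bit VMCS timer field;
         * real code would also have to handle deltas that do not fit.
         */
        vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
                     (u32)(delta_tsc >> preemption_timer_shift));

        /* "Activate VMX-preemption timer" pin-based execution control. */
        vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
                      PIN_BASED_VMX_PREEMPTION_TIMER);
}
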
> When the vCPU thread is blocked because of a HLT instruction, TSC deadline
> timer virtualization is switched to the current solution, i.e. a host
> hrtimer. It is switched back to the VMX preemption timer when the vCPU
> thread is unblocked.
>
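
The block/unblock switch described above could hang off the vCPU
pre-block/post-block path; a rough sketch, with hypothetical helper names
(hv_timer_in_use(), switch_to_sw_timer(), ...), not the actual patches:

/* Illustrative sketch only; the helpers are hypothetical. */
static void sketch_pre_block(struct kvm_vcpu *vcpu)
{
        /*
         * The preemption timer only counts in VMX non-root operation,
         * so it cannot fire while the vCPU is halted in the host.
         * Fall back to a host hrtimer for the pending TSC deadline.
         */
        if (hv_timer_in_use(vcpu))
                switch_to_sw_timer(vcpu);
}

static void sketch_post_block(struct kvm_vcpu *vcpu)
{
        /*
         * The vCPU is runnable again: cancel the hrtimer and go back
         * to programming the preemption timer on the next VM entry.
         */
        if (hv_timer_available(vcpu))
                switch_to_hv_timer(vcpu);
}
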
> This solution replaces the host's complex hrtimer machinery, and the cost of
> handling the host timer interrupt, with a single preemption-timer VM exit.
> It fits well for NFV usage scenarios where the vCPU is bound to an isolated
> pCPU, or similar setups.
>
> It adds a small amount of latency to each VM entry because the preemption
> timer has to be set up every time.
>
> Signed-off-by: Yunhong Jiang <yunhong.jiang@xxxxxxxxx>
>
> Performance Evaluation:
> Host:
> [nfv@otcnfv02 ~]$ cat /proc/cpuinfo
> ....
> cpu family      : 6
> model           : 63
> model name      : Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
>
> Guest:
> Two vCPUs, each pinned to an isolated pCPU, with idle=poll on the guest
> kernel. When the vCPUs are not pinned, the benefit is smaller than in the
> pinned case.
>
> Test tools:
> cyclictest [1] running for 10 minutes with a 1ms interval, i.e. 600000
> loops in total.
>
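
The exact cyclictest invocation is not given; judging by the description,
something along these lines should produce a comparable run and histogram
(all flags are standard rt-tests cyclictest options, and the 30 us histogram
bound is arbitrary):

# 1 ms interval, 600000 loops (~10 minutes), latency histogram, quiet run
cyclictest -i 1000 -l 600000 -h 30 -q
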
> 1. enable_hv_timer=Y.
>
> # Histogram
> ......
> 000003 000000
> 000004 002174
> 000005 042961
> 000006 479383
> 000007 071123
> 000008 003720
> 000009 000467
> 000010 000078
> 000011 000011
> 000012 000009
> ......
> # Min Latencies: 00004
> # Avg Latencies: 00007
>
> 2. enable_hv_timer=N.
>
> # Histogram
> ......
> 000003 000000
> 000004 000000
> 000005 000042
> 000006 000772
> 000007 008262
> 000008 200759
> 000009 381126
> 000010 008056
> 000011 000234
> 000012 000367
> ......
> # Min Latencies: 00005
> # Avg Latencies: 00010
>
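
In other words, assuming the usual cyclictest microsecond units, enabling the
preemption timer moves the average latency from roughly 10 us down to roughly
7 us (about a 30% reduction), and the bulk of the samples shifts from the
8-9 us buckets to the 5-7 us buckets.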

I sometimes observe that the cyclictest Avg value overflows in the guest.

policy: other/other: loadavg: 0.79 0.19 0.06 2/355 1872          999847623940096
policy: other/other: loadavg: 0.79 0.19 0.06 1/349 1883          629164130618368
T: 0 ( 1838) P: 0 I:1000 C:   5092 Min:      8 Act: -750 Avg:8495211086576766976
T: 0 ( 1838) P: 0 I:1000 C:   6934 Min:      8 Act: -878 Avg:-9223372036854775808 Max:      -3
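
For what it's worth, -9223372036854775808 is exactly -2^63 (INT64_MIN), and
8495211086576766976 is just below 2^63, so this looks like 64-bit signed
overflow in the average accumulation (presumably fed by bogus time deltas,
cf. the negative Act values above) rather than a genuinely huge latency.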

Host:

 grep HZ /boot/config-`uname -r`
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_MACHZ_WDT=m

Guest (3.5; other kernel versions can also reproduce it):

grep HZ /boot/config-`uname -r`
CONFIG_NO_HZ=y
CONFIG_RCU_FAST_NO_HZ=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_MACHZ_WDT=m

Has anyone else seen this? Any tips on how to solve it?

Regards,
Wanpeng Li