On 7/19/2021 3:37 PM, Wanpeng Li wrote:
On Mon, 19 Jul 2021 at 15:26, Zeng Guang <guang.zeng@xxxxxxxxx> wrote:
On 7/16/2021 5:25 PM, Wanpeng Li wrote:
On Fri, 16 Jul 2021 at 15:14, Zeng Guang <guang.zeng@xxxxxxxxx> wrote:
Current IPI process in guest VM will virtualize the writing to interrupt
command register(ICR) of the local APIC which will cause VM-exit anyway
on source vCPU. Frequent VM-exit could induce much overhead accumulated
if running IPI intensive task.
IPI virtualization as a new VT-x feature targets to eliminate VM-exits
when issuing IPI on source vCPU. It introduces a new VM-execution
control - "IPI virtualization"(bit4) in the tertiary processor-based
VM-exection controls and a new data structure - "PID-pointer table
address" and "Last PID-pointer index" referenced by the VMCS. When "IPI
virtualization" is enabled, processor emulateds following kind of writes
to APIC registers that would send IPIs, moreover without causing VM-exits.
- Memory-mapped ICR writes
- MSR-mapped ICR writes
- SENDUIPI execution
This patch series implement IPI virtualization support in KVM.
Patches 1-3 add tertiary processor-based VM-execution support
framework.
Patch 4 implement interrupt dispatch support in x2APIC mode with
APIC-write VM exit. In previous platform, no CPU would produce
APIC-write VM exit with exit qulification 300H when the "virtual x2APIC
mode" VM-execution control was 1.
Patch 5 implement IPI virtualization related function including
feature enabling through tertiary processor-based VM-execution in
various scenario of VMCS configuration, PID table setup in vCPU creation
and vCPU block consideration.
Document for IPI virtualization is now available at the latest "Intel
Architecture Instruction Set Extensions Programming Reference".
Document Link:
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
We did experiment to measure average time sending IPI from source vCPU
to the target vCPU completing the IPI handling by kvm unittest w/ and
w/o IPI virtualization. When IPI virtualizatin enabled, it will reduce
22.21% and 15.98% cycles comsuming in xAPIC mode and x2APIC mode
respectly.
KMV unittest:vmexit/ipi, 2 vCPU, AP runs without halt to ensure no VM
exit impact on target vCPU.
Cycles of IPI
xAPIC mode x2APIC mode
test w/o IPIv w/ IPIv w/o IPIv w/ IPIv
1 6106 4816 4265 3768
2 6244 4656 4404 3546
3 6165 4658 4233 3474
4 5992 4710 4363 3430
5 6083 4741 4215 3551
6 6238 4904 4304 3547
7 6164 4617 4263 3709
8 5984 4763 4518 3779
9 5931 4712 4645 3667
10 5955 4530 4332 3724
11 5897 4673 4283 3569
12 6140 4794 4178 3598
13 6183 4728 4363 3628
14 5991 4994 4509 3842
15 5866 4665 4520 3739
16 6032 4654 4229 3701
17 6050 4653 4185 3726
18 6004 4792 4319 3746
19 5961 4626 4196 3392
20 6194 4576 4433 3760
Average cycles 6059 4713.1 4337.85 3644.8
%Reduction -22.21% -15.98%
Commit a9ab13ff6e (KVM: X86: Improve latency for single target IPI
fastpath) mentioned that the whole ipi fastpath feature reduces the
latency from 4238 to 3293 around 22.3% on SKX server, why your IPIv
hardware acceleration is worse than software emulation? In addition,
Actually this performance data was measured on the basis of fastpath
optimization while cpu runs at base frequency.
As a result, IPI virtualization could have extra 15.98% cost reduction
over IPI fastpath process in x2apic mode.
I observed that adaptive advance lapic timer and adaptive halt-polling
will influence kvm-unit-tests/vmexit.flat IPI testing score, could you
post the score after disabling these features as commit a9ab13ff6e
(KVM: X86: Improve latency for single target IPI fastpath) mentioned?
In addition, please post the hackbench(./hackbench -l 1000000) and ipi
microbenchmark scores.
We modified unittest to make AP runing with idle loop instead of hlt .
This eliminates the impact
from adaptive halt-polling. So far we don't observe the influence from
adaptive advance lapic timer
by test either. vmexit/ipi test should not involve lapic timer.
We post the hackbench and ipi microbenchmark score in patch V2 for your
reference.
Thanks.
Wanpeng