On 08.08.2017 13:49, Longpeng (Mike) wrote:
>
>
> On 2017/8/8 19:25, David Hildenbrand wrote:
>
>> On 08.08.2017 06:05, Longpeng(Mike) wrote:
>>> This is a simple optimization for kvm_vcpu_on_spin, the
>>> main idea is described in patch-1's commit msg.
>>>
>>> I did some tests based on the RFC version; the results show
>>> that it can improve performance slightly.
>>>
>>> == Geekbench-3.4.1 ==
>>> VM1: 8U,4G, vcpu(0...7) is 1:1 pinned to pcpu(6...11,18,19)
>>>      running Geekbench-3.4.1 *10 runs*
>>> VM2/VM3/VM4: configuration is the same as VM1
>>>      stress each vcpu usage (seen by top in guest) to 40%
>>>
>>> The comparison of each testcase's score:
>>> (higher is better)
>>>                    before     after      improve
>>> Integer
>>>     single         1176.7     1179.0      0.2%
>>>     multi          3459.5     3426.5     -0.9%
>>> Float
>>>     single         1150.5     1150.9      0.0%
>>>     multi          3364.5     3391.9      0.8%
>>> Memory(stream)
>>>     single         1768.7     1773.1      0.2%
>>>     multi          2511.6     2557.2      1.8%
>>> Overall
>>>     single         1284.2     1286.2      0.2%
>>>     multi          3231.4     3238.4      0.2%
>>>
>>>
>>> == kernbench-0.42 ==
>>> VM1: 8U,12G, vcpu(0...7) is 1:1 pinned to pcpu(6...11,18,19)
>>>      running "kernbench -n 10"
>>> VM2/VM3/VM4: configuration is the same as VM1
>>>      stress each vcpu usage (seen by top in guest) to 40%
>>>
>>> The comparison of 'Elapsed Time':
>>> (lower is better)
>>>                    before     after      improve
>>> load -j4           12.762     12.751      0.1%
>>> load -j32           9.743      8.955      8.1%
>>> load -j             9.688      9.229      4.7%
>>>
>>>
>>> Physical Machine:
>>>   Architecture:          x86_64
>>>   CPU op-mode(s):        32-bit, 64-bit
>>>   Byte Order:            Little Endian
>>>   CPU(s):                24
>>>   On-line CPU(s) list:   0-23
>>>   Thread(s) per core:    2
>>>   Core(s) per socket:    6
>>>   Socket(s):             2
>>>   NUMA node(s):          2
>>>   Vendor ID:             GenuineIntel
>>>   CPU family:            6
>>>   Model:                 45
>>>   Model name:            Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
>>>   Stepping:              7
>>>   CPU MHz:               2799.902
>>>   BogoMIPS:              5004.67
>>>   Virtualization:        VT-x
>>>   L1d cache:             32K
>>>   L1i cache:             32K
>>>   L2 cache:              256K
>>>   L3 cache:              15360K
>>>   NUMA node0 CPU(s):     0-5,12-17
>>>   NUMA node1 CPU(s):     6-11,18-23
>>>
>>> ---
>>> Changes since V1:
>>>  - split the implementation of s390 & arm. [David]
>>>  - refactor the impls according to the suggestion. [Paolo]
>>>
>>> Changes since RFC:
>>>  - only cache result for X86. [David & Cornelia & Paolo]
>>>  - add performance numbers. [David]
>>>  - implement arm/s390. [Christoffer & David]
>>>  - refactor the impls. [me]
>>>
>>> ---
>>> Longpeng(Mike) (4):
>>>   KVM: add spinlock optimization framework
>>>   KVM: X86: implement the logic for spinlock optimization
>>>   KVM: s390: implements the kvm_arch_vcpu_in_kernel()
>>>   KVM: arm: implements the kvm_arch_vcpu_in_kernel()
>>>
>>>  arch/arm/kvm/handle_exit.c      |  2 +-
>>>  arch/arm64/kvm/handle_exit.c    |  2 +-
>>>  arch/mips/kvm/mips.c            |  6 ++++++
>>>  arch/powerpc/kvm/powerpc.c      |  6 ++++++
>>>  arch/s390/kvm/diag.c            |  2 +-
>>>  arch/s390/kvm/kvm-s390.c        |  6 ++++++
>>>  arch/x86/include/asm/kvm_host.h |  5 +++++
>>>  arch/x86/kvm/hyperv.c           |  2 +-
>>>  arch/x86/kvm/svm.c              | 10 +++++++++-
>>>  arch/x86/kvm/vmx.c              | 16 +++++++++++++++-
>>>  arch/x86/kvm/x86.c              | 11 +++++++++++
>>>  include/linux/kvm_host.h        |  3 ++-
>>>  virt/kvm/arm/arm.c              |  5 +++++
>>>  virt/kvm/kvm_main.c             |  4 +++-
>>>  14 files changed, 72 insertions(+), 8 deletions(-)
>>>
>>
>> I am curious, is there any architecture that allows triggering
>> kvm_vcpu_on_spin(vcpu); while _not_ in kernel mode?
>
>
> IIUC, X86/SVM will trap to the host on a PAUSE insn regardless of whether
> the vcpu is in kernel mode or user mode.
>
>>
>> I would have guessed that user space should never be allowed to make cpu
>> wide decisions (giving up the CPU to the hypervisor).
>>
>> E.g. s390x diag can only be executed from kernel space. VMX PAUSE is
>> only valid from kernel space.
>
>
> X86/VMX has "PAUSE exiting" and "PAUSE-loop exiting" (PLE). KVM only uses
> PLE, which is, as you said, "only valid from kernel space".
>
> However, "PAUSE exiting" can cause a user-mode vcpu to exit too.

Thanks Longpeng and Christoffer!

-- 

Thanks,

David
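
The idea discussed in this thread (when a vcpu traps for spinning in guest kernel mode, prefer yielding to vcpus that were themselves preempted in kernel mode) can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the code from the patches: `struct toy_vcpu`, its fields, and `pick_yield_target()` are hypothetical names, and the simple linear scan is an assumption made for brevity rather than how kvm_vcpu_on_spin selects candidates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vcpu; NOT the kernel's struct kvm_vcpu. */
struct toy_vcpu {
    int  id;
    bool preempted;  /* scheduled out while it still had work to do */
    bool in_kernel;  /* was in guest kernel mode when it stopped running */
};

/*
 * Choose a vcpu to yield to when vcpu `me` exits because it is spinning.
 * If the spinner was in guest kernel mode, candidates that were preempted
 * in guest user mode are skipped, since they cannot be holding a guest
 * kernel spinlock. Returns the index of the chosen vcpu, or -1 if no
 * candidate qualifies.
 */
static int pick_yield_target(const struct toy_vcpu *vcpus, size_t n,
                             size_t me, bool me_in_kernel)
{
    assert(me < n);

    for (size_t i = 0; i < n; i++) {
        if (i == me)
            continue;               /* never yield to ourselves */
        if (!vcpus[i].preempted)
            continue;               /* already running or idle */
        if (me_in_kernel && !vcpus[i].in_kernel)
            continue;               /* user-mode vcpu: unlikely lock holder */
        return (int)i;
    }
    return -1;
}
```

With three toy vcpus where vcpu 1 was preempted in user mode and vcpu 2 in kernel mode, a kernel-mode spinner on vcpu 0 would pick vcpu 2, while a spinner for which the mode filter is off would pick vcpu 1, the first preempted candidate. The user-mode exit cases raised above (SVM PAUSE intercept, VMX "PAUSE exiting") correspond to calling the function with `me_in_kernel` set to false.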