Lock Holder Preemption (LHP) is a classic problem, especially in virtual
machines: if the vCPU holding a spinlock is preempted on the host, the other
vCPUs spinning on that lock keep busy-looping and waste pCPU time. LoongArch
also has no hardware Pause Loop Exiting (PLE) support to mitigate this. This
series therefore introduces paravirt qspinlock support; in a kernel compile
test it improves performance greatly when a pCPU is shared by multiple vCPUs
(an illustrative sketch of the guest-side pv_wait()/pv_kick() hooks is
appended after the diffstat).

The testbed is a dual-socket 3C5000 machine with 32 cores and 2 NUMA nodes.
The test case is kcbench compiling a mainline 5.10 kernel tree; the exact
command is "kcbench --src /root/src/linux".

Performance on the host machine:

              kernel compile time    performance impact
Original      150.29 seconds
With patch    150.20 seconds         almost no impact

Performance in virtual machines:

1. One VM with 32 vCPUs and 2 NUMA nodes

              kernel compile time    performance impact
Original      173.07 seconds
With patch    171.73 seconds         +1%

2. Two VMs, each with 32 vCPUs and 2 NUMA nodes

              kernel compile time    performance impact
Original      2362.04 seconds
With patch    354.17 seconds         +566%

Bibo Mao (2):
  LoongArch: KVM: Add paravirt qspinlock in kvm side
  LoongArch: KVM: Add paravirt qspinlock in guest side

 arch/loongarch/Kconfig                 | 14 +++
 arch/loongarch/include/asm/Kbuild      |  1 -
 arch/loongarch/include/asm/kvm_host.h  |  4 +
 arch/loongarch/include/asm/kvm_para.h  |  1 +
 arch/loongarch/include/asm/loongarch.h |  1 +
 arch/loongarch/include/asm/paravirt.h  | 47 ++++++++++
 arch/loongarch/include/asm/qspinlock.h | 39 ++++++++
 .../include/asm/qspinlock_paravirt.h   |  6 ++
 arch/loongarch/kernel/paravirt.c       | 88 +++++++++++++++++++
 arch/loongarch/kernel/smp.c            |  4 +-
 arch/loongarch/kvm/exit.c              | 24 ++++-
 arch/loongarch/kvm/vcpu.c              | 13 ++-
 12 files changed, 238 insertions(+), 4 deletions(-)
 create mode 100644 arch/loongarch/include/asm/qspinlock.h
 create mode 100644 arch/loongarch/include/asm/qspinlock_paravirt.h

base-commit: 7846b618e0a4c3e08888099d1d4512722b39ca99
-- 
2.39.3
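
For reviewers unfamiliar with the pv qspinlock hooks, here is a rough,
illustrative sketch of the guest-side idea only; it is not the code in this
series. The generic slowpath in kernel/locking/qspinlock_paravirt.h expects
the architecture to supply pv_wait() and pv_kick(); the hypercall wrappers
pv_hcall_wait()/pv_hcall_kick() below are made-up placeholders for whatever
hypercall interface the actual patches define.

  #include <linux/compiler.h>	/* READ_ONCE() */
  #include <linux/irqflags.h>	/* local_irq_save()/local_irq_restore() */

  /* Hypothetical hypercall wrappers -- names are placeholders, not from this series. */
  void pv_hcall_wait(void);
  void pv_hcall_kick(int cpu);

  /* Wake the vCPU identified by @cpu that went to sleep in pv_wait(). */
  static void pv_kick(int cpu)
  {
  	pv_hcall_kick(cpu);
  }

  /* Instead of spinning, yield this vCPU to the host until it is kicked. */
  static void pv_wait(u8 *ptr, u8 val)
  {
  	unsigned long flags;

  	/* Disable interrupts so a concurrent kick cannot be lost. */
  	local_irq_save(flags);

  	/* Only sleep if the lock word still holds the value we saw. */
  	if (READ_ONCE(*ptr) == val)
  		pv_hcall_wait();	/* host can run other vCPUs meanwhile */

  	local_irq_restore(flags);
  }

With hooks along these lines, a waiter that would otherwise busy-loop behind a
preempted lock holder gives its pCPU back to the host, which is where the
large gain in the oversubscribed two-VM case comes from.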