On 2023.07.07 14:38, Wang Jianchao wrote:
> 
> Hi
> 
> This patchset attempts to introduce a new pv feature, lazy tscdeadline.
> Every time the guest writes MSR_IA32_TSC_DEADLINE, a vm-exit occurs
> and the host side handles it. However, many of these vm-exits are
> unnecessary because the timer is often overwritten before it expires.
> 
> v : write to msr of tsc deadline
> | : timer armed by tsc deadline
> 
>          v v v v v        | | | | |
> --------------------------------------->  Time
> 
> Each timer armed by an msr write is overwritten before it expires, so
> the vm-exits caused by those writes are wasted. The lazy tscdeadline
> works as follows,
> 
>          v v v v v        |       |
> --------------------------------------->  Time
>                           '- arm -'
> 
> The 1st timer is responsible for arming the next timer. When the armed
> timer expires, it checks for a pending deadline and arms a new timer.
> 
> In the netperf test with TCP_RR on loopback, lazy_tscdeadline reduces
> the total number of vm-exits from 12617503 to 5815737.
> 
>                              Close               Open
> --------------------------------------------------------
> VM-Exit
>   sum                      12617503            5815737
>   intr               0%       37023     0%       33002
>   cpuid              0%           1     0%           0
>   halt              19%     2503932    47%     2780683
>   msr-write         79%    10046340    51%     2966824
>   pause              0%          90     0%          84
>   ept-violation      0%         584     0%         336
>   ept-misconfig      0%           0     0%           2
>   preemption-timer   0%       29518     0%       34800
> -------------------------------------------------------
> MSR-Write
>   sum                      10046455            2966864
>   apic-icr          25%     2533498    93%     2781235
>   tsc-deadline      74%     7512945     6%      185629
> 

There are no patches on the qemu side yet, so I enable this feature with
the debug patch attached. It makes testing more convenient, since it can
turn the feature on w/o involving qemu. If you want to test this
feature, it may help.

echo 1 > /proc/sys/kernel/apic_lazy_tsc_deadline

Since it is just for testing, the serialization is not exact.
Please use it w/o any running guests ;)

Thanks
Jianchao

> This patchset is made and tested on 6.4.0 and includes 3 patches,
> 
> The 1st one adds the necessary data structures for this feature
> The 2nd one adds the specific msr operations between guest and host
> The 3rd one is the one that makes this feature work.
> 
> Any comment is welcome.
> 
> Thanks
> Jianchao
> 
> Wang Jianchao (3)
> 	KVM: x86: add msr register and data structure for lazy tscdeadline
> 	KVM: x86: exchange info about lazy_tscdeadline with msr
> 	KVM: X86: add lazy tscdeadline support to reduce vm-exit of msr-write
> 
> 
>  arch/x86/include/asm/kvm_host.h      |  10 ++++++++
>  arch/x86/include/uapi/asm/kvm_para.h |   9 +++++++
>  arch/x86/kernel/apic/apic.c          |  47 ++++++++++++++++++++++++++++++++++-
>  arch/x86/kernel/kvm.c                |  13 ++++++++++
>  arch/x86/kvm/cpuid.c                 |   1 +
>  arch/x86/kvm/lapic.c                 | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
>  arch/x86/kvm/lapic.h                 |   4 +++
>  arch/x86/kvm/x86.c                   |  26 ++++++++++++++++++++
>  8 files changed, 229 insertions(+), 9 deletions(-)
> 
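
To make the intended saving concrete, here is a rough userspace model of
the scheme described in the cover letter. It is only a sketch under
assumptions, not code from the patches: the struct and function names,
the use of 0 as a "none" sentinel, and the rule that a deadline earlier
than the armed one still needs a real MSR write are all simplifications
chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One guest request: at time `now`, ask for the timer to fire at `deadline`. */
struct timer_write {
	uint64_t now;
	uint64_t deadline;
};

/* Hypothetical model of the state shared between guest and host. */
struct lazy_state {
	uint64_t armed;   /* deadline the host timer is armed for, 0 = none */
	uint64_t pending; /* latest deadline recorded without an exit, 0 = none */
};

/*
 * Host side: retire every timer that has expired by `now`. An expired
 * timer hands over to the pending deadline (if any) without a vm-exit.
 */
static void host_expire(struct lazy_state *s, uint64_t now)
{
	while (s->armed != 0 && s->armed <= now) {
		s->armed = s->pending;
		s->pending = 0;
	}
}

/*
 * Guest side: returns 1 if this request needs a real MSR write (and so a
 * vm-exit), 0 if it can simply be recorded as pending.
 */
static int lazy_write(struct lazy_state *s, struct timer_write w)
{
	assert(w.deadline != 0); /* 0 is reserved as "none" in this model */

	host_expire(s, w.now);

	/* Nothing armed, or the new deadline is earlier: must really arm. */
	if (s->armed == 0 || w.deadline < s->armed) {
		s->armed = w.deadline;
		s->pending = 0;
		return 1;
	}

	/* A timer is already armed and fires first: let it arm the next one. */
	s->pending = w.deadline;
	return 0;
}

/* Count the vm-exits the model takes for a sequence of requests. */
static unsigned int count_exits_lazy(const struct timer_write *w, size_t n)
{
	struct lazy_state s = { 0, 0 };
	unsigned int exits = 0;

	for (size_t i = 0; i < n; i++)
		exits += (unsigned int)lazy_write(&s, w[i]);
	return exits;
}
```

In this model a burst of five writes at times 0, 10, 20, 30, 40 with
deadlines 100, 110, 120, 130, 140 costs one exit, where the existing path
would take one exit per write.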
From 85b0ba7dc42be36f03ae3783d4e4b23cd96bbed8 Mon Sep 17 00:00:00 2001
From: Wang Jianchao <jianchwa@xxxxxxxxxxx>
Date: Fri, 7 Jul 2023 10:06:58 +0800
Subject: [PATCH 4/4] KVM: x86: open pv lazy tscdeadline forcily w/o
 modification in qemu

This is not part of the patchset, but just a debug patch to make the
test convenient. It can open the pv lazy tscdeadline w/o involving
the qemu.

echo 1 > /proc/sys/kernel/apic_lazy_tsc_deadline

Since it is just for testing, the serializing is not so exact.
Please use it w/o any running guests ;)

Signed-off-by: Wang Jianchao <jianchwa@xxxxxxxxxxx>
---
 arch/x86/kernel/apic/apic.c |  1 +
 arch/x86/kvm/cpuid.c        | 13 +++++++++++++
 kernel/sysctl.c             | 12 ++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 0fe1215..e60aaa3 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -679,6 +679,7 @@ static void setup_APIC_timer(void)
 		if (kvm_para_available() &&
 		    kvm_para_has_feature(KVM_FEATURE_LAZY_TSCDEADLINE)) {
 			levt->set_next_event = kvm_lapic_next_deadline;
+			pr_info("%s: switch set_next_event to lazy tscdeadline version\n", __func__);
 		} else {
 			levt->set_next_event = lapic_next_deadline;
 		}
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 5a12601..ee5a828 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -380,9 +380,12 @@ u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu)
 	return rsvd_bits(cpuid_maxphyaddr(vcpu), 63);
 }
 
+extern int sysctl_apic_lazy_tsc_deadline;
+
 static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
                         int nent)
 {
+	struct kvm_cpuid_entry2 *pve2;
 	int r;
 
 	__kvm_update_cpuid_runtime(vcpu, e2, nent);
@@ -423,6 +426,16 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 
 	vcpu->arch.kvm_cpuid = kvm_get_hypervisor_cpuid(vcpu, KVM_SIGNATURE);
 	vcpu->arch.xen.cpuid = kvm_get_hypervisor_cpuid(vcpu, XEN_SIGNATURE);
+
+	if (sysctl_apic_lazy_tsc_deadline) {
+		pve2 = kvm_find_kvm_cpuid_features(vcpu);
+		if (pve2) {
+			pr_info("set lazy tscdeadline forcily\n");
+			pve2->eax |= 1 << KVM_FEATURE_LAZY_TSCDEADLINE;
+		} else {
+			pr_err("cannot open lazy tscdeadline forcily\n");
+		}
+	}
 	kvm_vcpu_after_set_cpuid(vcpu);
 
 	return 0;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index bfe53e8..f5f94dd 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -137,6 +137,9 @@ int sysctl_legacy_va_layout;
 
 #endif /* CONFIG_SYSCTL */
 
+int sysctl_apic_lazy_tsc_deadline;
+EXPORT_SYMBOL_GPL(sysctl_apic_lazy_tsc_deadline);
+
 /*
  * /proc/sys support
  */
@@ -2055,6 +2058,15 @@ static struct ctl_table kern_table[] = {
 		.extra2		= SYSCTL_INT_MAX,
 	},
 #endif
+	{
+		.procname	= "apic_lazy_tsc_deadline",
+		.data		= &sysctl_apic_lazy_tsc_deadline,
+		.maxlen		= sizeof(sysctl_apic_lazy_tsc_deadline),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 	{ }
 };
 
-- 
2.7.4