If two page ready notifications happen back to back, the second one is
not delivered, and the only mechanism we currently have to deliver it is
the kvm_check_async_pf_completion() check in the vcpu_run() loop. The
check is only performed on the next vmexit, which in some cases may take
a while. With interrupt based page ready notification delivery the
situation is even worse: unlike exceptions, interrupts are not handled
immediately, so we must check if the slot is empty. This is slow and
unnecessary. Introduce a dedicated MSR_KVM_ASYNC_PF_ACK MSR to
communicate the fact that the slot is free and the host should check its
notification queue. Mandate using it for interrupt based type 2 APF
event delivery.

Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
---
 Documentation/virt/kvm/msr.rst       | 16 +++++++++++++++-
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/x86.c                   |  9 ++++++++-
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index 7433e55f7184..18db3448db06 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -219,6 +219,11 @@ data:
 	If during pagefault APF reason is 0 it means that this is regular
 	page fault.
 
+	For interrupt based delivery, guest has to write '1' to
+	MSR_KVM_ASYNC_PF_ACK every time it clears reason in the shared
+	'struct kvm_vcpu_pv_apf_data', this forces KVM to re-scan its
+	queue and deliver next pending notification.
+
 	During delivery of type 1 APF cr2 contains a token that will
 	be used to notify a guest when missing page becomes
 	available. When page becomes available type 2 APF is sent with
@@ -340,4 +345,13 @@ data:
 
 	To switch to interrupt based delivery of type 2 APF events guests
 	are supposed to enable asynchronous page faults and set bit 3 in
-	MSR_KVM_ASYNC_PF_EN first.
+
+MSR_KVM_ASYNC_PF_ACK:
+	0x4b564d07
+
+data:
+	Asynchronous page fault acknowledgment. When the guest is done
+	processing type 2 APF event and 'reason' field in 'struct
+	kvm_vcpu_pv_apf_data' is cleared it is supposed to write '1' to
+	Bit 0 of the MSR, this causes the host to re-scan its queue and
+	check if there are more notifications pending.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 1bbb0b7e062f..5c7449980619 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -51,6 +51,7 @@
 #define MSR_KVM_PV_EOI_EN	0x4b564d04
 #define MSR_KVM_POLL_CONTROL	0x4b564d05
 #define MSR_KVM_ASYNC_PF2	0x4b564d06
+#define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 861dce1e7cf5..e3b91ac33bfd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1243,7 +1243,7 @@ static const u32 emulated_msrs_all[] = {
 	HV_X64_MSR_TSC_EMULATION_STATUS,
 
 	MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
-	MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF2,
+	MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF2, MSR_KVM_ASYNC_PF_ACK,
 
 	MSR_IA32_TSC_ADJUST,
 	MSR_IA32_TSCDEADLINE,
@@ -2915,6 +2915,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (kvm_pv_enable_async_pf2(vcpu, data))
 			return 1;
 		break;
+	case MSR_KVM_ASYNC_PF_ACK:
+		if (data & 0x1)
+			kvm_check_async_pf_completion(vcpu);
+		break;
 	case MSR_KVM_STEAL_TIME:
 
 		if (unlikely(!sched_info_on()))
@@ -3194,6 +3198,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_KVM_ASYNC_PF2:
 		msr_info->data = vcpu->arch.apf.msr2_val;
 		break;
+	case MSR_KVM_ASYNC_PF_ACK:
+		msr_info->data = 0;
+		break;
 	case MSR_KVM_STEAL_TIME:
 		msr_info->data = vcpu->arch.st.msr_val;
 		break;
-- 
2.25.3
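
For illustration only, the acknowledgment handshake described above can be
modelled in plain userspace C. This is a sketch under stated assumptions, not
kernel or guest code: struct sim_slot, struct sim_vcpu and every sim_*
function are hypothetical names invented for this example. The host side
stands in for the effect of kvm_check_async_pf_completion(), and the guest
side stands in for clearing 'reason' and then writing '1' to
MSR_KVM_ASYNC_PF_ACK.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_QUEUE_LEN 8

/* Hypothetical stand-in for the slot shared between host and guest. */
struct sim_slot {
	uint32_t reason;	/* 0 means the slot is free */
	uint32_t token;
};

/* Hypothetical per-vCPU state: one shared slot plus the host's queue. */
struct sim_vcpu {
	struct sim_slot slot;
	uint32_t queue[SIM_QUEUE_LEN];	/* page ready tokens not yet delivered */
	int head, tail;
};

/* Host: a page became ready, remember its token for later delivery. */
static void sim_host_page_ready(struct sim_vcpu *v, uint32_t token)
{
	assert(v->tail < SIM_QUEUE_LEN);
	v->queue[v->tail++] = token;
}

/* Host: deliver the next pending notification if the slot is free. */
static void sim_host_check_completion(struct sim_vcpu *v)
{
	if (v->slot.reason != 0 || v->head == v->tail)
		return;
	v->slot.token = v->queue[v->head++];
	v->slot.reason = 2;	/* type 2: page ready */
}

/* Host: models a guest write to the ACK MSR; only bit 0 matters. */
static void sim_host_wrmsr_ack(struct sim_vcpu *v, uint64_t data)
{
	if (data & 0x1)
		sim_host_check_completion(v);
}

/* Guest: consume one notification, free the slot, then acknowledge. */
static uint32_t sim_guest_handle_page_ready(struct sim_vcpu *v)
{
	uint32_t token;

	if (v->slot.reason == 0)
		return 0;		/* nothing pending */
	token = v->slot.token;
	v->slot.reason = 0;		/* slot is free again */
	sim_host_wrmsr_ack(v, 1);	/* host re-scans its queue right away */
	return token;
}

/* Three back to back completions are all handled without any extra exit. */
static int sim_run_demo(void)
{
	struct sim_vcpu v = { 0 };
	int handled = 0;

	sim_host_page_ready(&v, 0x10);
	sim_host_page_ready(&v, 0x20);
	sim_host_page_ready(&v, 0x30);
	sim_host_check_completion(&v);	/* initial delivery */

	while (sim_guest_handle_page_ready(&v) != 0)
		handled++;
	return handled;
}
```

In this model the second and third notifications are delivered as a direct
consequence of the guest's acknowledgment, rather than waiting for an
unrelated exit to trigger the completion check.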