Re: [PATCH v2 22/24] KVM: x86: Rename inject_pending_events() to kvm_check_and_inject_events()

Maxim Levitsky <mlevitsk@xxxxxxxxxx> · Mon, 18 Jul 2022 16:05:27 +0300

On Fri, 2022-07-15 at 20:42 +0000, Sean Christopherson wrote:
> Rename inject_pending_events() to kvm_check_and_inject_events() in order
> to capture the fact that it handles more than just pending events, and to
> (mostly) align with kvm_check_nested_events(), which omits the "inject"
> for brevity.
> 
> Add a comment above kvm_check_and_inject_events() to provide a high-level
> synopsis, and to document a virtualization hole (KVM erratum if you will)
> that exists due to KVM not strictly tracking instruction boundaries with
> respect to coincident instruction restarts and asynchronous events.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>  arch/x86/kvm/svm/nested.c |  2 +-
>  arch/x86/kvm/svm/svm.c    |  2 +-
>  arch/x86/kvm/x86.c        | 46 ++++++++++++++++++++++++++++++++++++---
>  3 files changed, 45 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 0a8ee5f28319..028e180a74b6 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1310,7 +1310,7 @@ static void nested_svm_inject_exception_vmexit(struct kvm_vcpu *vcpu)
>                 else
>                         vmcb->control.exit_info_2 = vcpu->arch.cr2;
>         } else if (ex->vector == DB_VECTOR) {
> -               /* See inject_pending_event.  */
> +               /* See kvm_check_and_inject_events().  */
>                 kvm_deliver_exception_payload(vcpu, ex);
>  
>                 if (vcpu->arch.dr7 & DR7_GD) {
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index a336517b563e..95bdf127d531 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3518,7 +3518,7 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
>  
>         /* Note, this is called iff the local APIC is in-kernel. */
>         if (!READ_ONCE(vcpu->arch.apic->apicv_active)) {
> -               /* Process the interrupt via inject_pending_event */
> +               /* Process the interrupt via kvm_check_and_inject_events(). */
>                 kvm_make_request(KVM_REQ_EVENT, vcpu);
>                 kvm_vcpu_kick(vcpu);
>                 return;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b924afb76b72..69b9725beff3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9691,7 +9691,47 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
>         static_call(kvm_x86_inject_exception)(vcpu);
>  }
>  
> -static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
> +/*
> + * Check for any event (interrupt or exception) that is ready to be injected,
> + * and if there is at least one event, inject the event with the highest
> + * priority.  This handles both "pending" events, i.e. events that have never
> + * been injected into the guest, and "injected" events, i.e. events that were
> + * injected as part of a previous VM-Enter, but weren't successfully delivered
> + * and need to be re-injected.
> + *
> + * Note, this is not guaranteed to be invoked on a guest instruction boundary,
> + * i.e. doesn't guarantee that there's an event window in the guest.  KVM must
> + * be able to inject exceptions in the "middle" of an instruction, and so must
> + * also be able to re-inject NMIs and IRQs in the middle of an instruction.
> + * I.e. for exceptions and re-injected events, NOT invoking this on instruction
> + * boundaries is necessary and correct.
> + *
> + * For simplicity, KVM uses a single path to inject all events (except events
> + * that are injected directly from L1 to L2) and doesn't explicitly track
> + * instruction boundaries for asynchronous events.  However, because VM-Exits
> + * that can occur during instruction execution typically result in KVM skipping
> + * the instruction or injecting an exception, e.g. instruction and exception
> + * intercepts, and because pending exceptions have higher priority than pending
> + * interrupts, KVM still honors instruction boundaries in most scenarios.
> + *
> + * But, if a VM-Exit occurs during instruction execution, and KVM does NOT skip
> + * the instruction or inject an exception, then KVM can incorrectly inject a new
> + * asynchronous event if the event became pending after the CPU fetched the
> + * instruction (in the guest).  E.g. if a page fault (#PF, #NPF, EPT violation)
> + * occurs and is resolved by KVM, a coincident NMI, SMI, IRQ, etc... can be
> + * injected on the restarted instruction instead of being deferred until the
> + * instruction completes.
> + *
> + * In practice, this virtualization hole is unlikely to be observed by the
> + * guest, and even less likely to cause functional problems.  To detect the
> + * hole, the guest would have to trigger an event on a side effect of an early
> + * phase of instruction execution, e.g. on the instruction fetch from memory.
> + * And for it to be a functional problem, the guest would need to depend on the
> + * ordering between that side effect, the instruction completing, _and_ the
> + * delivery of the asynchronous event.
> + */
> +static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
> +                                      bool *req_immediate_exit)
>  {
>         bool can_inject;
>         int r;
> @@ -10170,7 +10210,7 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
>          * When APICv gets disabled, we may still have injected interrupts
>          * pending. At the same time, KVM_REQ_EVENT may not be set as APICv was
>          * still active when the interrupt got accepted. Make sure
> -        * inject_pending_event() is called to check for that.
> +        * kvm_check_and_inject_events() is called to check for that.
>          */
>         if (!apic->apicv_active)
>                 kvm_make_request(KVM_REQ_EVENT, vcpu);
> @@ -10467,7 +10507,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>                         goto out;
>                 }
>  
> -               r = inject_pending_event(vcpu, &req_immediate_exit);
> +               r = kvm_check_and_inject_events(vcpu, &req_immediate_exit);
>                 if (r < 0) {
>                         r = 0;
>                         goto out;

Makes sense.
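
For what it's worth, the selection order that the new comment describes can be modeled roughly as shown below. This is a hedged sketch, not KVM code. The toy_* names and struct are hypothetical. The priority list is simplified to re-injection, then a pending exception, then NMI, then IRQ, with SMIs and nested VM-Exits left out. Delivery of the chosen event is assumed to succeed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event kinds for this toy model; not KVM's types. */
enum toy_event { TOY_NONE, TOY_EXCEPTION, TOY_NMI, TOY_IRQ };

/* Hypothetical, heavily simplified stand-in for per-vCPU event state. */
struct toy_vcpu {
	/* "Injected" event: sent on a previous VM-Enter but not delivered. */
	enum toy_event injected;
	/* "Pending" events: never injected into the guest. */
	bool exception_pending;
	bool nmi_pending;
	bool irq_pending;
	/* Whether the guest can currently accept an NMI / IRQ. */
	bool nmi_allowed;
	bool irq_allowed;
};

/*
 * Choose at most one event to inject, highest priority first:
 *  1. re-inject an "injected" event that was not delivered last time;
 *  2. a pending exception, which outranks asynchronous events;
 *  3. a pending NMI, then a pending IRQ, only if the guest can take it.
 * Clears the chosen event's state (assumes this delivery succeeds) and
 * returns it, or TOY_NONE if nothing can be injected right now.
 */
static enum toy_event toy_check_and_inject(struct toy_vcpu *v)
{
	enum toy_event ev;

	assert(v);

	if (v->injected != TOY_NONE) {
		ev = v->injected;
		v->injected = TOY_NONE;
		return ev;
	}
	if (v->exception_pending) {
		v->exception_pending = false;
		return TOY_EXCEPTION;
	}
	if (v->nmi_pending && v->nmi_allowed) {
		v->nmi_pending = false;
		return TOY_NMI;
	}
	if (v->irq_pending && v->irq_allowed) {
		v->irq_pending = false;
		return TOY_IRQ;
	}
	return TOY_NONE;
}
```

With an undelivered IRQ, a pending exception and a pending NMI (NMI window open), successive calls return TOY_IRQ, TOY_EXCEPTION, TOY_NMI and then TOY_NONE. The sketch checks re-injection first because, per the comment, previously injected events must be re-injected even in the middle of an instruction.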

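The virtualization hole can be illustrated the same way with a toy timeline. This is also a hypothetical sketch, not KVM code. toy_run(), the OBS_* observations and the TOY_STRICT "ideal" mode are invented for illustration. The fault on the first fetch is assumed to be resolved by the host without skipping the instruction or injecting an exception, and the IRQ is assumed to become pending right after that fetch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical guest-visible observations, in the order they occur. */
enum toy_obs { OBS_FETCH, OBS_IRQ, OBS_RETIRE };

/* TOY_KVM: no boundary tracking; TOY_STRICT: IRQs wait for a boundary. */
enum toy_mode { TOY_KVM, TOY_STRICT };

/*
 * Run one guest instruction whose first fetch takes a fault that the
 * host resolves without skipping the instruction or injecting an
 * exception.  An IRQ becomes pending right after that first fetch.
 * Writes the guest-visible order of events to out[], returns the count.
 */
static int toy_run(enum toy_mode mode, enum toy_obs out[], int max)
{
	int n = 0;
	bool faulted = false;
	bool irq_pending = false;

	for (;;) {
		/*
		 * Host is about to (re-)enter the guest.  Without boundary
		 * tracking, a pending IRQ is injected right away, i.e. on
		 * the restarted instruction.
		 */
		if (irq_pending && mode == TOY_KVM) {
			assert(n < max);
			out[n++] = OBS_IRQ;
			irq_pending = false;
		}

		/* Early phase of the instruction: the fetch side effect. */
		assert(n < max);
		out[n++] = OBS_FETCH;

		if (!faulted) {
			/*
			 * First fetch exits (e.g. #NPF / EPT violation) and
			 * the IRQ becomes pending.  The host fixes the
			 * mapping and re-enters without skipping the
			 * instruction or injecting an exception.
			 */
			faulted = true;
			irq_pending = true;
			continue;
		}

		assert(n < max);
		out[n++] = OBS_RETIRE;
		break;
	}

	/* Instruction boundary: the strict model delivers the IRQ here. */
	if (irq_pending) {
		assert(n < max);
		out[n++] = OBS_IRQ;
	}
	return n;
}
```

In the TOY_KVM run the guest observes fetch, IRQ, fetch, retire, so the IRQ lands between the early side effect and the instruction completing, which is the ordering the comment describes. TOY_STRICT instead yields fetch, fetch, retire, IRQ.
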
Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>

Best regards,
	Maxim Levitsky