On Fri, 2022-07-15 at 20:42 +0000, Sean Christopherson wrote:
> Rename inject_pending_events() to kvm_check_and_inject_events() in order
> to capture the fact that it handles more than just pending events, and to
> (mostly) align with kvm_check_nested_events(), which omits the "inject"
> for brevity.
>
> Add a comment above kvm_check_and_inject_events() to provide a high-level
> synopsis, and to document a virtualization hole (KVM erratum if you will)
> that exists due to KVM not strictly tracking instruction boundaries with
> respect to coincident instruction restarts and asynchronous events.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>  arch/x86/kvm/svm/nested.c |  2 +-
>  arch/x86/kvm/svm/svm.c    |  2 +-
>  arch/x86/kvm/x86.c        | 46 ++++++++++++++++++++++++++++++++++++---
>  3 files changed, 45 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 0a8ee5f28319..028e180a74b6 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1310,7 +1310,7 @@ static void nested_svm_inject_exception_vmexit(struct kvm_vcpu *vcpu)
>  		else
>  			vmcb->control.exit_info_2 = vcpu->arch.cr2;
>  	} else if (ex->vector == DB_VECTOR) {
> -		/* See inject_pending_event. */
> +		/* See kvm_check_and_inject_events(). */
>  		kvm_deliver_exception_payload(vcpu, ex);
>
>  		if (vcpu->arch.dr7 & DR7_GD) {
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index a336517b563e..95bdf127d531 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3518,7 +3518,7 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
>
>  	/* Note, this is called iff the local APIC is in-kernel. */
>  	if (!READ_ONCE(vcpu->arch.apic->apicv_active)) {
> -		/* Process the interrupt via inject_pending_event */
> +		/* Process the interrupt via kvm_check_and_inject_events(). */
>  		kvm_make_request(KVM_REQ_EVENT, vcpu);
>  		kvm_vcpu_kick(vcpu);
>  		return;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b924afb76b72..69b9725beff3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9691,7 +9691,47 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
>  	static_call(kvm_x86_inject_exception)(vcpu);
>  }
>
> -static int inject_pending_event(struct kvm_vcpu *vcpu, bool *req_immediate_exit)
> +/*
> + * Check for any event (interrupt or exception) that is ready to be injected,
> + * and if there is at least one event, inject the event with the highest
> + * priority.  This handles both "pending" events, i.e. events that have never
> + * been injected into the guest, and "injected" events, i.e. events that were
> + * injected as part of a previous VM-Enter, but weren't successfully delivered
> + * and need to be re-injected.
> + *
> + * Note, this is not guaranteed to be invoked on a guest instruction boundary,
> + * i.e. doesn't guarantee that there's an event window in the guest.  KVM must
> + * be able to inject exceptions in the "middle" of an instruction, and so must
> + * also be able to re-inject NMIs and IRQs in the middle of an instruction.
> + * I.e. for exceptions and re-injected events, NOT invoking this on instruction
> + * boundaries is necessary and correct.
> + *
> + * For simplicity, KVM uses a single path to inject all events (except events
> + * that are injected directly from L1 to L2) and doesn't explicitly track
> + * instruction boundaries for asynchronous events.  However, because VM-Exits
> + * that can occur during instruction execution typically result in KVM skipping
> + * the instruction or injecting an exception, e.g. instruction and exception
> + * intercepts, and because pending exceptions have higher priority than pending
> + * interrupts, KVM still honors instruction boundaries in most scenarios.
> + *
> + * But, if a VM-Exit occurs during instruction execution, and KVM does NOT skip
> + * the instruction or inject an exception, then KVM can incorrectly inject a new
> + * asynchronous event if the event became pending after the CPU fetched the
> + * instruction (in the guest).  E.g. if a page fault (#PF, #NPF, EPT violation)
> + * occurs and is resolved by KVM, a coincident NMI, SMI, IRQ, etc... can be
> + * injected on the restarted instruction instead of being deferred until the
> + * instruction completes.
> + *
> + * In practice, this virtualization hole is unlikely to be observed by the
> + * guest, and even less likely to cause functional problems.  To detect the
> + * hole, the guest would have to trigger an event on a side effect of an early
> + * phase of instruction execution, e.g. on the instruction fetch from memory.
> + * And for it to be a functional problem, the guest would need to depend on the
> + * ordering between that side effect, the instruction completing, _and_ the
> + * delivery of the asynchronous event.
> + */
> +static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
> +				       bool *req_immediate_exit)
>  {
>  	bool can_inject;
>  	int r;
> @@ -10170,7 +10210,7 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
>  	 * When APICv gets disabled, we may still have injected interrupts
>  	 * pending. At the same time, KVM_REQ_EVENT may not be set as APICv was
>  	 * still active when the interrupt got accepted. Make sure
> -	 * inject_pending_event() is called to check for that.
> +	 * kvm_check_and_inject_events() is called to check for that.
>  	 */
>  	if (!apic->apicv_active)
>  		kvm_make_request(KVM_REQ_EVENT, vcpu);
> @@ -10467,7 +10507,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  			goto out;
>  		}
>
> -		r = inject_pending_event(vcpu, &req_immediate_exit);
> +		r = kvm_check_and_inject_events(vcpu, &req_immediate_exit);
>  		if (r < 0) {
>  			r = 0;
>  			goto out;

Makes sense.
Reviewed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>

Best regards,
	Maxim Levitsky