On Tue, Nov 29, 2022, Jon Kohler wrote: > Set vcpu->mode to EXITING_GUEST_MODE as soon vCPU exits to reflect > that we are indeed exiting guest mode, but not quite out of guest > mode yet. Note: This is done lazily without an explicit memory > barrier so that we do not regress the cost in the critical path > of going from the exit to the exit handler. This is not remotely sufficient justification. Memory "barriers" are just compiler barriers on x86, so the odds of regressing the VM-Enter/VM-Exit cost without introducing a bug are miniscule. > Flip back to IN_GUEST_MODE for exits that use > EXIT_FASTPATH_REENTER_GUEST, such that we are IN_GUEST_MODE upon > reentry. > > Changing vcpu->mode away from IN_GUEST_MODE as early as possible Except this isn't as early as possible. If we're going to bother doing something like this, my vote is to move it into assembly. > gives IPI senders as much runway as possible to avoid ringing > doorbell or sending posted interrupt IPI in AMD and Intel, > respectively. Since this is done without an explicit memory > barrier, the worst case is that the IPI sender sees IN_GUEST_MODE > still and sends a spurious event, which is the behavior prior > to this patch. No, worst case scenario is that kvm_vcpu_exiting_guest_mode() sees EXITING_GUEST_MODE and doesn't kick the vCPU. For "kicks" that set a request, kvm_vcpu_exit_request() will punt the vCPU out of the tight run loop, though there might be ordering issues. But whether or not there are ordering issues is a moot point since there are uses of kvm_vcpu_kick() that aren't accompanied by a request, e.g. to purge the PML buffers. In other words, kvm_vcpu_kick() absolutely cannot have false negatives. We could modify KVM to require a request when using kvm_vcpu_kick(), but that's a bit of a hack, and all of the possible ordering problems is still a pile of complexity I'd rather avoid. No small part of me thinks we'd be better off adding a dedicated flag to very precisely track whether or not the vCPU is truly "in the guest" for the purposes of sending IPIs. Things like kicks have different requirements around IN_GUEST_MODE than sending interrupts, e.g. KVM manually processes the IRR on every VM-Enter and so lack of an IPI is a non-issue, whereas missing an IPI for a kick is problematic. In other words, EXITING_GUEST_MODE really needs to mean "existing the run loop". E.g. toggle the dedicated flag within a few instructions of VM-Enter and VM-Exit for maximum efficiency for interrupts, and avoid having to make vcpu->mode more complex than it already is. To add clarity, we could even rename IN_GUEST_MODE and EXITING_GUEST_MODE to something like IN_RUN_LOOP and EXITING_RUN_LOOP. > Signed-off-by: Jon Kohler <jon@xxxxxxxxxxx> > --- > arch/x86/kvm/svm/svm.c | 7 +++++++ > arch/x86/kvm/vmx/vmx.c | 23 +++++++++++++++++++++++ > arch/x86/kvm/x86.c | 8 ++++++++ > 3 files changed, 38 insertions(+) > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > index ce362e88a567..5f0c118a3ffd 100644 > --- a/arch/x86/kvm/svm/svm.c > +++ b/arch/x86/kvm/svm/svm.c > @@ -3907,6 +3907,13 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in > else > __svm_vcpu_run(svm, spec_ctrl_intercepted); > > + /* Optimize IPI reduction by setting mode immediately after vmexit /* * Because KVM isn't the crazy land of net/ block comments should like * this. */ > + * without a memmory barrier as this as not paired anywhere. > + * is will be set to OUTSIDE_GUEST_MODE in x86 common code with a memory > + * barrier, after the host is done fully restoring various host states. > + */ > + vcpu->mode = EXITING_GUEST_MODE; > + > guest_state_exit_irqoff(); > } > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 63247c57c72c..243dcb87c727 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -5878,6 +5878,17 @@ static fastpath_t handle_fastpath_preemption_timer(struct kvm_vcpu *vcpu) > > if (!vmx->req_immediate_exit && > !unlikely(vmx->loaded_vmcs->hv_timer_soft_disabled)) { > + /* Reset IN_GUEST_MODE since we're going to reenter > + * guest as part of this fast path. This is done as > + * an optimization without a memory barrier since > + * EXITING_GUEST_MODE is also set without a memory Heh, justifying the lack of a memory barrier by saying pointing out the other code you added doesn't use a memory barrier is interesting, to put it politely. > + * barrier. This also needs to be reset prior to > + * calling apic_timer_expired() so that > + * kvm_use_posted_timer_interrupt() returns the proper > + * value. > + */ > + if (vcpu->mode == EXITING_GUEST_MODE) > + vcpu->mode = IN_GUEST_MODE; It's far easier, likely more performant, documents why this has a chance of working, and significantly less error prone to do this unconditionally in either assembly or after the EXIT_FASTPATH_REENTER_GUEST check in vcpu_enter_guest().