On Thu, Dec 08, 2022, Maxim Levitsky wrote:
> On Wed, 2022-12-07 at 18:02 +0200, Maxim Levitsky wrote:
> On Sat, 2022-10-01 at 00:58 +0000, Sean Christopherson wrote:
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -86,6 +86,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
> >  		/* Disabling MSR intercept for x2APIC registers */
> >  		svm_set_x2apic_msr_interception(svm, false);
> >  	} else {
> > +		/*
> > +		 * Flush the TLB, the guest may have inserted a non-APIC
> > +		 * mapping into the TLB while AVIC was disabled.
> > +		 */
> > +		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, &svm->vcpu);
> > +
> >  		/* For xAVIC and hybrid-xAVIC modes */
> >  		vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
> >  		/* Enabling MSR intercept for x2APIC registers */
>
> I agree that if the guest disables the APIC on a vCPU, this will lead to a
> call to kvm_apic_update_apicv which will disable AVIC, but if other vCPUs
> don't disable it, the AVIC's private memslot will still be mapped and the
> guest could read/write it from this vCPU, and its TLB mapping needs to be
> invalidated if/when the APIC is re-enabled.
>
> However I think that this adds an unnecessary (at least in the future)
> performance penalty to AVIC nesting coexistence:
>
> L1's AVIC is inhibited on each nested VM entry, and uninhibited on each
> nested VM exit, but while nested the guest can't really access it as it has
> its own NPT.
>
> With this patch KVM will invalidate L1's TLB on each nested VM exit.  KVM
> sadly already does this, but this can be fixed (it's another thing on my
> TODO list).
>
> Note that APICv doesn't have this issue, it is not inhibited on nested VM
> entry/exit, thus this code is not performance sensitive for APICv.
>
> I somewhat vote again, as I said before, to disable the APICv/AVIC memslot
> if any of the vCPUs have APICv/AVIC hardware disabled, because it is also
> more correct from an x86 perspective.  I do wonder how common the usage of
> having "extra" CPUs but not using them, and thus having their APICs in the
> disabled state, actually is.

There are legitimate scenarios where a kernel might want to disable the APIC
on select CPUs, e.g. to offline SMT siblings in BIOS.  Whether or not doing
that in a VM makes sense is debatable, but we really have no way of knowing
if there are existing guests that selectively disable APICs.

> KVM does support adding new vCPUs on the fly, so this shouldn't be needed,
> and an APICv inhibit in this case is just a perf regression.

Heh, "just" a perf regression.  Inhibiting APICv would definitely be a perf
regression that people care about, e.g. see the very recent bug fixes:

https://lore.kernel.org/all/20221116205123.18737-1-gedwards@xxxxxxx
https://lore.kernel.org/all/1669984574-32692-1-git-send-email-yuanzhaoxiong@xxxxxxxxx

Conceptually, I like the idea of inhibiting the APICv memslot if a vCPU has
its APIC hardware disabled.  But practically speaking, because KVM has
allowed that scenario for years, I don't think we should introduce such an
inhibit and risk regressing guests.

> Or at least do this only when the APIC goes back from the hardware disabled
> state to enabled.

I have no objection to fine tuning this in a follow-up, but for this bug fix
I'd much prefer to go with this minimal change.  The nested SVM TLB flushing
issue extends far beyond this one case, i.e. it needs a non-trivial overhaul
and an audit of pretty much every piece of SVM code that can interact with
TLBs.
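
[Editor's note: for illustration only, below is a rough, untested sketch of
the "flush only when the APIC goes from hardware disabled back to enabled"
alternative discussed above.  The helper name and the was_hw_disabled /
now_hw_enabled parameters are invented for the sketch and are not existing
KVM code; the only real pieces are kvm_make_request() and
KVM_REQ_TLB_FLUSH_CURRENT from the patch itself.]

	/*
	 * Hypothetical helper: request a flush of the current TLB context
	 * only on the hardware-disabled => enabled APIC transition, rather
	 * than on every AVIC (re)activation, e.g. on every nested VM exit
	 * while the nested inhibit is being toggled.
	 */
	static void avic_flush_on_apic_reenable(struct kvm_vcpu *vcpu,
						bool was_hw_disabled,
						bool now_hw_enabled)
	{
		if (was_hw_disabled && now_hw_enabled)
			kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
	}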