@Alex, do you know the answer to Sean's question below about ASID handling
in Hyper-V? Any other comments about how we're intending to fix things are
also welcome.

Jeremi

On 08/03/2023 20:11, Sean Christopherson wrote:
> On Wed, Mar 08, 2023, Jeremi Piotrowski wrote:
>> On 08/03/2023 01:39, Sean Christopherson wrote:
>>> On Wed, Mar 08, 2023, Paolo Bonzini wrote:
>>>> On Tue, Mar 7, 2023 at 6:36 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>>>>> Thinking about this more, I would rather revert commit 1e0c7d40758b ("KVM: SVM:
>>>>> hyper-v: Remote TLB flush for SVM") or fix the thing properly straightaway. KVM
>>>>> doesn't magically handle the flushes correctly for the shadow/legacy MMU, KVM just
>>>>> happens to get lucky and not run afoul of the underlying bugs.
>>>>
>>>> I don't think it's about luck---the legacy MMU's zapping/invalidation
>>>> seems to invoke the flush hypercall correctly:
>>>
>>> ...for the paths that Jeremi has exercised, and for which a stale TLB entry is
>>> fatal to L2. E.g. kvm_unmap_gfn_range() does not have a range-based TLB flush
>>> in its path and fully relies on the buggy kvm_flush_remote_tlbs().
>>>
>>
>> Why do you say "buggy kvm_flush_remote_tlbs"? kvm_flush_remote_tlbs calls the
>> hypercall that is needed, I don't see how this might be an issue of a missing
>> "range-based TLB flush".
>
> Doh, I forgot that the arch hook in kvm_flush_remote_tlbs() leads to the Hyper-V
> hook.
>
> svm_flush_tlb_current() is very much broken, but in practice it doesn't matter outside
> of the direct call from kvm_mmu_load(), because in all other paths KVM will flow
> through a Hyper-V flush if KVM actually modifies its MMU in any way. E.g. the
> request from kvm_mmu_new_pgd() when force_flush_and_sync_on_reuse=true is neutered,
> but that exists only as a safeguard against MMU bugs. And for things like
> kvm_invalidate_pcid() and kvm_post_set_cr4(), my understanding is that Hyper-V
> will still flush the bare metal TLB, it's only Hyper-V's shadow page tables that
> are stale.
>
> Depending on how Hyper-V handles ASIDs, pre_svm_run() may also be broken. If
> Hyper-V tracks and rebuilds only the current ASID, then bumping the ASID won't
> rebuild potentially stale page tables. But I'm guessing Hyper-V ignores the ASID
> since the hypercall takes only the root PA.
>
> The truly problematic case is kvm_mmu_load(), where KVM relies on the flush to
> force Hyper-V to rebuild shadow page tables for an old, possibly stale nCR3. This
> affects only the TDP MMU because of an explicit optimization in the TDP MMU. So
> in practice we could squeak by with something like this:
>
>	if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb)
>		hyperv_flush_guest_mapping(vcpu->arch.mmu->root.hpa);
>	else
>		static_call(kvm_x86_flush_tlb_current)(vcpu);
>
> but I'm not convinced that avoiding a hypercall in svm_flush_tlb_current() just
> to avoid overhead when running an L3 (nested VM from L1 KVM's perspective) is
> worthwhile. The real problem there is that KVM's nested SVM TLB/ASID support is an
> unoptimized mess, and I can't imagine someone running an L3 is going to be super
> concerned with performance.
>
> I also think we should have a sanity check in the flush_tlb_all() path, i.e. WARN
> if kvm_flush_remote_tlbs() falls back.
>
> Something like this (probably doesn't compile, likely needs #ifdefs or helpers):
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7ef4f9e3b35a..38afc5cac1c4 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3770,7 +3770,7 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>  		svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
>  }
>  
> -static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> +static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> @@ -3794,6 +3794,23 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
>  		svm->current_vmcb->asid_generation--;
>  }
>  
> +static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> +{
> +	if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb &&
> +	    VALID_PAGE(vcpu->arch.mmu->root.hpa))
> +		hyperv_flush_guest_mapping(vcpu->arch.mmu->root.hpa);
> +
> +	svm_flush_tlb_asid(vcpu);
> +}
> +
> +static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
> +{
> +	if (WARN_ON_ONCE(kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb))
> +		hv_remote_flush_tlb(vcpu->kvm);
> +
> +	svm_flush_tlb_asid(vcpu);
> +}
> +
>  static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -4786,10 +4803,10 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.set_rflags = svm_set_rflags,
>  	.get_if_flag = svm_get_if_flag,
>  
> -	.flush_tlb_all = svm_flush_tlb_current,
> +	.flush_tlb_all = svm_flush_tlb_all,
>  	.flush_tlb_current = svm_flush_tlb_current,
>  	.flush_tlb_gva = svm_flush_tlb_gva,
> -	.flush_tlb_guest = svm_flush_tlb_current,
> +	.flush_tlb_guest = svm_flush_tlb_asid,
>  
>  	.vcpu_pre_run = svm_vcpu_pre_run,
>  	.vcpu_run = svm_vcpu_run,
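
For reference, here is my rough sketch of how kvm_flush_remote_tlbs() ends up
in the Hyper-V hook Sean mentions, based on my reading of the current code. It
is simplified and not a verbatim copy of virt/kvm/kvm_main.c or
arch/x86/include/asm/kvm_host.h, so treat the details (return codes, stats
handling) as approximate:

/* virt/kvm/kvm_main.c (simplified) */
void kvm_flush_remote_tlbs(struct kvm *kvm)
{
	/*
	 * Try the arch/vendor remote-flush hook first; only fall back to
	 * kicking every vCPU with KVM_REQ_TLB_FLUSH if the hook is absent
	 * or fails.  With the Hyper-V enlightenment the hook handles it,
	 * which is why the legacy MMU paths end up issuing the flush
	 * hypercall.
	 */
	if (!kvm_arch_flush_remote_tlb(kvm) ||
	    kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
		++kvm->stat.generic.remote_tlb_flush;
}

/* arch/x86/include/asm/kvm_host.h (simplified) */
static inline int kvm_arch_flush_remote_tlb(struct kvm *kvm)
{
	/*
	 * With Hyper-V remote TLB flush enabled, tlb_remote_flush is
	 * hv_remote_flush_tlb(), which flushes by nCR3 root PA via
	 * hyperv_flush_guest_mapping() -- as far as we can tell, no ASID
	 * is passed to the hypercall.
	 */
	if (kvm_x86_ops.tlb_remote_flush &&
	    !static_call(kvm_x86_tlb_remote_flush)(kvm))
		return 0;

	return -ENOTSUPP;
}

If the hypervisor side really never looks at the ASID, that would line up with
Sean's concern that bumping the ASID in pre_svm_run() alone can't force a
shadow page table rebuild, which is the part I'd like confirmed.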