On Wed, Mar 08, 2023, Jeremi Piotrowski wrote:
> On 08/03/2023 16:55, Jeremi Piotrowski wrote:
> >
> >
> > On 08/03/2023 01:39, Sean Christopherson wrote:
> >> On Wed, Mar 08, 2023, Paolo Bonzini wrote:
> >>> On Tue, Mar 7, 2023 at 6:36 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >>>> Thinking about this more, I would rather revert commit 1e0c7d40758b ("KVM: SVM:
> >>>> hyper-v: Remote TLB flush for SVM") or fix the thing properly straightaway.  KVM
> >>>> doesn't magically handle the flushes correctly for the shadow/legacy MMU, KVM just
> >>>> happens to get lucky and not run afoul of the underlying bugs.
> >>>
> >>> I don't think it's about luck---the legacy MMU's zapping/invalidation
> >>> seems to invoke the flush hypercall correctly:
> >>
> >> ...for the paths that Jeremi has exercised, and for which a stale TLB entry is
> >> fatal to L2.  E.g. kvm_unmap_gfn_range() does not have a range-based TLB flush
> >> in its path and fully relies on the buggy kvm_flush_remote_tlbs().
> >>
> >
> > Why do you say "buggy kvm_flush_remote_tlbs"? kvm_flush_remote_tlbs calls the hypercall
> > that is needed, I don't see how this might be an issue of a missing "range-based TLB flush".
> >
> > kvm_unmap_gfn_range is called from kvm_mmu_notifier_invalidate_range_start and 'flush_on_ret=true'
> > is set, so it is followed by kvm_flush_remote_tlbs which calls hv_remote_flush_tlb.
> >
> >> In other words, KVM is getting lucky :-)
> >>
> >>> Jeremi, did you ever track the call stack where
> >>> hyperv_nested_flush_guest_mapping is triggered?
> >>
> >> I don't think it matters.  As above, it only takes one path where KVM is fully
> >> relying on kvm_flush_remote_tlbs() for the whole thing to fall apart
>
> Slowly I'm starting to understand what we've been talking about, thank you :)
>
> Paolo/Sean, what do you think about something like the following, except I would make
> it SVM only, and I'd need to think about what to do with the return.
> I believe this accurately reflects what the optimization is about. hv_track_root_tdp
> is called from kvm_mmu_load_pgd, which covers both kvm_mmu_load and kvm_mmu_new_pgd
> (which requests KVM_REQ_LOAD_MMU_PGD).

It's close, but KVM doesn't *always* need to flush when loading a root.  KVM needs
to flush when loading a brand spanking new root, which is the kvm_mmu_load() path.
But when KVM loads a root via KVM_REQ_LOAD_MMU_PGD/kvm_mmu_new_pgd(), a flush may
or may not be necessary, e.g. if KVM reuses an old, but still valid, root (each vCPU
has a 3-entry root cache) and a TLB flush isn't architecturally required, then there
is no need to flush.

And as mentioned in the other tendril of this thread, I'd really like to fix
svm_flush_tlb_current() since it's technically broken, even though it's highly
unlikely (maybe even impossible?) to cause issues in practice.

> diff --git a/arch/x86/kvm/kvm_onhyperv.c b/arch/x86/kvm/kvm_onhyperv.c
> index 482d6639ef88..6a5bd3cbace8 100644
> --- a/arch/x86/kvm/kvm_onhyperv.c
> +++ b/arch/x86/kvm/kvm_onhyperv.c
> @@ -29,6 +29,18 @@ static inline int hv_remote_flush_root_tdp(hpa_t root_tdp,
>  		return hyperv_flush_guest_mapping(root_tdp);
>  }
>  
> +static int hv_vcpu_flush_tlb_current(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_arch *kvm_arch = &vcpu->kvm->arch;
> +	hpa_t root_tdp = vcpu->arch.hv_root_tdp;
> +	int ret;
> +
> +	ret = hyperv_flush_guest_mapping(root_tdp);
> +	if (!ret)
> +		kvm_arch->hv_root_tdp = root_tdp;
> +	return ret;
> +}
> +
>  int hv_remote_flush_tlb_with_range(struct kvm *kvm,
>  		struct kvm_tlb_range *range)
>  {
> @@ -101,8 +113,10 @@ void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
>  	if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb) {
>  		spin_lock(&kvm_arch->hv_root_tdp_lock);
>  		vcpu->arch.hv_root_tdp = root_tdp;
> -		if (root_tdp != kvm_arch->hv_root_tdp)
> +		if (root_tdp != kvm_arch->hv_root_tdp) {
>  			kvm_arch->hv_root_tdp = INVALID_PAGE;
> +			hv_vcpu_flush_tlb_current(vcpu);
> +		}
>  		spin_unlock(&kvm_arch->hv_root_tdp_lock);
>  	}
>  }
>
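
To make the fresh-root vs. reused-root distinction above concrete, here is a
standalone toy model.  It is NOT KVM code and does not mirror any real KVM
function: struct toy_vcpu, toy_vcpu_init(), toy_load_root(), INVALID_ROOT and the
arch_flush_required parameter are all made up for illustration.  The only detail
taken from the discussion is the 3-entry per-vCPU root cache.  Treat it as a
rough sketch of the rule "flush on a brand new root, skip the flush when a cached
root is reused and nothing architecturally demands one".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PREV_ROOTS 3           /* per-vCPU root cache size, per the text above */
#define INVALID_ROOT   UINT64_MAX  /* hypothetical "no root" sentinel */

/* Hypothetical, heavily simplified stand-in for per-vCPU MMU state. */
struct toy_vcpu {
	uint64_t cur_root;
	uint64_t prev_roots[NUM_PREV_ROOTS];
};

static void toy_vcpu_init(struct toy_vcpu *v)
{
	v->cur_root = INVALID_ROOT;
	for (int i = 0; i < NUM_PREV_ROOTS; i++)
		v->prev_roots[i] = INVALID_ROOT;
}

/*
 * Switch to new_root and report whether this toy model would flush.
 * A root found in the cache is assumed to still be valid, so reusing it
 * needs a flush only when arch_flush_required is set.  A root that is
 * not cached is treated as brand new and always needs a flush.
 */
static bool toy_load_root(struct toy_vcpu *v, uint64_t new_root,
			  bool arch_flush_required)
{
	bool cached = false;

	assert(new_root != INVALID_ROOT);

	/* Reloading the current root: nothing new, flush only on request. */
	if (new_root == v->cur_root)
		return arch_flush_required;

	for (int i = 0; i < NUM_PREV_ROOTS; i++) {
		if (v->prev_roots[i] == new_root) {
			/* Cache hit: swap the old current root into this slot. */
			v->prev_roots[i] = v->cur_root;
			cached = true;
			break;
		}
	}

	if (!cached) {
		/* Cache miss: drop the oldest entry, remember the old root. */
		for (int i = NUM_PREV_ROOTS - 1; i > 0; i--)
			v->prev_roots[i] = v->prev_roots[i - 1];
		v->prev_roots[0] = v->cur_root;
	}

	v->cur_root = new_root;
	return !cached || arch_flush_required;
}
```

In this model, loading root A, then root B, then root A again flushes on the
first two loads only.  An unconditional flush whenever the root changes, which is
what the proposed hv_track_root_tdp() hunk amounts to, would also flush on the
third.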