Sean Christopherson <seanjc@xxxxxxxxxx> writes:

> On Thu, Sep 22, 2022, Vitaly Kuznetsov wrote:
>> Now let's get to VMX and the point of my confusion (and thanks in
>> advance for educating me!):
>> AFAIU, when EPT is in use:
>>  KVM_REQ_TLB_FLUSH_CURRENT == invept
>>  KVM_REQ_TLB_FLUSH_GUEST = invvpid
>>
>> For "normal" mappings (which are mapped on both stages) this is the same
>> thing as they're 'tagged' with both VPID and 'EPT root'. The question is
>> what's left. Given your comment, do I understand correctly that in case
>> of an invalid mapping in the guest (GVA doesn't resolve to a GPA), this
>> will only be tagged with VPID but not with 'EPT root' (as the CPU never
>> reached the second translation stage)? We certainly can't ignore
>> these. Another (probably purely theoretical) question is what are the
>> mappings which are tagged with 'EPT root' but don't have a VPID tag?
>
> Intel puts mappings into three categories, which for non-root mode equates to:
>
>  linear         == GVA => GPA
>  guest-physical == GPA => HPA
>  combined       == GVA => HPA
>
> and essentially the categories that consume the GVA are tagged with the VPID
> (linear and combined), and categories that consume the GPA are tagged with the
> EPTP address (guest-physical and combined).
>
>> Are these the mappings which happen when e.g. vCPU has paging disabled?
>
> No, these mappings can be created at all times.  Even with CR0.PG=1, the guest
> can generate GPAs without going through a GVA=>GPA translation, e.g. the page
> tables themselves, RTIT (Intel PT) addresses, etc...  And even for combined/full
> translations, the CPU can insert TLB entries for just the GPA=>HPA part.
>
> E.g. when a page is allocated by/for userspace, the kernel will zero the page
> using the kernel's direct map, but userspace will access the page via a
> different GVA.  I.e. the guest effectively aliases GPA(x) with GVA(k) and
> GVA(u).  By inserting the GPA(x) => HPA(y) entry into the TLB, when guest
> userspace accesses GVA(u), the CPU encounters a TLB miss on GVA(u) => GPA(x),
> but gets a TLB hit on GPA(x) => HPA(y).
>
> Separating EPT flushes from VPID (and PCID) flushes allows the CPU to retain
> the partial TLB entries, e.g. a host change in the EPT tables will result in
> the guest-physical and combined mappings being invalidated, but linear
> mappings can be kept.

Thanks a bunch! For some reason I thought it's always the full thing
(combined) which is tagged with both VPID/PCID and EPTP, and
linear/guest-physical are just 'corner' cases (but are still combined
and tagged). Apparently, it's not like that.

> I'm 99% certain AMD also caches partial entries, e.g. see the blurb on INVLPGA
> not affecting NPT translations; AMD just doesn't provide a way for the host to
> flush _only_ NPT translations.  Maybe the performance benefits weren't
> significant enough to justify the extra complexity?
>
>> These are probably unrelated to Hyper-V TLB flushing.
>>
>> To preserve the 'small' optimization, we can probably move
>>  kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
>>
>> to nested_svm_transition_tlb_flush() or, in case this sounds too
>> hackish
>
> Move it to svm_flush_tlb_current(), because the justification is that on SVM,
> flushing "current" TLB entries also flushes "guest" TLB entries due to the more
> coarse-grained ASID-based TLB flush.  E.g.
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index dd599afc85f5..a86b41503723 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3737,6 +3737,13 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
>  {
>          struct vcpu_svm *svm = to_svm(vcpu);
>
> +        /*
> +         * Unlike VMX, SVM doesn't provide a way to flush only NPT TLB entries.
> +         * A TLB flush for the current ASID flushes both "host" and "guest" TLB
> +         * entries, and thus is a superset of Hyper-V's fine grained flushing.
> +         */
> +        kvm_hv_vcpu_purge_flush_tlb(vcpu);
> +
>          /*
>           * Flush only the current ASID even if the TLB flush was invoked via
>           * kvm_flush_remote_tlbs().  Although flushing remote TLBs requires all
>
>> we can drop it for now and add it to the (already overfull)
>> bucket of the "optimize nested_svm_transition_tlb_flush()".
>
> I think even long term, purging Hyper-V's FIFO in svm_flush_tlb_current() is
> the correct/desired behavior.  This doesn't really have anything to do with
> nSVM, it's all about SVM not providing a way to flush only NPT entries.

True that; silly me forgot that even without any nesting, a Hyper-V TLB
flush after svm_flush_tlb_current() makes no sense.

-- 
Vitaly
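
To make the tagging rules described above concrete, here is a minimal toy
model in plain C (all identifiers are invented for illustration; this is not
KVM or hardware code): entries that consume a GVA carry the VPID tag, entries
that consume a GPA carry the EPTP tag, a "combined" entry carries both, and
each flush instruction only looks at "its" tag, which is why a host-side EPT
change can leave linear entries intact.

/*
 * Toy model of Intel's TLB-entry categories and tags; identifiers are
 * invented for illustration, this is not KVM or hardware code.
 */
#include <stdbool.h>
#include <stdio.h>

enum tlb_kind { LINEAR, GUEST_PHYSICAL, COMBINED };

struct tlb_entry {
        enum tlb_kind kind;
        unsigned int vpid;      /* meaningful for LINEAR and COMBINED */
        unsigned long eptp;     /* meaningful for GUEST_PHYSICAL and COMBINED */
        bool valid;
};

/* INVVPID-like flush: drops every entry tagged with the given VPID. */
static void flush_by_vpid(struct tlb_entry *e, int n, unsigned int vpid)
{
        for (int i = 0; i < n; i++)
                if (e[i].kind != GUEST_PHYSICAL && e[i].vpid == vpid)
                        e[i].valid = false;
}

/* INVEPT-like flush: drops every entry tagged with the given EPTP. */
static void flush_by_eptp(struct tlb_entry *e, int n, unsigned long eptp)
{
        for (int i = 0; i < n; i++)
                if (e[i].kind != LINEAR && e[i].eptp == eptp)
                        e[i].valid = false;
}

int main(void)
{
        struct tlb_entry tlb[] = {
                { LINEAR,         1, 0,     true },  /* GVA => GPA */
                { GUEST_PHYSICAL, 0, 0xabc, true },  /* GPA => HPA */
                { COMBINED,       1, 0xabc, true },  /* GVA => HPA */
        };

        /* A host-side EPT change invalidates guest-physical and combined... */
        flush_by_eptp(tlb, 3, 0xabc);

        /* ...but the linear entry survives, matching the description above. */
        printf("linear=%d guest-physical=%d combined=%d\n",
               tlb[0].valid, tlb[1].valid, tlb[2].valid);
        return 0;
}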
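
The SVM side of the argument can be sketched the same way (again plain C with
invented names; the real helper in the diff is kvm_hv_vcpu_purge_flush_tlb(),
whose actual implementation may differ): because a flush of the current ASID
wipes both "host" and "guest" entries, any fine-grained flush requests still
sitting in the vCPU's Hyper-V FIFO are already covered and can simply be
dropped.

/*
 * Toy sketch (invented names, not KVM code) of why a full ASID flush can
 * purge Hyper-V's fine-grained flush FIFO: every queued per-GVA flush is a
 * subset of "flush the whole ASID", so the queue can simply be emptied.
 */
#include <stdio.h>

#define HV_FIFO_SIZE 16

struct hv_tlb_flush_fifo {
        unsigned long gvas[HV_FIFO_SIZE];
        int count;
};

/* Fine-grained path: flush only the GVAs the guest asked about. */
static void flush_queued_gvas(struct hv_tlb_flush_fifo *fifo)
{
        for (int i = 0; i < fifo->count; i++)
                printf("fine-grained flush of GVA 0x%lx\n", fifo->gvas[i]);
        fifo->count = 0;
}

/* Coarse path, analogous to svm_flush_tlb_current(): flush the ASID... */
static void flush_current_asid(struct hv_tlb_flush_fifo *fifo)
{
        printf("flush entire ASID\n");
        /* ...and purge the FIFO, since the ASID flush covers every entry. */
        fifo->count = 0;
}

int main(void)
{
        struct hv_tlb_flush_fifo fifo = { .gvas = { 0x1000, 0x2000 }, .count = 2 };

        flush_current_asid(&fifo);

        /* Nothing is left to do on the fine-grained path. */
        flush_queued_gvas(&fifo);
        return 0;
}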