On Thu, Sep 22, 2022, Vitaly Kuznetsov wrote:
> Now let's get to VMX and the point of my confusion (and thanks in
> advance for educating me!):
> AFAIU, when EPT is in use:
>   KVM_REQ_TLB_FLUSH_CURRENT == invept
>   KVM_REQ_TLB_FLUSH_GUEST = invvpid
>
> For "normal" mappings (which are mapped on both stages) this is the same
> thing as they're 'tagged' with both VPID and 'EPT root'. The question is
> what's left. Given your comment, do I understand correctly that in case
> of an invalid mapping in the guest (GVA doesn't resolve to a GPA), this
> will only be tagged with VPID but not with 'EPT root' (as the CPU never
> reached to the second translation stage)? We certainly can't ignore
> these. Another (probably pure theoretical question) is what are the
> mappings which are tagged with 'EPT root' but don't have a VPID tag?

Intel puts mappings into three categories, which for non-root mode equates
to:

  linear         == GVA => GPA
  guest-physical == GPA => HPA
  combined       == GVA => HPA

Essentially, the categories that consume the GVA are tagged with the VPID
(linear and combined), and the categories that consume the GPA are tagged
with the EPTP address (guest-physical and combined).

> Are these the mapping which happen when e.g. vCPU has paging disabled?

No, these mappings can be created at all times.  Even with CR0.PG=1, the
guest can generate GPAs without going through a GVA=>GPA translation, e.g.
the page tables themselves, RTIT (Intel PT) addresses, etc...

And even for combined/full translations, the CPU can insert TLB entries for
just the GPA=>HPA part.  E.g. when a page is allocated by/for userspace,
the kernel will zero the page using the kernel's direct map, but userspace
will access the page via a different GVA.  I.e. the guest effectively
aliases GPA(x) with GVA(k) and GVA(u).  By inserting the GPA(x) => HPA(y)
entry into the TLB, when guest userspace accesses GVA(u), the CPU
encounters a TLB miss on GVA(u) => GPA(x), but gets a TLB hit on
GPA(x) => HPA(y).

Separating EPT flushes from VPID (and PCID) flushes allows the CPU to
retain the partial TLB entries, e.g. a host change in the EPT tables will
result in the guest-physical and combined mappings being invalidated, but
linear mappings can be kept.

I'm 99% certain AMD also caches partial entries, e.g. see the blurb on
INVLPGA not affecting NPT translations; AMD just doesn't provide a way for
the host to flush _only_ NPT translations.  Maybe the performance benefits
weren't significant enough to justify the extra complexity?

> These are probably unrelated to Hyper-V TLB flushing.
>
> To preserve the 'small' optimization, we can probably move
>   kvm_clear_request(KVM_REQ_HV_TLB_FLUSH, vcpu);
>
> to nested_svm_transition_tlb_flush() or, in case this sounds too
> hackish

Move it to svm_flush_tlb_current(), because the justification is that on
SVM, flushing "current" TLB entries also flushes "guest" TLB entries due to
the more coarse-grained ASID-based TLB flush.  E.g.

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index dd599afc85f5..a86b41503723 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3737,6 +3737,13 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
 {
        struct vcpu_svm *svm = to_svm(vcpu);

+       /*
+        * Unlike VMX, SVM doesn't provide a way to flush only NPT TLB entries.
+        * A TLB flush for the current ASID flushes both "host" and "guest" TLB
+        * entries, and thus is a superset of Hyper-V's fine grained flushing.
+        */
+       kvm_hv_vcpu_purge_flush_tlb(vcpu);
+
        /*
         * Flush only the current ASID even if the TLB flush was invoked via
         * kvm_flush_remote_tlbs().  Although flushing remote TLBs requires all

> we can drop it for now and add it to the (already overfull)
> bucket of the "optimize nested_svm_transition_tlb_flush()".

I think even long term, purging Hyper-V's FIFO in svm_flush_tlb_current()
is the correct/desired behavior.  This doesn't really have anything to do
with nSVM, it's all about SVM not providing a way to flush only NPT
entries.
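
Tangentially, in case it helps to sanity check the tagging rules above,
here's a throwaway userspace sketch (purely illustrative, not kernel code,
all names are made up for this example) that just encodes "consumes a GVA
=> VPID tag, consumes a GPA => EPTP tag" and prints which categories an
invvpid-scoped vs. invept-scoped flush would drop:

/*
 * Illustrative only: linear and combined entries consume a GVA and so are
 * tagged with the VPID; guest-physical and combined entries consume a GPA
 * and so are tagged with the EPTP.  An EPT-scoped flush therefore drops
 * guest-physical + combined but keeps linear, and a VPID-scoped flush
 * drops linear + combined but keeps guest-physical.
 */
#include <stdbool.h>
#include <stdio.h>

enum tlb_category {
        TLB_LINEAR,             /* GVA => GPA */
        TLB_GUEST_PHYSICAL,     /* GPA => HPA */
        TLB_COMBINED,           /* GVA => HPA */
};

static bool vpid_tagged(enum tlb_category c)
{
        return c == TLB_LINEAR || c == TLB_COMBINED;
}

static bool eptp_tagged(enum tlb_category c)
{
        return c == TLB_GUEST_PHYSICAL || c == TLB_COMBINED;
}

int main(void)
{
        static const char * const names[] = {
                "linear", "guest-physical", "combined",
        };

        for (int c = TLB_LINEAR; c <= TLB_COMBINED; c++)
                printf("%-15s dropped by invvpid: %d, dropped by invept: %d\n",
                       names[c], vpid_tagged(c), eptp_tagged(c));
        return 0;
}

Build it with any C compiler, e.g. "gcc -std=c11 tlb_tags.c" (file name is
made up too); the only category that survives the EPT-scoped flush is
linear, which is exactly the partial entry the CPU gets to keep when the
host changes the EPT tables.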