Re: [PATCH] KVM: nVMX: Always use TLB_FLUSH_GUEST for nested VM-Enter/VM-Exit

On Thu, Jan 16, 2025 at 2:35 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, Jan 16, 2025, Yosry Ahmed wrote:
> > On Thu, Jan 16, 2025 at 9:11 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Jan 16, 2025, Yosry Ahmed wrote:
> > > > On Wed, Jan 15, 2025 at 9:27 PM Jim Mattson <jmattson@xxxxxxxxxx> wrote:
> > > > > On Wed, Jan 15, 2025 at 7:50 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> > > > > > Use KVM_REQ_TLB_FLUSH_GUEST in this case in
> > > > > > nested_vmx_transition_tlb_flush() for consistency. This arguably makes
> > > > > > more sense conceptually too -- L1 and L2 cannot share the TLB tag for
> > > > > > guest-physical translations, so only flushing linear and combined
> > > > > > translations (i.e. guest-generated translations) is needed.
> > >
> > > No, using KVM_REQ_TLB_FLUSH_CURRENT is correct.  From *L1's* perspective, VPID
> > > is enabled, and so VM-Entry/VM-Exit are NOT architecturally guaranteed to flush
> > > TLBs, and thus KVM is not required to FLUSH_GUEST.
> > >
> > > E.g. if KVM is using shadow paging (no EPT whatsoever), and L1 has modified the
> > > PTEs used to map L2 but has not yet flushed TLBs for L2's VPID, then KVM is allowed
> > > to retain its old, "stale" SPTEs that map L2 because architecturally they aren't
> > > guaranteed to be visible to L2.
> > >
> > > But because L1 and L2 share TLB entries *in hardware*, KVM needs to ensure the
> > > hardware TLBs are flushed.  Without EPT, KVM will use different CR3s for L1 and
> > > L2, but Intel's ASID tag doesn't include the CR3 address, only the PCID, which
> > > KVM always pulls from guest CR3, i.e. could be the same for L1 and L2.
> > >
> > > Specifically, the synchronization of shadow roots in kvm_vcpu_flush_tlb_guest()
> > > is not required in this scenario.
> >
> > Aha, I was examining vmx_flush_tlb_guest() not
> > kvm_vcpu_flush_tlb_guest(), so I missed the synchronization. Yeah I
> > think it's possible that we end up unnecessarily synchronizing the
> > shadow page tables (or dropping them) in this case.
> >
> > Do you think it's worth expanding the comment in
> > nested_vmx_transition_tlb_flush()?
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 2ed454186e59c..43d34e413d016 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -1239,6 +1239,11 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
> >          * does not have a unique TLB tag (ASID), i.e. EPT is disabled and
> >          * KVM was unable to allocate a VPID for L2, flush the current context
> >          * as the effective ASID is common to both L1 and L2.
> > +        *
> > +        * Note that even though TLB_FLUSH_GUEST would be correct because we
> > +        * only need to flush linear mappings, it would unnecessarily
> > +        * synchronize the MMU even though a TLB flush is not architecturally
> > +        * required from L1's perspective.
>
> I'm open to calling out that there's no flush from L1's perspective, but this
> is inaccurate.  Using TLB_FLUSH_GUEST is simply not correct.  Will it cause
> functional problems?  No.  But neither would blasting kvm_flush_remote_tlbs(),
> and I think most people would consider flushing all TLBs on all vCPUs to be a
> bug.

Yeah, I meant "functionally correct" in the sense that it would not
cause functional problems, but it would definitely still be a problem.

>
> How about:
>
>          * Note, only the hardware TLB entries need to be flushed, as VPID is
>          * fully enabled from L1's perspective, i.e. there's no architectural
>          * TLB flush from L1's perspective.

I hate to bikeshed, but I want to explicitly call out that we do not
need to synchronize the MMU. Maybe this?

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2ed454186e59c..a9171909de63d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1239,6 +1239,11 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
         * does not have a unique TLB tag (ASID), i.e. EPT is disabled and
         * KVM was unable to allocate a VPID for L2, flush the current context
         * as the effective ASID is common to both L1 and L2.
+        *
+        * Note, only the hardware TLB entries need to be flushed, as VPID is
+        * fully enabled from L1's perspective, i.e. there's no
+        * architectural TLB flush from L1's perspective. Hence, synchronizing
+        * the MMU is not required as the mappings are still valid.
         */
        if (!nested_has_guest_tlb_tag(vcpu))
                kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
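
To make the difference between the two requests concrete, here is a toy
userspace model. It is not kernel code, and every toy_* name is made up
for illustration. It assumes, per the discussion above, that a "current"
flush only drops hardware TLB entries, while a "guest" flush additionally
synchronizes shadow roots when shadow paging is in use. The tag check is
a simplified stand-in for nested_has_guest_tlb_tag().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for KVM_REQ_TLB_FLUSH_{CURRENT,GUEST}. */
enum toy_flush_req {
	TOY_FLUSH_CURRENT,
	TOY_FLUSH_GUEST,
};

/* Hypothetical vCPU state: two config bits plus counters for observation. */
struct toy_vcpu {
	bool ept_enabled;	/* false means shadow paging */
	bool l2_has_vpid;	/* KVM managed to allocate a VPID for L2 */
	int hw_tlb_flushes;	/* hardware TLB flushes performed */
	int shadow_root_syncs;	/* shadow root synchronizations performed */
};

/* Simplified assumption: L2 has its own tag with EPT or with its own VPID. */
static bool toy_l2_has_unique_tlb_tag(const struct toy_vcpu *v)
{
	return v->ept_enabled || v->l2_has_vpid;
}

/* Both requests flush hardware; only GUEST also syncs shadow roots. */
static void toy_handle_flush(struct toy_vcpu *v, enum toy_flush_req req)
{
	assert(v != NULL);

	if (req == TOY_FLUSH_GUEST && !v->ept_enabled)
		v->shadow_root_syncs++;

	v->hw_tlb_flushes++;
}

/* Nested VM-Enter/VM-Exit: flush only if L1 and L2 share a hardware tag. */
static void toy_nested_transition(struct toy_vcpu *v, enum toy_flush_req req)
{
	if (!toy_l2_has_unique_tlb_tag(v))
		toy_handle_flush(v, req);
}
```

With ept_enabled and l2_has_vpid both false, a transition using
TOY_FLUSH_CURRENT bumps only hw_tlb_flushes, whereas TOY_FLUSH_GUEST also
bumps shadow_root_syncs. That extra sync is the work that is not
architecturally required from L1's perspective.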




