Re: [PATCH v3 23/37] KVM: nVMX: Add helper to handle TLB flushes on nested VM-Enter/VM-Exit

Lai Jiangshan <jiangshanlai+lkml@xxxxxxxxx> · Sat, 30 Oct 2021 09:34:31 +0800

/

On Sat, Oct 30, 2021 at 1:10 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> TL;DR: I'll work on a proper series next week, there are multiple things that need
> to be fixed.
>
> On Fri, Oct 29, 2021, Lai Jiangshan wrote:
> > On Thu, Oct 28, 2021 at 11:22 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > The fix should simply be:
> > >
> > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > > index eedcebf58004..574823370e7a 100644
> > > --- a/arch/x86/kvm/vmx/nested.c
> > > +++ b/arch/x86/kvm/vmx/nested.c
> > > @@ -1202,17 +1202,15 @@ static void nested_vmx_transition_tlb_flush(struct kvm_vcpu *vcpu,
> > >          *
> > >          * If a TLB flush isn't required due to any of the above, and vpid12 is
> > >          * changing then the new "virtual" VPID (vpid12) will reuse the same
> > > -        * "real" VPID (vpid02), and so needs to be flushed.  There's no direct
> > > -        * mapping between vpid02 and vpid12, vpid02 is per-vCPU and reused for
> > > -        * all nested vCPUs.  Remember, a flush on VM-Enter does not invalidate
> > > -        * guest-physical mappings, so there is no need to sync the nEPT MMU.
> > > +        * "real" VPID (vpid02), and so needs to be flushed.  Like the !vpid02
> > > +        * case above, this is a full TLB flush from the guest's perspective.
> > >          */
> > >         if (!nested_has_guest_tlb_tag(vcpu)) {
> > >                 kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> > >         } else if (is_vmenter &&
> > >                    vmcs12->virtual_processor_id != vmx->nested.last_vpid) {
> > >                 vmx->nested.last_vpid = vmcs12->virtual_processor_id;
> > > -               vpid_sync_context(nested_get_vpid02(vcpu));
> > > +               kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> >
> > This change is neat.
>
> Heh, yeah, but too neat to be right :-)
>
> > But current KVM_REQ_TLB_FLUSH_GUEST flushes vpid01 only, and it doesn't flush
> > vpid02.  vmx_flush_tlb_guest() might need to be changed to flush vpid02 too.
>
> Hmm.  I think vmx_flush_tlb_guest() is straight up broken.  E.g. if EPT is enabled
> but L1 doesn't use EPT for L2 and doesn't intercept INVPCID, then KVM will handle
> INVPCID from L2.  That means the recent addition to kvm_invalidate_pcid() (see
> below) will flush the wrong VPID.  And it's incorrect (well, more than is required
> by the SDM) to flush both VPIDs because flushes from INVPCID (and flushes from the
> guest's perspective in general) are scoped to the current VPID, e.g. a "full" TLB
> flush in the "host" by toggling CR4.PGE flushes only the current VPID:

I think KVM_REQ_TLB_FLUSH_GUEST/kvm_vcpu_flush_tlb_guest/vmx_flush_tlb_guest
was deliberately designed for the L1 guest only.  It can be seen from the code,
from the history, and from the caller's side.  For example,
nested_vmx_transition_tlb_flush() knows KVM_REQ_TLB_FLUSH_GUEST flushes
L1 guest:

        /*
         * If vmcs12 doesn't use VPID, L1 expects linear and combined mappings
         * for *all* contexts to be flushed on VM-Enter/VM-Exit, i.e. it's a
         * full TLB flush from the guest's perspective.  This is required even
         * if VPID is disabled in the host as KVM may need to synchronize the
         * MMU in response to the guest TLB flush.
         *
         * Note, using TLB_FLUSH_GUEST is correct even if nested EPT is in use.
         * EPT is a special snowflake, as guest-physical mappings aren't
         * flushed on VPID invalidations, including VM-Enter or VM-Exit with
         * VPID disabled.  As a result, KVM _never_ needs to sync nEPT
         * entries on VM-Enter because L1 can't rely on VM-Enter to flush
         * those mappings.
         */
        if (!nested_cpu_has_vpid(vmcs12)) {
                kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
                return;
        }

While handle_invvpid() doesn't use KVM_REQ_TLB_FLUSH_GUEST.

So, I don't think KVM_REQ_TLB_FLUSH_GUEST, kvm_vcpu_flush_tlb_guest
or vmx_flush_tlb_guest is broken since they are for L1 guests.
What we have to do is to consider is it worth extending them for
nested guests for the convenience of nested code.

I second that they are extended.

A small comment in your proposal: I found that KVM_REQ_TLB_FLUSH_CURRENT
and KVM_REQ_TLB_FLUSH_GUEST is to flush "current" vpid only, some special
work needs to be added when switching mmu from L1 to L2 and vice versa:
handle the requests before switching.