Re: [PATCH 1/3] KVM: VMX: Retry APIC-access page reload if invalidation is in-progress

Sean Christopherson <seanjc@xxxxxxxxxx> · Tue, 13 Jun 2023 12:07:57 -0700

On Thu, Jun 08, 2023, Yu Zhang wrote:
> > 
> > Flushing when KVM zaps SPTEs is definitely necessary.  But the flush in
> > vmx_set_apic_access_page_addr() *should* be redundant.
> > 
> > > Could we try to return false in kvm_unmap_gfn_range() to indicate no more
> > > flush is needed, if the range to be unmapped falls within guest APIC base,
> > > and leaving the TLB invalidation work to vmx_set_apic_access_page_addr()?
> > 
> > No, because vmx_flush_tlb_current(), a.k.a. KVM_REQ_TLB_FLUSH_CURRENT, flushes
> > only the current root, i.e. on the current EP4TA.  kvm_unmap_gfn_range() isn't
> > tied to a single vCPU and so needs to flush all roots.  We could in theory more
> > precisely track which roots needs to be flushed, but in practice it's highly
> > unlikely to matter as there is typically only one "main" root when TDP (EPT) is
> > in use.  In other words, KVM could avoid unnecessarily flushing entries for other
> > roots, but it would incur non-trivial complexity, and the probability of the
> > precise flushing having a measurable impact on guest performance is quite low, at
> > least outside of nested scenarios.
> 
> Well, I can understand the invalidation shall be performed for both current EP4TA,
> and the nested EP4TA(EPT02) when host retries to reclaim a normal page, because L1
> may assign this page to L2. But for APIC base address, will L1 map this address to
> L2?

L1 can do whatever it wants.  E.g. L1 could passthrough its APIC to L2, in which
case, yes, L1 will map its APIC base into L2.  KVM (as L0) however doesn't support
mapping the APIC-access page into L2.  KVM *could* support utilizing APICv to
accelerate L2 when L1 has done a full APIC passthrough, but AFAIK no one has
requested such support.  Functionally, an APIC passthrough setup for L1=>L2 will
work, but KVM will trap and emulate APIC accesses from L2 instead of utilizing
hardware acceleration.

More commonly, L1 will use APICv for L2 and thus have an APIC-access page for L2,
and KVM will map _that_ page into L2.

> Also, what if the virtualize APIC access is to be supported in L2,

As above, KVM never maps the APIC-access page that KVM (as L0) manages into L2.

> and the backing page is being reclaimed in L0? I saw
> nested_get_vmcs12_pages() will check vmcs12 and set the APIC access address
> in VMCS02, but not sure if this routine will be triggered by the mmu
> notifier...

Pages from vmcs12 that are referenced by physical address in the VMCS are pinned
(where "pinned" means KVM holds a reference to the page) by kvm_vcpu_map().  I.e.
the page will not be migrated, and if userspace unmaps the page, userspace might
break its VM, but that's true for any guest memory that userspace unexpectedly
unmaps, and there won't be any no use-after-free issues.