On Thu, Apr 13, 2023, David Matlack wrote:
> On Thu, Apr 13, 2023 at 12:10 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > On Thu, Apr 13, 2023, Sean Christopherson wrote:
> > > Aha!  Idea.  There are _at most_ 4 possible roots the TDP MMU can encounter.
> > > 4-level non-SMM, 4-level SMM, 5-level non-SMM, and 5-level SMM.  I.e. not keeping
> > > inactive roots on a per-VM basis is just monumentally stupid.
> >
> > One correction: there are 6 possible roots:
> >
> >   1. 4-level !SMM !guest_mode (i.e. not nested)
> >   2. 4-level SMM !guest_mode
> >   3. 5-level !SMM !guest_mode
> >   4. 5-level SMM !guest_mode
> >   5. 4-level !SMM guest_mode
> >   6. 5-level !SMM guest_mode
> >
> > I forgot that KVM still uses the TDP MMU when running L2 if L1 doesn't enable
> > EPT/TDP, i.e. if L1 is using shadow paging for L2.  But that really doesn't change
> > anything as each vCPU can already track 4 roots, i.e. userspace can saturate all
> > 6 roots anyways.  And in practice, no sane VMM will create a VM with both 4-level
> > and 5-level roots (KVM keys off of guest.MAXPHYADDR for the TDP root level).
>
> Why do we create a new root for guest_mode=1 if L1 disables EPT/NPT?

Because "private", a.k.a. KVM-internal, memslots are visible to L1 but not L2.
Which for TDP means the APIC-access page.  From commit 3a2936dedd20:

    kvm: mmu: Don't expose private memslots to L2

    These private pages have special purposes in the virtualization of L1,
    but not in the virtualization of L2. In particular, L1's APIC access
    page should never be entered into L2's page tables, because this causes
    a great deal of confusion when the APIC virtualization hardware is being
    used to accelerate L2's accesses to its own APIC.

FWIW, I _think_ KVM could actually let L2 access the APIC-access page when L1 is
running without any APIC virtualization, i.e. when L1 is passing its APIC through
to L2.  E.g. something like the below, but I ain't touching that with a 10 foot pole
unless someone explicitly asks for it :-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 039fb16560a0..8aa12f5f2c30 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4370,10 +4370,13 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (!kvm_is_visible_memslot(slot)) {
 		/* Don't expose private memslots to L2. */
 		if (is_guest_mode(vcpu)) {
-			fault->slot = NULL;
-			fault->pfn = KVM_PFN_NOSLOT;
-			fault->map_writable = false;
-			return RET_PF_CONTINUE;
+			if (!slot || slot->id != APIC_ACCESS_PAGE_PRIVATE_MEMSLOT ||
+			    nested_cpu_has_virtual_apic(vcpu)) {
+				fault->slot = NULL;
+				fault->pfn = KVM_PFN_NOSLOT;
+				fault->map_writable = false;
+				return RET_PF_CONTINUE;
+			}
 		}
 		/*
 		 * If the APIC access page exists but is disabled, go directly
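
To make the condition in the diff easier to eyeball, here is a rough userspace sketch of the "hide this memslot from L2?" decision before and after the change. It is only a model, not kernel code: the enum, the function names, and the l1_virtualizes_apic flag are all made up for illustration, and the flag merely stands in for whatever nested_cpu_has_virtual_apic() in the diff is meant to report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical classification of the memslot backing the faulting gfn. */
enum slot_kind {
	SLOT_NONE,          /* no memslot at all (slot == NULL) */
	SLOT_USER,          /* ordinary userspace-visible memslot */
	SLOT_APIC_ACCESS,   /* KVM-internal APIC-access page memslot */
	SLOT_OTHER_PRIVATE  /* any other KVM-internal memslot */
};

/* Models kvm_is_visible_memslot(): only user memslots count as visible. */
static bool slot_is_visible(enum slot_kind kind)
{
	return kind == SLOT_USER;
}

/* Existing behavior: every non-visible slot is hidden while running L2. */
static bool hide_from_l2_current(enum slot_kind kind, bool guest_mode)
{
	return !slot_is_visible(kind) && guest_mode;
}

/*
 * Behavior sketched by the diff: while running L2, keep hiding non-visible
 * slots unless the slot is the APIC-access page and L1 is not using APIC
 * virtualization for L2 (assumption: modeled by l1_virtualizes_apic).
 */
static bool hide_from_l2_proposed(enum slot_kind kind, bool guest_mode,
				  bool l1_virtualizes_apic)
{
	if (slot_is_visible(kind) || !guest_mode)
		return false;

	return kind != SLOT_APIC_ACCESS || l1_virtualizes_apic;
}

/* Prints current vs. proposed results for every guest_mode=1 combination. */
static void print_l2_visibility_table(void)
{
	static const char * const names[] = {
		"none", "user", "apic-access", "other-private"
	};

	for (int kind = SLOT_NONE; kind <= SLOT_OTHER_PRIVATE; kind++) {
		for (int virt = 0; virt <= 1; virt++) {
			printf("%-13s l1_virt_apic=%d current=%d proposed=%d\n",
			       names[kind], virt,
			       hide_from_l2_current(kind, true),
			       hide_from_l2_proposed(kind, true, virt));
		}
	}
}
```

Calling print_l2_visibility_table() from a main() shows that the only row that differs between the two columns is the APIC-access slot with l1_virtualizes_apic=0, which is exactly the passthrough case described above. The !guest_mode paths are deliberately left out of the model since the diff doesn't touch them.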