On Wed, Jan 3, 2024 at 6:45 PM Chao Gao <chao.gao@xxxxxxxxx> wrote:
>
> On Wed, Jan 03, 2024 at 10:04:41AM -0800, Sean Christopherson wrote:
> >On Tue, Jan 02, 2024, Jim Mattson wrote:
> >> On Tue, Jan 2, 2024 at 3:24 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >> >
> >> > On Thu, Dec 21, 2023, Xu Yilun wrote:
> >> > > On Wed, Dec 20, 2023 at 08:28:06AM -0800, Sean Christopherson wrote:
> >> > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> >> > > > > index c57e181bba21..72634d6b61b2 100644
> >> > > > > --- a/arch/x86/kvm/mmu/mmu.c
> >> > > > > +++ b/arch/x86/kvm/mmu/mmu.c
> >> > > > > @@ -5177,6 +5177,13 @@ void __kvm_mmu_refresh_passthrough_bits(struct kvm_vcpu *vcpu,
> >> > > > >  	reset_guest_paging_metadata(vcpu, mmu);
> >> > > > >  }
> >> > > > >
> >> > > > > +/* guest-physical-address bits limited by TDP */
> >> > > > > +unsigned int kvm_mmu_tdp_maxphyaddr(void)
> >> > > > > +{
> >> > > > > +	return max_tdp_level == 5 ? 57 : 48;
> >> > > >
> >> > > > Using "57" is kinda sorta wrong, e.g. the SDM says:
> >> > > >
> >> > > >   Bits 56:52 of each guest-physical address are necessarily zero because
> >> > > >   guest-physical addresses are architecturally limited to 52 bits.
> >> > > >
> >> > > > Rather than split hairs over something that doesn't matter, I think it makes sense
> >> > > > for the CPUID code to consume max_tdp_level directly (I forgot that max_tdp_level
> >> > > > is still accurate when tdp_root_level is non-zero).
> >> > >
> >> > > It is still accurate for now. Only AMD SVM sets tdp_root_level the same as
> >> > > max_tdp_level:
> >> > >
> >> > > 	kvm_configure_mmu(npt_enabled, get_npt_level(),
> >> > > 			  get_npt_level(), PG_LEVEL_1G);
> >> > >
> >> > > But I want to double confirm whether directly using max_tdp_level is fully
> >> > > considered. In your last proposal, it is:
> >> > >
> >> > > 	u8 kvm_mmu_get_max_tdp_level(void)
> >> > > 	{
> >> > > 		return tdp_root_level ? tdp_root_level : max_tdp_level;
> >> > > 	}
> >> > >
> >> > > and I think it makes more sense, because EPT setup follows the same
> >> > > rule. If any future architecture sets tdp_root_level smaller than
> >> > > max_tdp_level, the issue will happen again.
> >> >
> >> > Setting tdp_root_level != max_tdp_level would be a blatant bug.  max_tdp_level
> >> > really means "max possible TDP level KVM can use".  If an exact TDP level is being
> >> > forced by tdp_root_level, then by definition it's also the max TDP level, because
> >> > it's the _only_ TDP level KVM supports.
> >>
> >> This is all just so broken and wrong. The only guest.MAXPHYADDR that
> >> can be supported under TDP is the host.MAXPHYADDR. If KVM claims to
> >> support a smaller guest.MAXPHYADDR, then KVM is obligated to intercept
> >> every #PF,
>
> In this case (i.e., to support a 48-bit guest.MAXPHYADDR when the CPU supports only
> 4-level EPT), KVM has no need to intercept #PF, because accessing a GPA with
> RSVD bits 51-48 set leads to an EPT violation.

At the completion of the page table walk, if there is a permission fault, the
data address should not be accessed, so there should not be an EPT violation.
Remember Meltdown?

> >> and to emulate the faulting instruction to see if the RSVD
> >> bit should be set in the error code. Hardware isn't going to do it.
>
> Note that for EPT violation VM exits, the CPU stores the GPA that caused the exit
> in the "guest-physical address" field of the VMCS, so it is not necessary to emulate
> the faulting instruction to determine whether any RSVD bit is set.
There should not be an EPT violation in the case discussed.

> >> Since some page faults may occur in CPL3, this means that KVM has to
> >> be prepared to emulate any memory-accessing instruction. That's not
> >> practical.
>
> As said above, there is no need to intercept #PF for this specific case.

I disagree. See above.
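
For reference, a minimal standalone sketch (not the patch under discussion; the
helper name tdp_level_to_gpa_bits() and the userspace harness are made up purely
for illustration) of the 52-bit architectural clamp Sean quotes from the SDM,
i.e. deriving a guest-physical-address width from the TDP paging level:

#include <stdio.h>

/*
 * Illustrative only, not kernel code: map a TDP paging level to the
 * guest-physical-address width it can cover, clamped to the 52-bit
 * architectural GPA limit (bits 56:52 of a GPA are necessarily zero).
 */
static unsigned int tdp_level_to_gpa_bits(int tdp_level)
{
	unsigned int bits = (tdp_level == 5) ? 57 : 48;

	return bits > 52 ? 52 : bits;
}

int main(void)
{
	printf("4-level TDP: %u GPA bits\n", tdp_level_to_gpa_bits(4)); /* 48 */
	printf("5-level TDP: %u GPA bits\n", tdp_level_to_gpa_bits(5)); /* 52 */
	return 0;
}

Whether reporting such a width to the guest is ever safe is exactly what is
disputed above: per Jim's point, hardware will not deliver RSVD #PF semantics
for a clamped guest.MAXPHYADDR, so the clamp alone does not make a smaller
guest.MAXPHYADDR supportable.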