On Fri, Apr 12, 2024, Xiaoyao Li wrote:
> On 4/10/2024 8:19 AM, Sean Christopherson wrote:
> > On Wed, 13 Mar 2024 13:58:41 +0100, Gerd Hoffmann wrote:
> > > Use the GuestPhysBits field (EAX[23:16]) to communicate the max
> > > addressable GPA to the guest.  Typically this is identical to the max
> > > effective GPA, except in case the CPU supports MAXPHYADDR > 48 but does
> > > not support 5-level TDP.
> > >
> > > See commit messages and source code comments for details.
> > >
> > > [...]
> >
> > Applied to kvm-x86 misc, with massaged changelogs to be more verbose when
> > describing the impact of each change, e.g. to call out that patch 2 isn't an
> > urgent fix because guest firmware can simply limit itself to using GPAs that
> > can be addressed with 4-level paging.
> >
> > I also tagged patch 1 for stable@, as KVM-on-KVM will do the wrong thing when
> > patch 2 lands, i.e. KVM will incorrectly advertise the addressable MAXPHYADDR
> > as the raw/real MAXPHYADDR.
>
> you mean old KVM on new KVM?

Yep.

> As far as I see, it seems no harm. e.g., if the userspace and L0 KVM have
> the new implementation.  On Intel SRF platform, L1 KVM sees EAX[23:16]=48,
> EAX[7:0]=52.  And when L1 KVM is old, it reports EAX[7:0] = 48 to L1
> userspace.

Yep.

> right, 48 is not the raw/real MAXPHYADDR.  But I think there is no statement
> on KVM that CPUID.0x8000_0008.EAX[7:0] of KVM_GET_SUPPORTED_CPUID reports
> the raw/real MAXPHYADDR.

If we go deep enough, it becomes a functional problem.  It's not even _that_
ridiculous/contrived :-)

L1 KVM is still aware that the real MAXPHYADDR=52, and so there are no
immediate issues with reserved bits at that level.  But L1 userspace will
unintentionally configure L2 with CPUID.0x8000_0008.EAX[7:0]=48, and so L2 KVM
will incorrectly think bits 51:48 are reserved.

If both L0 and L1 are using TDP, neither L0 nor L1 will intercept #PF.  And
because L1 userspace was told MAXPHYADDR=48, it won't know that KVM needs to
be configured with allow_smaller_maxphyaddr=true in order for the setup to
function correctly.

If L2 runs an L3, and does not use EPT, L2 will think it can generate a RSVD
#PF to accelerate emulated MMIO.  The GPA with bits 51:48!=0 created by L2
generates an EPT violation in L1.  Because L1 doesn't have
allow_smaller_maxphyaddr, L1 installs an EPT mapping for the wrong GPA
(effectively drops bits 51:48), and L3 hangs because L1 will keep doing
nothing on the resulting EPT violation (L1 thinks there's already a valid
mapping).

With patch 1 and the OVMF fixes backported, L1 KVM will enumerate
MAXPHYADDR=52, L1 userspace creates L2 with MAXPHYADDR=52, and L2 OVMF
restricts its mappings to bits 47:0.

At least, I think that's what will happen.
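
For illustration only (this is not from the series, just a minimal standalone
sketch): how a guest or userspace could read both fields being discussed from
CPUID.0x8000_0008.EAX, i.e. the raw MAXPHYADDR in EAX[7:0] and GuestPhysBits
in EAX[23:16], treating GuestPhysBits==0 as "no extra information, assume all
of MAXPHYADDR is addressable".  Assumes GCC/Clang on x86 with <cpuid.h>.

	/*
	 * Hypothetical sketch: query CPUID leaf 0x80000008 and decode the
	 * MAXPHYADDR and GuestPhysBits fields.
	 */
	#include <cpuid.h>
	#include <stdio.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx;

		if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx)) {
			fprintf(stderr, "CPUID leaf 0x80000008 not supported\n");
			return 1;
		}

		unsigned int maxphyaddr     = eax & 0xff;		/* EAX[7:0]   */
		unsigned int guest_physbits = (eax >> 16) & 0xff;	/* EAX[23:16] */

		/* GuestPhysBits==0 => no hint, fall back to MAXPHYADDR. */
		if (!guest_physbits)
			guest_physbits = maxphyaddr;

		printf("MAXPHYADDR:    %u\n", maxphyaddr);
		printf("GuestPhysBits: %u (max addressable GPA bits)\n", guest_physbits);
		return 0;
	}

In the SRF example above, an L1 guest running the patched KVM would see
maxphyaddr=52 and guest_physbits=48, and firmware that honors GuestPhysBits
would keep its mappings within bits 47:0.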