Re: [PATCH v4 0/2] kvm/cpuid: set proper GuestPhysBits in CPUID.0x80000008

Xiaoyao Li <xiaoyao.li@xxxxxxxxx> · Mon, 15 Apr 2024 14:17:18 +0800

On 4/12/2024 11:48 PM, Sean Christopherson wrote:
On Fri, Apr 12, 2024, Xiaoyao Li wrote:
On 4/10/2024 8:19 AM, Sean Christopherson wrote:
On Wed, 13 Mar 2024 13:58:41 +0100, Gerd Hoffmann wrote:
Use the GuestPhysBits field (EAX[23:16]) to communicate the max
addressable GPA to the guest.  Typically this is identical to the max
effective GPA, except in case the CPU supports MAXPHYADDR > 48 but does
not support 5-level TDP.
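
For readers who want the field layout spelled out, here is a minimal sketch
in C of how CPUID.0x80000008:EAX splits up. The struct and function names are
made up for illustration and are not taken from the patches. Treating
GuestPhysBits == 0 as "not enumerated, fall back to MAXPHYADDR" is my
assumption about how a consumer would interpret the field.

```c
#include <stdint.h>

/* Hypothetical container for the three address-size fields in EAX. */
struct addr_sizes {
	uint8_t maxphyaddr;      /* EAX[7:0]:   physical address bits */
	uint8_t linear_bits;     /* EAX[15:8]:  linear address bits */
	uint8_t guest_phys_bits; /* EAX[23:16]: GuestPhysBits */
};

static struct addr_sizes decode_80000008_eax(uint32_t eax)
{
	struct addr_sizes s;

	s.maxphyaddr      = eax & 0xff;
	s.linear_bits     = (eax >> 8) & 0xff;
	s.guest_phys_bits = (eax >> 16) & 0xff;
	return s;
}

/*
 * Width of the largest GPA the guest should try to use.
 * Assumption: a zero GuestPhysBits means the field is not enumerated,
 * so the guest falls back to MAXPHYADDR.
 */
static uint8_t max_addressable_gpa_bits(struct addr_sizes s)
{
	return s.guest_phys_bits ? s.guest_phys_bits : s.maxphyaddr;
}
```

With EAX = 0x00303934 (MAXPHYADDR=52, GuestPhysBits=48), the helper returns
48, which is the "MAXPHYADDR > 48 but no 5-level TDP" case described above.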

See commit messages and source code comments for details.

[...]

Applied to kvm-x86 misc, with massaged changelogs to be more verbose when
describing the impact of each change, e.g. to call out that patch 2 isn't an
urgent fix because guest firmware can simply limit itself to using GPAs that
can be addressed with 4-level paging.

I also tagged patch 1 for stable@, as KVM-on-KVM will do the wrong thing when
patch 2 lands, i.e. KVM will incorrectly advertise the addressable MAXPHYADDR
as the raw/real MAXPHYADDR.

You mean old KVM on new KVM?

Yep.

As far as I can see, it does no harm. E.g., if userspace and L0 KVM both have
the new implementation, then on an Intel SRF platform L1 KVM sees
EAX[23:16]=48 and EAX[7:0]=52. And when L1 KVM is old, it reports
EAX[7:0]=48 to L1 userspace.
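
A simplified model of why an old L1 KVM ends up reporting 48, as a sketch in
C. This is my reading of the pre-patch behavior, not the kernel's actual
code. The function name and parameters are hypothetical, and "host" here
means whatever CPUID the L1 kernel sees.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of an old KVM computing the EAX it reports for
 * CPUID.0x80000008 via KVM_GET_SUPPORTED_CPUID.
 * Assumption: with TDP enabled, a non-zero EAX[23:16] seen on the host
 * is reported to userspace as MAXPHYADDR (EAX[7:0]).
 */
static uint32_t old_kvm_supported_eax(uint32_t host_eax, bool tdp_enabled,
				      uint8_t host_phys_bits)
{
	uint32_t g_phys_as = (host_eax >> 16) & 0xff; /* GuestPhysBits */
	uint32_t virt_as   = (host_eax >> 8) & 0xff;  /* linear bits */
	uint32_t phys_as   = host_eax & 0xff;         /* MAXPHYADDR */

	if (!tdp_enabled)
		g_phys_as = host_phys_bits;
	else if (!g_phys_as)
		g_phys_as = phys_as;

	/* EAX[23:16] is left as zero; g_phys_as lands in EAX[7:0]. */
	return g_phys_as | (virt_as << 8);
}
```

Feeding in EAX[23:16]=48, EAX[7:0]=52 with TDP enabled yields EAX[7:0]=48,
matching the SRF example above.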

Yep.

Right, 48 is not the raw/real MAXPHYADDR. But I don't think there is any
statement in KVM that CPUID.0x8000_0008.EAX[7:0] of KVM_GET_SUPPORTED_CPUID
reports the raw/real MAXPHYADDR.

If we go deep enough, it becomes a functional problem.  It's not even _that_
ridiculous/contrived :-)

L1 KVM is still aware that the real MAXPHYADDR=52, and so there are no immediate
issues with reserved bits at that level.

But L1 userspace will unintentionally configure L2 with CPUID.0x8000_0008.EAX[7:0]=48,
and so L2 KVM will incorrectly think bits 51:48 are reserved.  If both L0 and L1
are using TDP, neither L0 nor L1 will intercept #PF.  And because L1 userspace
was told MAXPHYADDR=48, it won't know that KVM needs to be configured with
allow_smaller_maxphyaddr=true in order for the setup to function correctly.
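
To put numbers on "bits 51:48 are reserved", a minimal sketch in C. The
helper names are invented for illustration, and the 52-bit upper bound is the
architectural limit on physical address bits in x86-64 page-table entries.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask of physical-address bits [51:maxphyaddr] that a guest treats as reserved. */
static uint64_t rsvd_pa_bits(unsigned int maxphyaddr)
{
	const uint64_t pa_mask = (1ULL << 52) - 1; /* bits 51:0 */

	if (maxphyaddr >= 52)
		return 0;
	return pa_mask & ~((1ULL << maxphyaddr) - 1);
}

/* True if the GPA has any bit set that is reserved for this MAXPHYADDR. */
static bool gpa_hits_rsvd_bits(uint64_t gpa, unsigned int maxphyaddr)
{
	return (gpa & rsvd_pa_bits(maxphyaddr)) != 0;
}
```

With MAXPHYADDR=48 the mask is 0x000F000000000000, i.e. bits 51:48. With
MAXPHYADDR=52 the mask is zero, so the same GPA is legal as far as the real
hardware is concerned, which is the mismatch L2 runs into.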

In this case, a) L1 userspace was told by L1 KVM that MAXPHYADDR = 48 
via KVM_GET_SUPPORTED_CPUID. But b) L1 userspace gets MAXPHYADDR = 52 by 
executing CPUID itself.

So if L1 userspace decides to configure MAXPHYADDR=48 for L2 according to
a), it is supposed to check whether KVM is configured with
allow_smaller_maxphyaddr=y. Otherwise, it cannot expect the setup to
function correctly.

If L2 runs an L3, and does not use EPT, L2 will think it can generate a RSVD #PF
to accelerate emulated MMIO.  The GPA with bits 51:48!=0 created by L2 generates
an EPT violation in L1.  Because L1 doesn't have allow_smaller_maxphyaddr, L1
installs an EPT mapping for the wrong GPA (effectively drops bits 51:48), and
L3 hangs because L1 will keep doing nothing on the resulting EPT violation (L1
thinks there's already a valid mapping).

With patch 1 and the OVMF fixes backported, L1 KVM will enumerate MAXPHYADDR=52,
L1 userspace creates L2 with MAXPHYADDR=52, and L2 OVMF restricts its mappings to
bits 47:0.

At least, I think that's what will happen.