On 4/12/2024 11:48 PM, Sean Christopherson wrote:
On Fri, Apr 12, 2024, Xiaoyao Li wrote:
On 4/10/2024 8:19 AM, Sean Christopherson wrote:
On Wed, 13 Mar 2024 13:58:41 +0100, Gerd Hoffmann wrote:
Use the GuestPhysBits field (EAX[23:16]) to communicate the max
addressable GPA to the guest. Typically this is identical to the max
effective GPA, except in case the CPU supports MAXPHYADDR > 48 but does
not support 5-level TDP.
See commit messages and source code comments for details.
[...]
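For readers following along, here is a minimal sketch of how a guest might
decode the two fields discussed in this thread. The struct and function names
are hypothetical (they are not KVM or OVMF code), and treating
GuestPhysBits == 0 as "not reported, fall back to MAXPHYADDR" is an assumption
of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of CPUID.0x80000008.EAX that this thread talks about. */
struct phys_bits {
	unsigned int maxphyaddr;      /* EAX[7:0]   */
	unsigned int guest_phys_bits; /* EAX[23:16] */
};

/* Hypothetical helper: split a raw EAX value into the two fields. */
static inline struct phys_bits decode_phys_bits(uint32_t eax)
{
	struct phys_bits pb = {
		.maxphyaddr      = eax & 0xff,
		.guest_phys_bits = (eax >> 16) & 0xff,
	};
	return pb;
}

/*
 * Assumption: GuestPhysBits == 0 means the field is not reported, so the
 * guest falls back to MAXPHYADDR.
 */
static inline unsigned int addressable_gpa_bits(struct phys_bits pb)
{
	return pb.guest_phys_bits ? pb.guest_phys_bits : pb.maxphyaddr;
}

/* Highest GPA the guest should place mappings at. */
static inline uint64_t max_addressable_gpa(struct phys_bits pb)
{
	unsigned int bits = addressable_gpa_bits(pb);

	assert(bits > 0 && bits < 64);
	return (UINT64_C(1) << bits) - 1;
}
```

With an illustrative EAX of 0x00303034 (MAXPHYADDR=52, GuestPhysBits=48), the
guest would keep its mappings within bits 47:0 while still treating
MAXPHYADDR as 52 for reserved-bit purposes.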
Applied to kvm-x86 misc, with massaged changelogs to be more verbose when
describing the impact of each change, e.g. to call out that patch 2 isn't an
urgent fix because guest firmware can simply limit itself to using GPAs that
can be addressed with 4-level paging.
I also tagged patch 1 for stable@, as KVM-on-KVM will do the wrong thing when
patch 2 lands, i.e. KVM will incorrectly advertise the addressable MAXPHYADDR
as the raw/real MAXPHYADDR.
You mean old KVM on new KVM?
Yep.
As far as I can see, there seems to be no harm. E.g., suppose userspace and
L0 KVM have the new implementation. On an Intel SRF platform, L1 KVM sees
EAX[23:16]=48 and EAX[7:0]=52. And when L1 KVM is old, it reports EAX[7:0]=48
to L1 userspace.
Yep.
Right, 48 is not the raw/real MAXPHYADDR. But I don't think there is any
statement in KVM that CPUID.0x8000_0008.EAX[7:0] of KVM_GET_SUPPORTED_CPUID
reports the raw/real MAXPHYADDR.
If we go deep enough, it becomes a functional problem. It's not even _that_
ridiculous/contrived :-)
L1 KVM is still aware that the real MAXPHYADDR=52, and so there are no immediate
issues with reserved bits at that level.
But L1 userspace will unintentionally configure L2 with CPUID.0x8000_0008.EAX[7:0]=48,
and so L2 KVM will incorrectly think bits 51:48 are reserved. If both L0 and L1
are using TDP, neither L0 nor L1 will intercept #PF. And because L1 userspace
was told MAXPHYADDR=48, it won't know that KVM needs to be configured with
allow_smaller_maxphyaddr=true in order for the setup to function correctly.
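To make the reserved-bits point concrete, here is a small sketch of how the
set of reserved physical-address bits depends on the MAXPHYADDR a guest was
told. The helper names and the PA_BITS_MAX constant are hypothetical and only
illustrate the arithmetic; this is not KVM's MMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper bound on physical address width used by this sketch. */
#define PA_BITS_MAX 52

/* Mask of physical-address bits [51:maxphyaddr] a guest treats as reserved. */
static inline uint64_t rsvd_pa_mask(unsigned int maxphyaddr)
{
	assert(maxphyaddr > 0 && maxphyaddr <= PA_BITS_MAX);

	uint64_t all   = (UINT64_C(1) << PA_BITS_MAX) - 1;
	uint64_t legal = (UINT64_C(1) << maxphyaddr) - 1;

	return all & ~legal;
}

/* True if the GPA sets any bit the guest believes is reserved. */
static inline bool gpa_has_rsvd_bits(uint64_t gpa, unsigned int maxphyaddr)
{
	return (gpa & rsvd_pa_mask(maxphyaddr)) != 0;
}
```

A guest told MAXPHYADDR=48 computes a mask covering bits 51:48, whereas
hardware with a real MAXPHYADDR of 52 treats none of those bits as reserved.
That mismatch is what allow_smaller_maxphyaddr exists to paper over.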
In this case, a) L1 userspace was told by L1 KVM that MAXPHYADDR=48 via
KVM_GET_SUPPORTED_CPUID, but b) L1 userspace gets MAXPHYADDR=52 by
executing CPUID itself.
So if L1 userspace decides to configure MAXPHYADDR=48 for L2 according
to a), it is supposed to check whether KVM is configured with
allow_smaller_maxphyaddr=y. Otherwise, it cannot expect the setup to
function correctly.
If L2 runs an L3, and does not use EPT, L2 will think it can generate a RSVD #PF
to accelerate emulated MMIO. The GPA with bits 51:48!=0 created by L2 generates
an EPT violation in L1. Because L1 doesn't have allow_smaller_maxphyaddr, L1
installs an EPT mapping for the wrong GPA (effectively drops bits 51:48), and
L3 hangs because L1 will keep doing nothing on the resulting EPT violation (L1
thinks there's already a valid mapping).
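The "effectively drops bits 51:48" step can be illustrated with a short
sketch. The function name and the example addresses are hypothetical; the
code only shows the aliasing arithmetic, not how KVM actually builds its EPT
mappings.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 'bits' bits of a GPA, as a 48-bit view of it would. */
static inline uint64_t truncate_gpa(uint64_t gpa, unsigned int bits)
{
	assert(bits > 0 && bits < 64);
	return gpa & ((UINT64_C(1) << bits) - 1);
}
```

For example, a GPA of (1ULL << 51) | 0x1000 that L2 builds to trigger a
reserved-bit fault truncates to 0x1000 when only bits 47:0 are kept, so the
mapping lands on an unrelated address and the original access never makes
progress.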
With patch 1 and the OVMF fixes backported, L1 KVM will enumerate MAXPHYADDR=52,
L1 userspace creates L2 with MAXPHYADDR=52, and L2 OVMF restricts its mappings to
bits 47:0.
At least, I think that's what will happen.