On Fri, Apr 12, 2024, Xiaoyao Li wrote:
> On 4/10/2024 8:19 AM, Sean Christopherson wrote:
> > On Wed, 13 Mar 2024 13:58:41 +0100, Gerd Hoffmann wrote:
> > > Use the GuestPhysBits field (EAX[23:16]) to communicate the max
> > > addressable GPA to the guest.  Typically this is identical to the max
> > > effective GPA, except in case the CPU supports MAXPHYADDR > 48 but does
> > > not support 5-level TDP.
> > >
> > > See commit messages and source code comments for details.
> > >
> > > [...]
> >
> > Applied to kvm-x86 misc, with massaged changelogs to be more verbose when
> > describing the impact of each change, e.g. to call out that patch 2 isn't an
> > urgent fix because guest firmware can simply limit itself to using GPAs that
> > can be addressed with 4-level paging.
> >
> > I also tagged patch 1 for stable@, as KVM-on-KVM will do the wrong thing when
> > patch 2 lands, i.e. KVM will incorrectly advertise the addressable MAXPHYADDR
> > as the raw/real MAXPHYADDR.
>
> you mean old KVM on new KVM?

Yep.

> As far as I see, it seems no harm. e.g., if the userspace and L0 KVM have
> the new implementation.  On Intel SRF platform, L1 KVM sees EAX[23:16]=48,
> EAX[7:0]=52.  And when L1 KVM is old, it reports EAX[7:0] = 48 to L1
> userspace.

Yep.

> right, 48 is not the raw/real MAXPHYADDR.  But I think there is no statement
> on KVM that CPUID.0x8000_0008.EAX[7:0] of KVM_GET_SUPPORTED_CPUID reports
> the raw/real MAXPHYADDR.

If we go deep enough, it becomes a functional problem.  It's not even _that_
ridiculous/contrived :-)

L1 KVM is still aware that the real MAXPHYADDR=52, and so there are no
immediate issues with reserved bits at that level.  But L1 userspace will
unintentionally configure L2 with CPUID.0x8000_0008.EAX[7:0]=48, and so L2 KVM
will incorrectly think bits 51:48 are reserved.

If both L0 and L1 are using TDP, neither L0 nor L1 will intercept #PF.  And
because L1 userspace was told MAXPHYADDR=48, it won't know that KVM needs to
be configured with allow_smaller_maxphyaddr=true in order for the setup to
function correctly.

If L2 runs an L3, and does not use EPT, L2 will think it can generate a RSVD
#PF to accelerate emulated MMIO.  The GPA with bits 51:48!=0 created by L2
generates an EPT violation in L1.  Because L1 doesn't have
allow_smaller_maxphyaddr, L1 installs an EPT mapping for the wrong GPA
(effectively drops bits 51:48), and L3 hangs because L1 will keep doing
nothing on the resulting EPT violation (L1 thinks there's already a valid
mapping).

With patch 1 and the OVMF fixes backported, L1 KVM will enumerate
MAXPHYADDR=52, L1 userspace creates L2 with MAXPHYADDR=52, and L2 OVMF
restricts its mappings to bits 47:0.

At least, I think that's what will happen.
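
For illustration only (this is not from the series, just a minimal standalone
sketch): how a guest or userspace could read both fields being discussed from
CPUID.0x8000_0008.EAX, i.e. the raw MAXPHYADDR in EAX[7:0] and GuestPhysBits
in EAX[23:16], treating GuestPhysBits==0 as "no extra information, assume all
of MAXPHYADDR is addressable".  Assumes GCC/Clang on x86 with <cpuid.h>.

	/*
	 * Hypothetical sketch: query CPUID leaf 0x80000008 and decode the
	 * MAXPHYADDR and GuestPhysBits fields.
	 */
	#include <cpuid.h>
	#include <stdio.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx;

		if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx)) {
			fprintf(stderr, "CPUID leaf 0x80000008 not supported\n");
			return 1;
		}

		unsigned int maxphyaddr     = eax & 0xff;		/* EAX[7:0]   */
		unsigned int guest_physbits = (eax >> 16) & 0xff;	/* EAX[23:16] */

		/* GuestPhysBits==0 => no hint, fall back to MAXPHYADDR. */
		if (!guest_physbits)
			guest_physbits = maxphyaddr;

		printf("MAXPHYADDR:    %u\n", maxphyaddr);
		printf("GuestPhysBits: %u (max addressable GPA bits)\n", guest_physbits);
		return 0;
	}

In the SRF example above, an L1 guest running the patched KVM would see
maxphyaddr=52 and guest_physbits=48, and firmware that honors GuestPhysBits
would keep its mappings within bits 47:0.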