On Tue, Apr 16, 2024, Xiaoyao Li wrote:
> On 4/15/2024 10:58 PM, Sean Christopherson wrote:
> > On Mon, Apr 15, 2024, Xiaoyao Li wrote:
> > > On 4/12/2024 11:48 PM, Sean Christopherson wrote:
> > > > On Fri, Apr 12, 2024, Xiaoyao Li wrote:
> > > > If we go deep enough, it becomes a functional problem.  It's not even _that_
> > > > ridiculous/contrived :-)
> > > >
> > > > L1 KVM is still aware that the real MAXPHYADDR=52, and so there are no immediate
> > > > issues with reserved bits at that level.
> > > >
> > > > But L1 userspace will unintentionally configure L2 with CPUID.0x8000_0008.EAX[7:0]=48,
> > > > and so L2 KVM will incorrectly think bits 51:48 are reserved.  If both L0 and L1
> > > > are using TDP, neither L0 nor L1 will intercept #PF.  And because L1 userspace
> > > > was told MAXPHYADDR=48, it won't know that KVM needs to be configured with
> > > > allow_smaller_maxphyaddr=true in order for the setup to function correctly.
> > >
> > > In this case, a) L1 userspace was told by L1 KVM that MAXPHYADDR = 48 via
> > > KVM_GET_SUPPORTED_CPUID. But b) L1 userspace gets MAXPHYADDR = 52 by
> > > executing CPUID itself.
> >
> > KVM can't assume userspace will do raw CPUID.
>
> So the KVM ABI is that, KVM_GET_SUPPORTED_CPUID always reports the host's
> MAXPHYADDR,

Not precisely, because KVM will report a reduced value when something, e.g.
MKTME, is stealing physical address bits and KVM is using shadow paging.  I.e.
when the host's effective address width is also the guest's effective address
width.

> if userspace wants to configure a smaller one than it for guest and expect it
> functioning, it needs to set kvm_intel.allow_smaller_maxphyaddr ?

Yep.  The interaction with allow_smaller_maxphyaddr is what I want to get "right",
in that I don't want KVM_GET_SUPPORTED_CPUID to report a MAXPHYADDR value that
won't work for KVM's default configuration.
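
To make the above concrete, here is a toy Python model.  It is not KVM code:
the function names, the stolen_bits parameter and the example values are all
made up for illustration.  It assumes the reporting rule is exactly as described
in this mail (a reduced value only when bits are stolen _and_ shadow paging is
in use), and it shows which physical address bits a nested guest would wrongly
treat as reserved if it were told a smaller MAXPHYADDR than the real one.

```python
def rsvd_bits(lo: int, hi: int) -> int:
    """Return a mask with bits lo..hi (inclusive) set, or 0 for an empty range."""
    if hi < lo:
        return 0
    return ((1 << (hi - lo + 1)) - 1) << lo


def reported_maxphyaddr(raw_maxphyaddr: int, stolen_bits: int,
                        tdp_enabled: bool) -> int:
    """Hypothetical model of the MAXPHYADDR reported to userspace.

    Assumption taken from the thread: the value is reduced only when
    something (e.g. MKTME) steals physical address bits AND shadow paging
    is in use, i.e. when the host's effective width is also the guest's.
    """
    if tdp_enabled:
        return raw_maxphyaddr
    return raw_maxphyaddr - stolen_bits


def spurious_rsvd_mask(real_maxphyaddr: int, told_maxphyaddr: int) -> int:
    """Bits a guest told `told_maxphyaddr` wrongly believes are reserved."""
    return rsvd_bits(told_maxphyaddr, real_maxphyaddr - 1)


if __name__ == "__main__":
    # Nested scenario from the thread: real MAXPHYADDR=52, L2 is told 48,
    # so L2 KVM thinks bits 51:48 are reserved.
    print(hex(spurious_rsvd_mask(52, 48)))
```

With real MAXPHYADDR=52 and an advertised value of 48, the mask covers exactly
bits 51:48, which is the set of bits L2 KVM would incorrectly treat as reserved
in the scenario quoted above.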