On Mon, Jan 08, 2024, Tao Su wrote: > On Fri, Jan 05, 2024 at 12:26:08PM -0800, Sean Christopherson wrote: > > On Thu, Jan 04, 2024, Jim Mattson wrote: > > > On Thu, Jan 4, 2024 at 7:08 AM Chao Gao <chao.gao@xxxxxxxxx> wrote: > > > > > > > > On Wed, Jan 03, 2024 at 07:40:02PM -0800, Jim Mattson wrote: > > > > >On Wed, Jan 3, 2024 at 6:45 PM Chao Gao <chao.gao@xxxxxxxxx> wrote: > > > > >> > > > > >> On Wed, Jan 03, 2024 at 10:04:41AM -0800, Sean Christopherson wrote: > > > > >> >On Tue, Jan 02, 2024, Jim Mattson wrote: > > > > >> >> This is all just so broken and wrong. The only guest.MAXPHYADDR that > > > > >> >> can be supported under TDP is the host.MAXPHYADDR. If KVM claims to > > > > >> >> support a smaller guest.MAXPHYADDR, then KVM is obligated to intercept > > > > >> >> every #PF, > > > > >> > > > > >> in this case (i.e., to support 48-bit guest.MAXPHYADDR when CPU supports only > > > > >> 4-level EPT), KVM has no need to intercept #PF because accessing a GPA with > > > > >> RSVD bits 51-48 set leads to EPT violation. > > > > > > > > > >At the completion of the page table walk, if there is a permission > > > > >fault, the data address should not be accessed, so there should not be > > > > >an EPT violation. Remember Meltdown? > > > > > > > > You are right. I missed this case. KVM needs to intercept #PF to set RSVD bit > > > > in PFEC. > > > > > > I have no problem with a user deliberately choosing an unsupported > > > configuration, but I do have a problem with KVM_GET_SUPPORTED_CPUID > > > returning an unsupported configuration. > > > > +1 > > > > Advertising guest.MAXPHYADDR < host.MAXPHYADDR in KVM_GET_SUPPORTED_CPUID simply > > isn't viable when TDP is enabled. I suppose KVM do so when allow_smaller_maxphyaddr > > is enabled, but that's just asking for confusion, e.g. if userspace reflects the > > CPUID back into the guest, it could unknowingly create a VM that depends on > > allow_smaller_maxphyaddr. > > > > I think the least awful option is to have the kernel expose whether or not the > > CPU support 5-level EPT to userspace. That doesn't even require new uAPI per se, > > just a new flag in /proc/cpuinfo. It'll be a bit gross for userspace to parse, > > but it's not the end of the world. Alternatively, KVM could add a capability to > > enumerate the max *addressable* GPA, but userspace would still need to manually > > take action when KVM can't address all of memory, i.e. a capability would be less > > ugly, but wouldn't meaningfully change userspace's responsibilities. > > Yes, exposing whether the CPU support 5-level EPT is indeed a better solution, it > not only bypasses the KVM_GET_SUPPORTED_CPUID but also provides the information to > userspace. > > I think a new KVM capability to enumerate the max GPA is better since it will be > more convenient if userspace wants to use, e.g., automatically limit physical bits > or the GPA in the user memory region. Not really, because "automatically" limiting guest.MAXPHYADDR without setting allow_smaller_maxphyaddr isn't exactly safe. I think it's also useful to capture *why* KVM's max addressable GPA is smaller than host.MAXPHYADDR, e.g. if down the road someone ships a CPU that actually does the right thing when guest.MAXPHYADDR is smaller than host.MAXPHYADDR. > But only reporting this capability can’t solve the KVM hang issue, userspace can > choose to ignore the max GPA, e.g., six selftests in changelog are still failed. I know. I just have more pressing concerns than making selftests work on funky hardware that AFAIK isn't publicly available. > I think we can reconsider patch2 if we don’t advertise > guest.MAXPHYADDR < host.MAXPHYADDR in KVM_GET_SUPPORTED_CPUID. Nah, someone just needs to update the selftests if/when the ept_5level flag lands.