On Mon, Feb 26, 2024, Tao Su wrote:
> On Mon, Feb 26, 2024 at 09:30:33AM +0800, Xiaoyao Li wrote:
> > On 2/23/2024 9:35 AM, Sean Christopherson wrote:
> > > On Tue, 09 Jan 2024 16:23:40 -0800, Sean Christopherson wrote:
> > > > Add a VMX flag in /proc/cpuinfo, ept_5level, so that userspace can query
> > > > whether or not the CPU supports 5-level EPT paging. EPT capabilities are
> > > > enumerated via MSR, i.e. aren't accessible to userspace without help from
> > > > the kernel, and knowing whether or not 5-level EPT is supported is sadly
> > > > necessary for userspace to correctly configure KVM VMs.
> > > >
> > > > When EPT is enabled, bits 51:49 of guest physical addresses are consumed
> > > > if and only if 5-level EPT is enabled. For CPUs with MAXPHYADDR > 48, KVM
> > > > *can't* map all legal guest memory if 5-level EPT is unsupported, e.g.
> > > > creating a VM with RAM (or anything that gets stuffed into KVM's memslots)
> > > > above bit 48 will be completely broken.
> > > >
> > > > [...]
> > >
> > > Applied to kvm-x86 vmx, with a massaged changelog to avoid presenting this as a
> > > bug fix (and finally fixed the 51:49=>51:48 goof):
> > >
> > >   Add a VMX flag in /proc/cpuinfo, ept_5level, so that userspace can query
> > >   whether or not the CPU supports 5-level EPT paging. EPT capabilities are
> > >   enumerated via MSR, i.e. aren't accessible to userspace without help from
> > >   the kernel, and knowing whether or not 5-level EPT is supported is useful
> > >   for debug, triage, testing, etc.
> > >
> > >   For example, when EPT is enabled, bits 51:48 of guest physical addresses
> > >   are consumed by the CPU if and only if 5-level EPT is enabled. For CPUs
> > >   with MAXPHYADDR > 48, KVM *can't* map all legal guest memory if 5-level
> > >   EPT is unsupported, making it more or less necessary to know whether or
> > >   not 5-level EPT is supported.
> > >
> > > [1/1] x86/cpu: Add a VMX flag to enumerate 5-level EPT support to userspace
> > >       https://github.com/kvm-x86/linux/commit/b1a3c366cbc7
> >
> > Do we need a new KVM CAP for this? This decides how to interact with old
> > kernel without this patch. In that case, no ept_5level in /proc/cpuinfo,
> > what should we do in the absence of ept_5level? treat it only 4 level EPT
> > supported?
>
> Maybe also adding flag for 4-level EPT can be an option. If userspace
> checks both 4-level and 5-level are not in /proc/cpuinfo, it can regard
> the kernel as old.

The intent is that this is informational only, not something that userspace can
or should use to make decisions about how to configure KVM guests.

As pointed out elsewhere in the thread, simply restricting guest.MAXPHYADDR to
48 doesn't actually create an architecturally viable VM. At the very least, KVM
needs to be configured with allow_smaller_maxphyaddr=1, and aside from the
gaping holes in KVM related to that knob, AIUI allow_smaller_maxphyaddr=1 isn't
an option in this case due to other quirks/flaws with the CPU in question.

I don't think there's been an on-list summary posted, but the plan is to figure
out a way to inform guest firmware of the max _usable_ physical address, so that
firmware doesn't create BARs and whatnot in memory that KVM can't map. And then
have KVM relay the usable guest.MAXPHYADDR to userspace. That way userspace
doesn't need to infer the effective guest.MAXPHYADDR from EPT knobs.
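
For the debug/triage use case only, a rough sketch of reading the flag could
look like the below. It assumes the usual x86 /proc/cpuinfo layout ("vmx flags"
and "address sizes" lines); the function names are made up for illustration and
are not part of any kernel or KVM interface. The arithmetic just restates the
changelog: with only 4-level EPT, GPA bits 51:48 can't be mapped. A missing flag
on a kernel without this patch means "unknown", not "unsupported", which is one
more reason not to configure guests based on it.

```python
import re


def vmx_flags(cpuinfo_text):
    """Return the flags on the first 'vmx flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vmx flags":
            return set(value.split())
    return set()


def host_maxphyaddr(cpuinfo_text):
    """Return the host physical address width in bits, or None if not found."""
    m = re.search(r"^address sizes\s*:\s*(\d+) bits physical",
                  cpuinfo_text, re.M)
    return int(m.group(1)) if m else None


def mappable_gpa_bits(maxphyaddr, has_ept_5level):
    """GPA width that EPT can map: bits 51:48 need 5-level EPT.

    Triage aid only.  has_ept_5level=False may simply mean the kernel
    predates the ept_5level flag.
    """
    return maxphyaddr if has_ept_5level else min(maxphyaddr, 48)
```

E.g. feed it open("/proc/cpuinfo").read() and print
mappable_gpa_bits(host_maxphyaddr(text), "ept_5level" in vmx_flags(text)) when
triaging a guest that has memslots above bit 48.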