On 6/19/20 10:39 AM, Mohammed Gamal wrote:
When EPT/NPT is enabled, KVM does not really look at the guest physical address size. Address bits above the maximum physical memory size are reserved. Because KVM does not look at these guest physical addresses, it currently effectively supports only guest physical address sizes equal to the host's. This can be a problem in a mixed setup of machines with 5-level page tables and machines with 4-level page tables, as live migration can change MAXPHYADDR while the guest runs, which can theoretically introduce bugs. In this patch series we add checks on guest physical addresses in EPT violation/misconfig and NPF vmexits and, if needed, inject the proper page faults into the guest. A more subtle issue arises when the host MAXPHYADDR is larger than that of the guest. Page faults caused by reserved bits in the guest won't cause an EPT violation/NPF, and hence we also check the guest MAXPHYADDR and add the PFERR_RSVD_MASK error code to the page fault if needed.
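For readers following along, the check described above can be sketched roughly as follows. This is only an illustration, not the code from the series: the helper names `gpa_exceeds_maxphyaddr()` and `fixup_pf_error_code()` are hypothetical, and the only detail taken as given is that the RSVD flag is bit 3 of the x86 page fault error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 3 of the x86 page fault error code: a reserved bit was set. */
#define PFERR_RSVD_MASK (1u << 3)

/*
 * Hypothetical helper (not from the patch series): returns true if the
 * guest physical address has any bit set at or above guest MAXPHYADDR.
 */
bool gpa_exceeds_maxphyaddr(uint64_t gpa, unsigned int maxphyaddr)
{
	/* Shifting a 64-bit value by 64 or more is undefined in C. */
	assert(maxphyaddr > 0 && maxphyaddr < 64);
	return (gpa >> maxphyaddr) != 0;
}

/*
 * Hypothetical helper: when the faulting GPA is illegal for the guest,
 * add the reserved-bit flag to the error code that would be injected.
 */
uint32_t fixup_pf_error_code(uint32_t error_code, uint64_t gpa,
			     unsigned int guest_maxphyaddr)
{
	if (gpa_exceeds_maxphyaddr(gpa, guest_maxphyaddr))
		error_code |= PFERR_RSVD_MASK;
	return error_code;
}
```

With a guest MAXPHYADDR of 40, for example, an address with bit 45 set would be flagged, while an address just below 1 << 40 would pass through with its error code unchanged.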
I'm probably missing something here, but I'm confused by this statement. Is this for a case where a page has been marked not present and the guest has also set what it believes are reserved bits? Then when the page is accessed, the guest sees a page fault without the error code for reserved bits? If so, my understanding is that is architecturally correct. P=0 is considered higher priority than other page faults, at least on AMD. So if you have a P=0 and other issues exist within the PTE, AMD will report the P=0 fault and that's it.
The priority of other page fault conditions when P=1 is not defined and I don't think we guarantee that you would get all error codes on fault. Software is always expected to address the page fault and retry, and it may get another page fault when it does, with a different error code. Assuming the other errors are addressed, eventually the reserved bits would cause an NPF and that could be detected by the HV and handled appropriately.
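To make the ordering I'm describing concrete, here is a toy model of it. It is a sketch under my own assumptions, not hardware-accurate code: `model_pte_fault()` and `rsvd_bits()` are made-up names, only a single PTE is considered, and the only P=1 condition modelled is reserved bits, since the relative priority of the other P=1 conditions is not defined.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT        (1ULL << 0)
#define PFERR_PRESENT_MASK (1u << 0)
#define PFERR_RSVD_MASK    (1u << 3)

/* Hypothetical helper: mask with bits lo..hi (inclusive) set, hi < 63. */
uint64_t rsvd_bits(unsigned int lo, unsigned int hi)
{
	return ((1ULL << (hi + 1)) - 1) & ~((1ULL << lo) - 1);
}

/*
 * Hypothetical model of the ordering described above. Returns true if
 * the access faults and stores the modelled error code in *error_code.
 */
bool model_pte_fault(uint64_t pte, uint64_t rsvd_mask, uint32_t *error_code)
{
	if (!(pte & PTE_PRESENT)) {
		/* P=0 wins: reserved bits in the PTE are not reported. */
		*error_code = 0;
		return true;
	}
	if (pte & rsvd_mask) {
		/* Only once P=1 can the reserved-bit fault be seen. */
		*error_code = PFERR_PRESENT_MASK | PFERR_RSVD_MASK;
		return true;
	}
	return false;
}
```

With a guest MAXPHYADDR of 40, `rsvd_bits(40, 51)` gives the bits the guest treats as reserved. A not-present PTE with bit 45 set faults with the RSVD flag clear; once software marks it present and retries, the second fault carries RSVD.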
The last 3 patches (i.e. SVM bits and patch 11) are not intended for immediate inclusion and probably need more discussion. We've been noticing some unexpected behavior in handling NPF vmexits on AMD CPUs (see individual patches for details), and thus we are proposing a workaround (see last patch) that adds a capability that userspace can use to decide how to deal with hosts that might have issues supporting guest MAXPHYADDR < host MAXPHYADDR.
Also, something to consider. On AMD, when memory encryption is enabled (via the SYS_CFG MSR), a guest can actually have a larger MAXPHYADDR than the host. How do these patches all play into that?
Thanks, Tom
Mohammed Gamal (7):
  KVM: x86: Add helper functions for illegal GPA checking and page fault injection
  KVM: x86: mmu: Move translate_gpa() to mmu.c
  KVM: x86: mmu: Add guest physical address check in translate_gpa()
  KVM: VMX: Add guest physical address check in EPT violation and misconfig
  KVM: SVM: introduce svm_need_pf_intercept
  KVM: SVM: Add guest physical address check in NPF/PF interception
  KVM: x86: SVM: VMX: Make GUEST_MAXPHYADDR < HOST_MAXPHYADDR support configurable

Paolo Bonzini (4):
  KVM: x86: rename update_bp_intercept to update_exception_bitmap
  KVM: x86: update exception bitmap on CPUID changes
  KVM: VMX: introduce vmx_need_pf_intercept
  KVM: VMX: optimize #PF injection when MAXPHYADDR does not match

 arch/x86/include/asm/kvm_host.h | 10 ++------
 arch/x86/kvm/cpuid.c            |  2 ++
 arch/x86/kvm/mmu.h              |  6 +++++
 arch/x86/kvm/mmu/mmu.c          | 12 +++++++++
 arch/x86/kvm/svm/svm.c          | 41 +++++++++++++++++++++++++++---
 arch/x86/kvm/svm/svm.h          |  6 +++++
 arch/x86/kvm/vmx/nested.c       | 28 ++++++++++++--------
 arch/x86/kvm/vmx/vmx.c          | 45 +++++++++++++++++++++++++++++----
 arch/x86/kvm/vmx/vmx.h          |  6 +++++
 arch/x86/kvm/x86.c              | 29 ++++++++++++++++++++-
 arch/x86/kvm/x86.h              |  1 +
 include/uapi/linux/kvm.h        |  1 +
 12 files changed, 158 insertions(+), 29 deletions(-)