On 10/07/20 17:48, Mohammed Gamal wrote:
> The reason behind including this patch is unexpected behaviour we see
> with NPT vmexit handling in AMD processor.
>
> With previous patch ("KVM: SVM: Add guest physical address check in
> NPF/PF interception") we see the following error multiple times in
> the 'access' test in kvm-unit-tests:
>
>     test pte.p pte.36 pde.p: FAIL: pte 2000021 expected 2000001
>     Dump mapping: address: 0x123400000000
>     ------L4: 24c3027
>     ------L3: 24c4027
>     ------L2: 24c5021
>     ------L1: 1002000021
>
> This shows that the PTE's accessed bit is apparently being set by
> the CPU hardware before the NPF vmexit. This is completely handled by
> hardware and can not be fixed in software.
>
> This patch introduces a workaround. We add a boolean variable:
> 'allow_smaller_maxphyaddr'
> Which is set individually by VMX and SVM init routines. On VMX it's
> always set to true, on SVM it's only set to true when NPT is not
> enabled.
>
> We also add a new capability KVM_CAP_SMALLER_MAXPHYADDR which
> allows userspace to query if the underlying architecture would
> support GUEST_MAXPHYADDR < HOST_MAXPHYADDR and hence act accordingly
> (e.g. qemu can decide if it would ignore the -cpu ..,phys-bits=X)
>
> CC: Tom Lendacky <thomas.lendacky@xxxxxxx>
> CC: Babu Moger <babu.moger@xxxxxxx>
> Signed-off-by: Mohammed Gamal <mgamal@xxxxxxxxxx>

Slightly rewritten commit message:

KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support

This patch adds a new capability KVM_CAP_SMALLER_MAXPHYADDR which
allows userspace to query if the underlying architecture would support
GUEST_MAXPHYADDR < HOST_MAXPHYADDR and hence act accordingly (e.g. qemu
can decide if it should warn for -cpu ..,phys-bits=X).

The complications in this patch are due to unexpected (but documented)
behaviour we see with NPF vmexit handling in AMD processors.
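The failing check quoted above can be decoded bit by bit. The reported and expected PTE values differ only in bit 5 (0x20), the architectural Accessed flag, and the L1 entry in the dump equals the reported PTE plus bit 36 (0x1000000000 = 1 << 36), which the test name "pte.36" refers to. The sketch below assumes the guest was configured with a 36-bit MAXPHYADDR, in which case bit 36 falls in the reserved range (bits MAXPHYADDR..51 under 4-level paging) and the access should fault without setting the Accessed flag. The helper names are hypothetical and are not KVM or kvm-unit-tests functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 paging-structure flag bits. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_ACCESSED (1ULL << 5)

/* Hypothetical helper: mask of physical-address bits [maxphyaddr, 51]
 * that are reserved in a 4-level paging-structure entry. */
static uint64_t rsvd_phys_mask(unsigned int maxphyaddr)
{
    assert(maxphyaddr > 0 && maxphyaddr <= 52);
    return ((1ULL << 52) - 1) & ~((1ULL << maxphyaddr) - 1);
}

/* Hypothetical helper: true if the entry sets any reserved address bit
 * for the given MAXPHYADDR, i.e. a walk through it should raise a
 * reserved-bit #PF instead of completing (and setting Accessed). */
static bool pte_has_rsvd_bits(uint64_t pte, unsigned int maxphyaddr)
{
    return (pte & rsvd_phys_mask(maxphyaddr)) != 0;
}
```

With a 36-bit guest MAXPHYADDR the dumped L1 entry is flagged as having a reserved bit, while with a 40-bit MAXPHYADDR it would be a legal entry, which is why the test only fails when the guest's MAXPHYADDR is smaller than the host's.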
If SVM is modified to add guest physical address checks in the NPF and
guest #PF paths, we see the following error multiple times in the
'access' test in kvm-unit-tests:

    test pte.p pte.36 pde.p: FAIL: pte 2000021 expected 2000001
    Dump mapping: address: 0x123400000000
    ------L4: 24c3027
    ------L3: 24c4027
    ------L2: 24c5021
    ------L1: 1002000021

This is because the PTE's accessed bit is set by the CPU hardware before
the NPF vmexit. This is handled completely by hardware and cannot be
fixed in software.

Therefore, availability of the new capability depends on a boolean
variable, allow_smaller_maxphyaddr, which is set individually by the VMX
and SVM init routines. On VMX it is always set to true; on SVM it is only
set to true when NPT is not enabled.

CC: Tom Lendacky <thomas.lendacky@xxxxxxx>
CC: Babu Moger <babu.moger@xxxxxxx>
Signed-off-by: Mohammed Gamal <mgamal@xxxxxxxxxx>
Message-Id: <20200710154811.418214-10-mgamal@xxxxxxxxxx>
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>

Paolo