The ABI is broken and we cannot support it properly.  Turn it off.

If this causes a meaningful performance regression for someone, KVM
can introduce an improved ABI that is supportable.

Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
---
 arch/x86/kernel/kvm.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 93ab0cbd304e..e6f2aefa298b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -318,11 +318,26 @@ static void kvm_guest_cpu_init(void)
 		pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));
 
-#ifdef CONFIG_PREEMPTION
-		pa |= KVM_ASYNC_PF_SEND_ALWAYS;
-#endif
 		pa |= KVM_ASYNC_PF_ENABLED;
 
+		/*
+		 * We do not set KVM_ASYNC_PF_SEND_ALWAYS.  With the current
+		 * KVM paravirt ABI, the following scenario is possible:
+		 *
+		 * #PF: async page fault (KVM_PV_REASON_PAGE_NOT_PRESENT)
+		 *  NMI before CR2 or KVM_PV_REASON_PAGE_NOT_PRESENT is read
+		 *   NMI accesses user memory, e.g. due to perf
+		 *    #PF: normal page fault
+		 *     #PF reads CR2 and apf_reason -- apf_reason should be 0
+		 *
+		 *  outer #PF reads CR2 and apf_reason -- apf_reason should be
+		 *  KVM_PV_REASON_PAGE_NOT_PRESENT
+		 *
+		 * There is no possible way that both reads of CR2 and
+		 * apf_reason get the correct values.  Fixing this would
+		 * require paravirt ABI changes.
+		 */
+
 		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_VMEXIT))
 			pa |= KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
 
-- 
2.24.1
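
For anyone who wants to step through the scenario in the comment, here
is a rough userspace model of it.  This is only a sketch, not kernel
code: every toy_* name is made up for illustration, the "CPU" is just a
struct holding CR2 and apf_reason, and I am assuming the #PF entry path
reads CR2 and then reads and clears apf_reason.  The only value taken
from the real ABI is KVM_PV_REASON_PAGE_NOT_PRESENT == 1.

```c
#include <assert.h>
#include <stdint.h>

/* Value from the KVM paravirt ABI; everything else is a toy model. */
#define KVM_PV_REASON_PAGE_NOT_PRESENT 1u

/* Hypothetical stand-in for the per-CPU state a #PF handler consumes. */
struct toy_cpu {
	uint64_t cr2;        /* models the CR2 register */
	uint32_t apf_reason; /* models the shared apf_reason word */
};

/* What one #PF handler invocation observed on entry. */
struct toy_fault {
	uint64_t cr2;
	uint32_t reason;
};

/* Assumed #PF entry: read CR2, then read and clear apf_reason. */
static struct toy_fault toy_pf_entry(struct toy_cpu *cpu)
{
	struct toy_fault f = { .cr2 = cpu->cr2, .reason = cpu->apf_reason };

	cpu->apf_reason = 0;
	return f;
}

/* Host delivers an async "page not present" #PF: token in CR2. */
static void toy_inject_async_pf(struct toy_cpu *cpu, uint64_t token)
{
	cpu->cr2 = token;
	cpu->apf_reason = KVM_PV_REASON_PAGE_NOT_PRESENT;
}

/* An ordinary #PF only updates CR2; apf_reason is left alone. */
static void toy_raise_normal_pf(struct toy_cpu *cpu, uint64_t addr)
{
	cpu->cr2 = addr;
}

/*
 * Replay the scenario: the NMI lands after the async #PF has been
 * delivered but before the outer handler has read CR2 or apf_reason.
 */
static void toy_replay(uint64_t token, uint64_t user_addr,
		       struct toy_fault *inner, struct toy_fault *outer)
{
	struct toy_cpu cpu = { 0 };

	assert(inner && outer);

	toy_inject_async_pf(&cpu, token);     /* outer #PF is delivered */
	toy_raise_normal_pf(&cpu, user_addr); /* NMI touches user memory */
	*inner = toy_pf_entry(&cpu);          /* inner #PF handler runs */
	*outer = toy_pf_entry(&cpu);          /* outer #PF handler resumes */
}
```

In this model the inner handler sees reason ==
KVM_PV_REASON_PAGE_NOT_PRESENT where it should see 0, and the outer
handler sees reason == 0 and the user address in CR2 instead of its
token.  Dropping the read-and-clear assumption does not help: both
handlers would then see the async reason, which is still wrong for the
inner one.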