2017-09-14 03:54-0700, Wanpeng Li:
> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
>
> qemu-system-x86-8600 [004] d..1 7205.687530: kvm_entry: vcpu 2
> qemu-system-x86-8600 [004] .... 7205.687532: kvm_exit: reason EXCEPTION_NMI rip 0xffffffffa921297d info ffffeb2c0e44e018 80000b0e
> qemu-system-x86-8600 [004] .... 7205.687532: kvm_page_fault: address ffffeb2c0e44e018 error_code 0
> qemu-system-x86-8600 [004] .... 7205.687620: kvm_try_async_get_page: gva = 0xffffeb2c0e44e018, gfn = 0x427e4e
> qemu-system-x86-8600 [004] .N.. 7205.687628: kvm_async_pf_not_present: token 0x8b002 gva 0xffffeb2c0e44e018
> kworker/4:2-7814 [004] .... 7205.687655: kvm_async_pf_completed: gva 0xffffeb2c0e44e018 address 0x7fcc30c4e000
> qemu-system-x86-8600 [004] .... 7205.687703: kvm_async_pf_ready: token 0x8b002 gva 0xffffeb2c0e44e018
> qemu-system-x86-8600 [004] d..1 7205.687711: kvm_entry: vcpu 2
>
> After running some memory intensive workload in guest, I catch the kworker
> which completes the GUP too quickly, and queues an "Page Ready" #PF exception
> after the "Page not Present" exception before the next vmentry as the above
> trace which will result in #DF injected to guest.

The #DF feature can bite us in other cases as well, e.g. when emulating
an instruction that throws #GP/#UD.  Can't we replace all non-#PF
exceptions with the PV #PF?

Doing so should be wrong only for trap exceptions and we currently just
override them anyway, so we wouldn't regress. :)

> This patch fixes it by clearing the queue for "Page not Present" if "Page Ready"
> occurs before the next vmentry since the GUP has already got the required page
> and shadow page table has already been fixed by "Page Ready" handler.
>
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -8653,15 +8661,26 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>  		kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
>  	trace_kvm_async_pf_ready(work->arch.token, work->gva);
>  
> -	if ((vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED) &&
> -	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
> -		fault.vector = PF_VECTOR;
> -		fault.error_code_valid = true;
> -		fault.error_code = 0;
> -		fault.nested_page_fault = false;
> -		fault.address = work->arch.token;
> -		fault.async_page_fault = true;
> -		kvm_inject_page_fault(vcpu, &fault);
> +	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED) {
> +		if (!apf_get_user(vcpu, &val)) {

I removed one indentation level when applying by merging these two
conditions.

> +			if (val == KVM_PV_REASON_PAGE_NOT_PRESENT &&
> +			    vcpu->arch.exception.pending &&
> +			    vcpu->arch.exception.nr == PF_VECTOR &&
> +			    !apf_put_user(vcpu, 0)) {
> +				vcpu->arch.exception.pending = false;

We know that vcpu->arch.exception.injected is false here, but I cleared
it too for safety, thanks.

> +				vcpu->arch.exception.nr = 0;
> +				vcpu->arch.exception.has_error_code = false;
> +				vcpu->arch.exception.error_code = 0;
> +			} else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
> +				fault.vector = PF_VECTOR;
> +				fault.error_code_valid = true;
> +				fault.error_code = 0;
> +				fault.nested_page_fault = false;
> +				fault.address = work->arch.token;
> +				fault.async_page_fault = true;
> +				kvm_inject_page_fault(vcpu, &fault);
> +			}
> +		}
>  	}
>  	vcpu->arch.apf.halted = false;
>  	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> -- 
> 2.7.4
>
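
For anyone following the thread, the cancel-instead-of-queue behaviour
can be modelled in user space.  This is only a rough sketch under stated
assumptions, not KVM code: struct model_vcpu and the model_*() helpers
are made-up names, the reason word stands in for the guest-shared APF
area accessed through apf_get_user()/apf_put_user(), and the exception
merging is reduced to the single #PF-after-#PF case that yields #DF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PF_VECTOR 14
#define DF_VECTOR 8

/* Values mirror KVM_PV_REASON_PAGE_NOT_PRESENT / KVM_PV_REASON_PAGE_READY. */
enum { REASON_NONE = 0, REASON_PAGE_NOT_PRESENT = 1, REASON_PAGE_READY = 2 };

/* Hypothetical, heavily reduced stand-in for the vcpu state involved. */
struct model_vcpu {
	bool apf_enabled;	/* models KVM_ASYNC_PF_ENABLED in msr_val */
	uint32_t apf_reason;	/* models the reason word shared with the guest */
	bool exc_pending;	/* models arch.exception.pending */
	uint8_t exc_nr;		/* models arch.exception.nr */
};

/*
 * Queue a #PF.  Only the case from the trace is modelled: a #PF queued
 * while another #PF is still pending is promoted to #DF.  Every other
 * combination simply overwrites the pending exception.
 */
static void model_queue_pf(struct model_vcpu *v)
{
	if (v->exc_pending && v->exc_nr == PF_VECTOR) {
		v->exc_nr = DF_VECTOR;
		return;
	}
	v->exc_pending = true;
	v->exc_nr = PF_VECTOR;
}

/* "Page not Present": publish the reason and queue the PV #PF. */
static void model_not_present(struct model_vcpu *v)
{
	v->apf_reason = REASON_PAGE_NOT_PRESENT;
	model_queue_pf(v);
}

/* "Page Ready", following the logic of the patch quoted above. */
static void model_page_present(struct model_vcpu *v)
{
	if (!v->apf_enabled)
		return;

	if (v->apf_reason == REASON_PAGE_NOT_PRESENT &&
	    v->exc_pending && v->exc_nr == PF_VECTOR) {
		/* The guest has not seen "not present" yet: cancel it. */
		v->apf_reason = REASON_NONE;
		v->exc_pending = false;
		v->exc_nr = 0;
		return;
	}

	v->apf_reason = REASON_PAGE_READY;
	model_queue_pf(v);
	assert(v->exc_pending);	/* something is always left queued here */
}
```

In this model, model_not_present() followed directly by
model_page_present() leaves nothing pending, whereas two back-to-back
model_queue_pf() calls end up with DF_VECTOR, which is the situation the
trace shows.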