Re: [PATCH v2] KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously

Radim Krčmář <rkrcmar@xxxxxxxxxx> · Thu, 14 Sep 2017 18:52:36 +0200

2017-09-14 03:54-0700, Wanpeng Li:
> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> 
> qemu-system-x86-8600  [004] d..1  7205.687530: kvm_entry: vcpu 2
> qemu-system-x86-8600  [004] ....  7205.687532: kvm_exit: reason EXCEPTION_NMI rip 0xffffffffa921297d info ffffeb2c0e44e018 80000b0e
> qemu-system-x86-8600  [004] ....  7205.687532: kvm_page_fault: address ffffeb2c0e44e018 error_code 0
> qemu-system-x86-8600  [004] ....  7205.687620: kvm_try_async_get_page: gva = 0xffffeb2c0e44e018, gfn = 0x427e4e
> qemu-system-x86-8600  [004] .N..  7205.687628: kvm_async_pf_not_present: token 0x8b002 gva 0xffffeb2c0e44e018
>     kworker/4:2-7814  [004] ....  7205.687655: kvm_async_pf_completed: gva 0xffffeb2c0e44e018 address 0x7fcc30c4e000
> qemu-system-x86-8600  [004] ....  7205.687703: kvm_async_pf_ready: token 0x8b002 gva 0xffffeb2c0e44e018
> qemu-system-x86-8600  [004] d..1  7205.687711: kvm_entry: vcpu 2
> 
> After running a memory-intensive workload in the guest, I caught a case where
> the kworker completes the GUP so quickly that it queues a "Page Ready" #PF
> exception after the "Page not Present" exception and before the next vmentry,
> as in the trace above. This results in a #DF being injected into the guest.

The #DF feature can bite us in other cases as well, e.g. when emulating
an instruction that throws #GP/#UD.

Can't we replace all non-#PF exceptions with the PV #PF?
Doing so should be wrong only for trap exceptions and we currently just
override them anyway, so we wouldn't regress. :)
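
For reference, the architectural rule that turns two back-to-back exceptions
into #DF can be modeled in a few lines. This is a simplified userspace sketch
of the benign/contributory/page-fault classification, not the code in x86.c;
the helper name escalates_to_df() is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 exception vectors used below (architectural values). */
enum {
	DE_VECTOR = 0, UD_VECTOR = 6, DF_VECTOR = 8, TS_VECTOR = 10,
	NP_VECTOR = 11, SS_VECTOR = 12, GP_VECTOR = 13, PF_VECTOR = 14,
};

enum excpt_class { EXCPT_BENIGN, EXCPT_CONTRIBUTORY, EXCPT_PF };

/* Classify a vector the way the double-fault rule does. */
static enum excpt_class exception_class(int vector)
{
	switch (vector) {
	case PF_VECTOR:
		return EXCPT_PF;
	case DE_VECTOR:
	case TS_VECTOR:
	case NP_VECTOR:
	case SS_VECTOR:
	case GP_VECTOR:
		return EXCPT_CONTRIBUTORY;
	default:
		return EXCPT_BENIGN;
	}
}

/*
 * Hypothetical helper: true when raising `second` while `first` is still
 * pending escalates to #DF instead of being delivered on its own.
 */
static bool escalates_to_df(int first, int second)
{
	/* Exception vectors occupy 0..31 on x86. */
	assert(first >= 0 && first < 32 && second >= 0 && second < 32);

	enum excpt_class c1 = exception_class(first);
	enum excpt_class c2 = exception_class(second);

	return (c1 == EXCPT_CONTRIBUTORY && c2 == EXCPT_CONTRIBUTORY) ||
	       (c1 == EXCPT_PF && c2 != EXCPT_BENIGN);
}
```

In this model a pending #PF followed by a second #PF escalates, which is the
"Page not Present" plus "Page Ready" case from the trace.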

> This patch fixes it by clearing the queued "Page not Present" exception if
> "Page Ready" occurs before the next vmentry, since the GUP has already obtained
> the required page and the shadow page table has already been fixed by the
> "Page Ready" handler.
> 
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -8653,15 +8661,26 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>  		kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
>  	trace_kvm_async_pf_ready(work->arch.token, work->gva);
>  
> -	if ((vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED) &&
> -	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
> -		fault.vector = PF_VECTOR;
> -		fault.error_code_valid = true;
> -		fault.error_code = 0;
> -		fault.nested_page_fault = false;
> -		fault.address = work->arch.token;
> -		fault.async_page_fault = true;
> -		kvm_inject_page_fault(vcpu, &fault);
> +	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED) {
> +		if (!apf_get_user(vcpu, &val)) {

I removed one indentation level when applying by merging these two
conditions.

> +			if (val == KVM_PV_REASON_PAGE_NOT_PRESENT &&
> +				vcpu->arch.exception.pending &&
> +				vcpu->arch.exception.nr == PF_VECTOR &&
> +				!apf_put_user(vcpu, 0)) {
> +				vcpu->arch.exception.pending = false;

We know that vcpu->arch.exception.injected is false here, but I cleared
it too for safety, thanks.

> +				vcpu->arch.exception.nr = 0;
> +				vcpu->arch.exception.has_error_code = false;
> +				vcpu->arch.exception.error_code = 0;
> +			} else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
> +				fault.vector = PF_VECTOR;
> +				fault.error_code_valid = true;
> +				fault.error_code = 0;
> +				fault.nested_page_fault = false;
> +				fault.address = work->arch.token;
> +				fault.async_page_fault = true;
> +				kvm_inject_page_fault(vcpu, &fault);
> +			}
> +		}
>  	}
>  	vcpu->arch.apf.halted = false;
>  	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> -- 
> 2.7.4
>
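
With the two tweaks mentioned inline (merged condition, exception.injected
cleared as well), the resulting logic can be modeled roughly as below. This is
a standalone userspace sketch, not the applied commit: struct model_vcpu, the
model_apf_*() helpers and model_page_present() are hypothetical stand-ins, and
the guest-shared reason word is reduced to a plain struct field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PF_VECTOR                       14
#define KVM_ASYNC_PF_ENABLED            (1u << 0)
#define KVM_PV_REASON_PAGE_NOT_PRESENT  1
#define KVM_PV_REASON_PAGE_READY        2

/* Hypothetical, heavily reduced stand-in for struct kvm_vcpu. */
struct model_vcpu {
	uint64_t apf_msr_val;
	uint32_t apf_reason;	/* models the word shared with the guest */
	struct {
		bool pending;
		bool injected;
		bool has_error_code;
		uint8_t nr;
		uint32_t error_code;
	} exception;
};

/* Model accessors: return 0 on success, like the helpers in the patch. */
static int model_apf_get_user(struct model_vcpu *vcpu, uint32_t *val)
{
	*val = vcpu->apf_reason;
	return 0;
}

static int model_apf_put_user(struct model_vcpu *vcpu, uint32_t val)
{
	vcpu->apf_reason = val;
	return 0;
}

/* Returns true when a "Page Ready" #PF would have to be injected. */
static bool model_page_present(struct model_vcpu *vcpu)
{
	uint32_t val;

	assert(vcpu);

	/* The two nested conditions from v2, merged into one. */
	if ((vcpu->apf_msr_val & KVM_ASYNC_PF_ENABLED) &&
	    !model_apf_get_user(vcpu, &val)) {
		if (val == KVM_PV_REASON_PAGE_NOT_PRESENT &&
		    vcpu->exception.pending &&
		    vcpu->exception.nr == PF_VECTOR &&
		    !model_apf_put_user(vcpu, 0)) {
			/* Cancel the not-yet-delivered "Page not Present". */
			vcpu->exception.pending = false;
			vcpu->exception.injected = false;
			vcpu->exception.nr = 0;
			vcpu->exception.has_error_code = false;
			vcpu->exception.error_code = 0;
		} else if (!model_apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
			return true;
		}
	}
	return false;
}
```

The first branch is the racy case from the trace: both notifications cancel
out and the guest sees neither of them.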