2017-06-16 22:24+0800, Wanpeng Li:
> 2017-06-16 21:37 GMT+08:00 Radim Krčmář <rkrcmar@xxxxxxxxxx>:
> > 2017-06-14 19:26-0700, Wanpeng Li:
> >> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> >>
> >> Add an async_page_fault field to vcpu->arch.exception to identify an async
> >> page fault, and constructs the expected vm-exit information fields. Force
> >> a nested VM exit from nested_vmx_check_exception() if the injected #PF
> >> is async page fault.
> >>
> >> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> >> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> >> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> >> ---
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> @@ -452,7 +452,11 @@ EXPORT_SYMBOL_GPL(kvm_complete_insn_gp);
> >>  void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
> >>  {
> >>  	++vcpu->stat.pf_guest;
> >> -	vcpu->arch.cr2 = fault->address;
> >> +	vcpu->arch.exception.async_page_fault = fault->async_page_fault;
> >
> > I think we need to act as if arch.exception.async_page_fault was not
> > pending in kvm_vcpu_ioctl_x86_get_vcpu_events(). Otherwise, if we
> > migrate with pending async_page_fault exception, we'd inject it as a
> > normal #PF, which could confuse/kill the nested guest.
> >
> > And kvm_vcpu_ioctl_x86_set_vcpu_events() should clean the flag for
> > sanity as well.
>
> Do you mean we should add a field like async_page_fault to
> kvm_vcpu_events::exception, then saves arch.exception.async_page_fault
> to events->exception.async_page_fault through KVM_GET_VCPU_EVENTS and
> restores events->exception.async_page_fault to
> arch.exception.async_page_fault through KVM_SET_VCPU_EVENTS?

No, I thought we could get away with a disgusting hack of hiding the
exception from userspace, which would work for migration, but not if
local userspace did KVM_GET_VCPU_EVENTS and KVM_SET_VCPU_EVENTS ...

Extending the userspace interface would work, but I'd do it as a last
resort, after all conservative solutions have failed.
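For illustration only, the "hide the exception from userspace" hack can be sketched as a minimal C model. The structs `sketch_exception` and `sketch_events` and both functions are hypothetical, heavily simplified stand-ins, not the real `struct kvm_vcpu_events` or the KVM ioctl handlers. The sketch assumes the GET path simply omits a pending async #PF and the SET path clears the flag, which shows why a local GET followed by SET silently drops the fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PF_VECTOR 14 /* x86 page-fault vector */

/* Hypothetical, simplified in-kernel exception state (not KVM's layout). */
struct sketch_exception {
	bool pending;
	uint8_t nr;
	bool has_error_code;
	uint32_t error_code;
	bool async_page_fault; /* the flag proposed by the patch */
};

/* Hypothetical, simplified view of what userspace sees. */
struct sketch_events {
	bool exception_injected;
	uint8_t exception_nr;
	bool exception_has_error_code;
	uint32_t exception_error_code;
};

/* GET side: act as if a pending async #PF were not pending at all, so a
 * migration target never re-injects it as an ordinary #PF. */
void sketch_get_events(const struct sketch_exception *exc,
		       struct sketch_events *ev)
{
	assert(exc != NULL && ev != NULL);
	memset(ev, 0, sizeof(*ev));
	if (exc->pending && !exc->async_page_fault) {
		ev->exception_injected = true;
		ev->exception_nr = exc->nr;
		ev->exception_has_error_code = exc->has_error_code;
		ev->exception_error_code = exc->error_code;
	}
}

/* SET side: userspace cannot express the flag, so clear it for sanity. */
void sketch_set_events(struct sketch_exception *exc,
		       const struct sketch_events *ev)
{
	exc->pending = ev->exception_injected;
	exc->nr = ev->exception_nr;
	exc->has_error_code = ev->exception_has_error_code;
	exc->error_code = ev->exception_error_code;
	exc->async_page_fault = false;
}
```

With an async #PF pending, calling `sketch_get_events()` and then `sketch_set_events()` on the same vCPU leaves no pending exception, which is exactly the local-userspace case where the hack breaks.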
async_pf migration is very crude, so exposing the exception is just an
ugly workaround for the local case. Adding the flag would also require
userspace configuration of async_pf features for the guest to keep
compatibility.

I see two options that might be simpler than adding the userspace flag:

 1) do the nested VM exit sooner, at the place where we now queue #PF,
 2) queue the #PF later, save the async_pf in some intermediate
    structure and consume it at the place where you proposed the nested
    VM exit.