2017-06-17 13:52+0800, Wanpeng Li:
> 2017-06-16 23:38 GMT+08:00 Radim Krčmář <rkrcmar@xxxxxxxxxx>:
> > 2017-06-16 22:24+0800, Wanpeng Li:
> >> 2017-06-16 21:37 GMT+08:00 Radim Krčmář <rkrcmar@xxxxxxxxxx>:
> >> > 2017-06-14 19:26-0700, Wanpeng Li:
> >> >> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> >> >>
> >> >> Add an async_page_fault field to vcpu->arch.exception to identify
> >> >> an async page fault, and construct the expected vm-exit information
> >> >> fields. Force a nested VM exit from nested_vmx_check_exception() if
> >> >> the injected #PF is an async page fault.
> >> >>
> >> >> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> >> >> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> >> >> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> >> >> ---
> >> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> >> @@ -452,7 +452,11 @@ EXPORT_SYMBOL_GPL(kvm_complete_insn_gp);
> >> >>  void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
> >> >>  {
> >> >>  	++vcpu->stat.pf_guest;
> >> >> -	vcpu->arch.cr2 = fault->address;
> >> >> +	vcpu->arch.exception.async_page_fault = fault->async_page_fault;
> >> >
> >> > I think we need to act as if arch.exception.async_page_fault was not
> >> > pending in kvm_vcpu_ioctl_x86_get_vcpu_events().  Otherwise, if we
> >> > migrate with a pending async_page_fault exception, we'd inject it as
> >> > a normal #PF, which could confuse/kill the nested guest.
> >> >
> >> > And kvm_vcpu_ioctl_x86_set_vcpu_events() should clear the flag for
> >> > sanity as well.
> >>
> >> Do you mean we should add a field like async_page_fault to
> >> kvm_vcpu_events::exception, then save arch.exception.async_page_fault
> >> to events->exception.async_page_fault through KVM_GET_VCPU_EVENTS and
> >> restore events->exception.async_page_fault to
> >> arch.exception.async_page_fault through KVM_SET_VCPU_EVENTS?
> >
> > No, I thought we could get away with a disgusting hack of hiding the
> > exception from userspace, which would work for migration, but not if
> > local userspace did KVM_GET_VCPU_EVENTS and KVM_SET_VCPU_EVENTS ...
> >
> > Extending the userspace interface would work, but I'd do it as a last
> > resort, after all conservative solutions have failed.  async_pf
> > migration is very crude, so exposing the exception is just an ugly
> > workaround for the local case.  Adding the flag would also require
> > userspace configuration of async_pf features for the guest to keep
> > compatibility.
> >
> > I see two options that might be simpler than adding the userspace flag:
> >
> >  1) do the nested VM exit sooner, at the place where we now queue the
> >     #PF,
> >  2) queue the #PF later, save the async_pf in some intermediate
> >     structure and consume it at the place where you proposed the
> >     nested VM exit.
>
> How about something like this, to not report exception events when
> "is_guest_mode(vcpu) && vcpu->arch.exception.nr == PF_VECTOR &&
> vcpu->arch.exception.async_page_fault" holds, since losing a reschedule
> optimization is not that important in L1:
>
> @@ -3072,13 +3074,16 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>  					       struct kvm_vcpu_events *events)
>  {
>  	process_nmi(vcpu);
> -	events->exception.injected =
> -		vcpu->arch.exception.pending &&
> -		!kvm_exception_is_soft(vcpu->arch.exception.nr);
> -	events->exception.nr = vcpu->arch.exception.nr;
> -	events->exception.has_error_code = vcpu->arch.exception.has_error_code;
> -	events->exception.pad = 0;
> -	events->exception.error_code = vcpu->arch.exception.error_code;
> +	if (!(is_guest_mode(vcpu) && vcpu->arch.exception.nr == PF_VECTOR &&
> +	      vcpu->arch.exception.async_page_fault)) {
> +		events->exception.injected =
> +			vcpu->arch.exception.pending &&
> +			!kvm_exception_is_soft(vcpu->arch.exception.nr);
> +		events->exception.nr = vcpu->arch.exception.nr;
> +		events->exception.has_error_code = vcpu->arch.exception.has_error_code;
> +		events->exception.pad = 0;
> +		events->exception.error_code = vcpu->arch.exception.error_code;
> +	}

This adds a bug when userspace does KVM_GET_VCPU_EVENTS and
KVM_SET_VCPU_EVENTS without migration -- KVM would drop the async_pf
and an L1 process would get stuck as a result.

We'd need to add a similar condition to
kvm_vcpu_ioctl_x86_set_vcpu_events(), so a userspace SET doesn't drop
the exception, but that is far beyond the realm of acceptable code.

I realized this bug only after sending the first mail, sorry for the
confusing paragraph.
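For concreteness, the guard on the SET side would have to look roughly
like the sketch below.  This is an untested illustration only: it just
mirrors the GET-side condition from the diff above, the "..." elides
the rest of the existing function, and the assignments are the ones the
function already performs today.

static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
					      struct kvm_vcpu_events *events)
{
	...
	/*
	 * Hypothetical guard (sketch only): keep a pending async #PF
	 * instead of overwriting it, so a local GET/SET round trip
	 * neither turns it into a normal #PF nor drops it entirely.
	 */
	if (!(is_guest_mode(vcpu) && vcpu->arch.exception.nr == PF_VECTOR &&
	      vcpu->arch.exception.async_page_fault)) {
		vcpu->arch.exception.pending = events->exception.injected;
		vcpu->arch.exception.nr = events->exception.nr;
		vcpu->arch.exception.has_error_code = events->exception.has_error_code;
		vcpu->arch.exception.error_code = events->exception.error_code;
	}
	...
}

Even with that guard, GET would still be lying to userspace about the
pending event, which is why one of the two options above still looks
like the cleaner route.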