2017-06-20 05:47+0800, Wanpeng Li:
> 2017-06-19 22:51 GMT+08:00 Radim Krčmář <rkrcmar@xxxxxxxxxx>:
> > 2017-06-17 13:52+0800, Wanpeng Li:
> >> 2017-06-16 23:38 GMT+08:00 Radim Krčmář <rkrcmar@xxxxxxxxxx>:
> >> > 2017-06-16 22:24+0800, Wanpeng Li:
> >> >> 2017-06-16 21:37 GMT+08:00 Radim Krčmář <rkrcmar@xxxxxxxxxx>:
> >> >> > 2017-06-14 19:26-0700, Wanpeng Li:
> >> >> >> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> >> >> >>
> >> >> >> Add an async_page_fault field to vcpu->arch.exception to identify an async
> >> >> >> page fault, and construct the expected VM-exit information fields. Force
> >> >> >> a nested VM exit from nested_vmx_check_exception() if the injected #PF
> >> >> >> is an async page fault.
> >> >> >>
> >> >> >> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> >> >> >> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> >> >> >> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> >> >> >> ---
> >> >> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> >> >> @@ -452,7 +452,11 @@ EXPORT_SYMBOL_GPL(kvm_complete_insn_gp);
> >> >> >>  void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
> >> >> >>  {
> >> >> >>  	++vcpu->stat.pf_guest;
> >> >> >> -	vcpu->arch.cr2 = fault->address;
> >> >> >> +	vcpu->arch.exception.async_page_fault = fault->async_page_fault;
> >> >> >
> >> >> > I think we need to act as if arch.exception.async_page_fault was not
> >> >> > pending in kvm_vcpu_ioctl_x86_get_vcpu_events().  Otherwise, if we
> >> >> > migrate with a pending async_page_fault exception, we'd inject it as a
> >> >> > normal #PF, which could confuse/kill the nested guest.
> >> >> >
> >> >> > And kvm_vcpu_ioctl_x86_set_vcpu_events() should clear the flag for
> >> >> > sanity as well.
> >> >>
> >> >> Do you mean we should add a field like async_page_fault to
> >> >> kvm_vcpu_events::exception, then save arch.exception.async_page_fault
> >> >> to events->exception.async_page_fault through KVM_GET_VCPU_EVENTS and
> >> >> restore events->exception.async_page_fault to
> >> >> arch.exception.async_page_fault through KVM_SET_VCPU_EVENTS?
> >> >
> >> > No, I thought we could get away with a disgusting hack of hiding the
> >> > exception from userspace, which would work for migration, but not if
> >> > local userspace did KVM_GET_VCPU_EVENTS and KVM_SET_VCPU_EVENTS ...
> >> >
> >> > Extending the userspace interface would work, but I'd do it as a last
> >> > resort, after all conservative solutions have failed.
> >> > async_pf migration is very crude, so exposing the exception is just an
> >> > ugly workaround for the local case.  Adding the flag would also require
> >> > userspace configuration of async_pf features for the guest to keep
> >> > compatibility.
> >> >
> >> > I see two options that might be simpler than adding the userspace flag:
> >> >
> >> > 1) do the nested VM exit sooner, at the place where we now queue #PF,
> >> > 2) queue the #PF later, save the async_pf in some intermediate
> >> >    structure and consume it at the place where you proposed the nested
> >> >    VM exit.
> >>
> >> How about something like this, to not report exception events when
> >> "is_guest_mode(vcpu) && vcpu->arch.exception.nr == PF_VECTOR &&
> >> vcpu->arch.exception.async_page_fault"?  Losing a reschedule
> >> optimization is not that important in L1.
> >>
> >> @@ -3072,13 +3074,16 @@ static void
> >> kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
> >> 					struct kvm_vcpu_events *events)
> >>  {
> >>  	process_nmi(vcpu);
> >> -	events->exception.injected =
> >> -		vcpu->arch.exception.pending &&
> >> -		!kvm_exception_is_soft(vcpu->arch.exception.nr);
> >> -	events->exception.nr = vcpu->arch.exception.nr;
> >> -	events->exception.has_error_code = vcpu->arch.exception.has_error_code;
> >> -	events->exception.pad = 0;
> >> -	events->exception.error_code = vcpu->arch.exception.error_code;
> >> +	if (!(is_guest_mode(vcpu) && vcpu->arch.exception.nr == PF_VECTOR &&
> >> +		vcpu->arch.exception.async_page_fault)) {
> >> +		events->exception.injected =
> >> +			vcpu->arch.exception.pending &&
> >> +			!kvm_exception_is_soft(vcpu->arch.exception.nr);
> >> +		events->exception.nr = vcpu->arch.exception.nr;
> >> +		events->exception.has_error_code = vcpu->arch.exception.has_error_code;
> >> +		events->exception.pad = 0;
> >> +		events->exception.error_code = vcpu->arch.exception.error_code;
> >> +	}
> >
> > This adds a bug when userspace does KVM_GET_VCPU_EVENTS and
> > KVM_SET_VCPU_EVENTS without migration -- KVM would drop the async_pf
> > and an L1 process would get stuck as a result.
> >
> > We'd need to add a similar condition to
> > kvm_vcpu_ioctl_x86_set_vcpu_events(), so a userspace SET doesn't drop
> > it, but that is far beyond the realm of acceptable code.
>
> Do you mean the current status of the patchset v2 can be accepted?
> Otherwise, what should be done next?

No, sorry, that one has the migration bug (the async_page_fault gets
dropped on the destination).

You proposed to add the flag to the userspace interface, which is a
sound solution.  I was asking to look for a different one, because the
flag is a work-around for an implementation detail, which isn't a good
thing to put into a userspace interface ...
Still, I looked at the early VM exit (1) and it doesn't fit well into
SVM's model of a single nested VM exit location, so it's out.

The remaining contender is to add a paravirtualized event for apf and
only convert it into a nested VM exit or #PF in inject_pending_event().
The end result would likely be a slightly better version of the
exception flag ...  I guess that doing a prototype of the userspace
interface extension is a good follow up.