Re: [PATCH 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf

Radim Krčmář <rkrcmar@xxxxxxxxxx> · Wed, 14 Jun 2017 18:18:27 +0200

2017-06-14 22:32+0800, Wanpeng Li:
> 2017-06-14 21:20 GMT+08:00 Radim Krčmář <rkrcmar@xxxxxxxxxx>:
> > 2017-06-14 21:02+0800, Wanpeng Li:
> >> 2017-06-14 20:52 GMT+08:00 Radim Krčmář <rkrcmar@xxxxxxxxxx>:
> >> > 2017-06-14 09:07+0800, Wanpeng Li:
> >> >> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@xxxxxxxxxx>:
> >> >> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
> >> >> > exits;  isn't this going to change the CR2 visible in L2 guest after a
> >> >> > nested VM entry?
> >> >>
> >> >> Sorry, I don't fully understand the question. As you know this
> >> >> vcpu->arch.cr2 which includes token is set before async pf injection,
> >> >
> >> > Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.
> >> >
> >> >> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
> >> >
> >> > Right, so we do not need to have the token in CR2, because L1 is not
> >> > going to look at it.
> >> >
> >> >> why can it change the CR2 visible in L2 guest after a nested VM entry?
> >> >
> >> > Sorry, the situation is too convoluted to be expressed in one sentence:
> >> >
> >> > 1) L2 is running with CR2 = L2CR2
> >> > 2) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
> >> >    vcpu->arch.cr2
> >> > 3) APF for L1 has completed
> >> > 4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
> >> > 5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
> >> > 6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
> >> > 7) L1 stores APFT as L2's CR2
> >> > 8) L1 handles APF, maybe reschedules, but eventually comes back to this
> >> >    L2's thread
> >> > 9) after some time, L1 enters L2 with CR2 = APFT
> >> > 10) L2 is running with CR2 = APFT
> >> >
> >> > The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
> >> > at it, e.g. if it was in the process of handling its own #PF.
> >>
> >> Good point. What's your proposal? :)
> >
> > Get rid of async_pf. :) Optimal solutions aside, I think it would be
> > best to add a new injection function for APF.  One that injects a normal
> > #PF for non-nested guests and directly triggers a #PF VM exit otherwise,
> > and call it from kvm_arch_async_page_*present().
> 
> In addition, doing the nested vmexit directly in
> kvm_arch_async_page_*present(), instead of through inject_pending_event()
> before vmentry, or doing a nested vmexit right after a vmexit on L0,
> looks strange.

Right, it might be tricky if another exception can get queued in
between.  (Which shouldn't happen, though, because async_pf exceptions
must not cause double faults for no good reason.)
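
For reference, the dispatch I proposed above could look roughly like the
sketch below.  This is a self-contained toy model, not KVM code: struct
toy_vcpu, enum apf_delivery and toy_inject_apf() are all made-up names,
and the real thing would have to go through the nested VM exit machinery.
The only point is that the nested path hands the token to L1 through the
exit qualification and leaves CR2 alone.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-ins; none of these are real KVM types or functions. */
enum apf_delivery { APF_INJECT_PF, APF_NESTED_VMEXIT };

struct toy_vcpu {
	bool in_guest_mode;          /* true while L2 is running */
	uint64_t cr2;                /* models vcpu->arch.cr2 */
	uint64_t exit_qualification; /* what L1 reads on a #PF VM exit */
};

/* Deliver an async_pf token without clobbering L2's CR2. */
static enum apf_delivery toy_inject_apf(struct toy_vcpu *v, uint64_t token)
{
	if (v->in_guest_mode) {
		/* Nested: reflect a #PF VM exit to L1, token in the
		 * exit qualification, CR2 left untouched. */
		v->exit_qualification = token;
		v->in_guest_mode = false;
		return APF_NESTED_VMEXIT;
	}

	/* Non-nested: ordinary #PF with the token in CR2. */
	v->cr2 = token;
	return APF_INJECT_PF;
}

#ifdef APF_DEMO_MAIN
int main(void)
{
	struct toy_vcpu v = { .in_guest_mode = true, .cr2 = 0x1000 };

	toy_inject_apf(&v, 0xabc);
	printf("cr2=%#llx qual=%#llx\n",
	       (unsigned long long)v.cr2,
	       (unsigned long long)v.exit_qualification);
	return 0;
}
#endif
```

Build with -DAPF_DEMO_MAIN to run the small demo; it only prints the two
fields after a nested delivery.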

>                                          So how about the proposal of
> the nested_apf_token stuff? Radim, Paolo?

I think it is worth exploring.  We need to make sure that interfacing
with userspace through kvm_vcpu_ioctl_x86_{set,get}_vcpu_events() is
right, but it should be possible without any extension as migration is
already covered by unconditional async_pf wakeup on the destination.

At this point, using a structure other than arch.exception would make
sense too -- async_pf uses the exception injection path mostly for
convenience, but the paravirt exception does not want to mix with real
exceptions.
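
To make the CR2 problem from the numbered sequence quoted above concrete,
here is a toy model of it.  It is only an illustration under my own
assumptions: struct model_vcpu and model_l2_cr2_after_apf() are invented,
and the nested_apf_token field is a guess at what a separate storage
location for the token might look like, not the actual proposal.  The
function returns the CR2 that L2 ends up with after L1 has handled the
APF and resumed it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model only; not real KVM structures. */
struct model_vcpu {
	uint64_t cr2;              /* models vcpu->arch.cr2 */
	uint64_t nested_apf_token; /* hypothetical separate token slot */
};

static uint64_t model_l2_cr2_after_apf(uint64_t l2cr2, uint64_t apft,
				       bool separate_token)
{
	struct model_vcpu v = { 0 };
	uint64_t exit_qualification;
	uint64_t l1_saved_l2_cr2;

	/* VMX exit: L0 stores L2's CR2 in vcpu->arch.cr2. */
	v.cr2 = l2cr2;

	/* L0 prepares the APF for L1. */
	if (separate_token) {
		v.nested_apf_token = apft;
		exit_qualification = v.nested_apf_token;
	} else {
		v.cr2 = apft;	/* overwrites L2CR2 */
		exit_qualification = v.cr2;
	}
	(void)exit_qualification; /* L1 reads the token from here */

	/* L0 enters L1 with CR2 = vcpu->arch.cr2; L1 stores it as
	 * L2's CR2 and later enters L2 with that value. */
	l1_saved_l2_cr2 = v.cr2;
	return l1_saved_l2_cr2;
}

#ifdef APF_DEMO_MAIN
int main(void)
{
	printf("token in cr2:   L2 sees %#llx\n", (unsigned long long)
	       model_l2_cr2_after_apf(0x1000, 0xabc, false));
	printf("separate token: L2 sees %#llx\n", (unsigned long long)
	       model_l2_cr2_after_apf(0x1000, 0xabc, true));
	return 0;
}
#endif
```

With the token written into CR2, the model returns APFT and L2CR2 is
gone; with the token kept in its own field, L2 gets L2CR2 back.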