Re: [PATCH v3 11/11] KVM: nVMX: Wake L2 from HLT when nested posted-interrupt pending

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 27/12/2017 11:51, Liran Alon wrote:
> Maybe I misunderstand you, but I don't think this is true.
> complete_nested_posted_interrupt() relies on being called at a very
> specific call-site: Right before VMEntry and after interrupts are
> disabled. It works by issue a self-IPI to cause CPU to actually process
> the nested posted-interrupts.
>
> In addition, I don't see how we can utilize the fact that
> kvm_cpu_has_interrupt() calls apic_has_interrupt_for_ppr() as
> vmx.nested.pi_desc->pir should never be synced into L0 vLAPIC IRR.
> In contrast to vmx.pi_desc->pir which does after sync_pir_to_irr().

You're right, the callbacks must stay.  What confused me is the
reference to pi_test_on in vmx_cpu_has_nested_posted_interrupt.  While
your patches are an improvement, there is still some confusion on what
leads to injecting a nested posted interrupt, and on the split between
"deliver" and "complete":

- checking pi_test_on should only be done when consuming the L2 PI
descriptor (which in your case is done by the processor via the
self-IPI).  So you don't need to test it before sending the self-IPI.

- vmx->nested.pi_pending should not be set in
vmx_deliver_nested_posted_interrupt after patch 9.  pi_pending should
only be set by the nested PI handler, when it is invoked outside guest
mode.  It is then consumed before doing the self-IPI as in the above sketch.

- likewise, vmx_cpu_has_nested_posted_interrupt should only check
vmx->nested.pi_pending.  I don't think it should even check
is_guest_mode(vcpu) or vmx->nested.pi_desc; compare with the handling of
KVM_REQ_NMI, it doesn't test nmi_allowed.  This leads me to the next point.

- vmx_complete_nested_posted_interrupt *must* find a valid descriptor.
Even after a vmexit, because vmx_complete_nested_posted_interrupt is the
very first thing that runs and L1 has had no time to run a vmwrite.  So
the above could become:

	if (is_guest_mode(vcpu) && vmx->nested.pi_pending) {
		vmx->nested.pi_pending = false;
		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
				    POSTED_INTR_NESTED_VECTOR);
	}

and then what about that is_guest_mode check?  When L1 sent the PI
notification, it thought that the destination VCPU was in guest mode.
Upon vmexit, L1 can expect to either see ON=0 or get the notification
interrupt itself.  So either we deliver the interrupt to L1, or we must
do L2 virtual interrupt delivery before L1 even runs.  Hence IMO the
is_guest_mode is wrong, which unfortunately throws a wrench somewhat in
the self-IPI idea quite a bit.

I agree with moving vmx_complete_nested_posted_interrupt out of
check_nested_events, and also with the simplification of
vmx_hwapic_irr_update.  However, in my opinion the self-IPI is actually
complicating things.  Once the handling of vmx->nested.pi_pending is
cleaned up (the nested PI interrupt handler in patch 9 is a great
improvement in that respect), we can move
vmx_complete_nested_posted_interrupt to a new KVM_REQ_NESTED_PI request
that replaces vmx->nested.pi_pending.  The GUEST_INTR_STATUS write can
be done in the vmcs12, and will then be copied by prepare_vmcs02 to the
vmcs02.

I may be wrong, but based on my experience cleaning up the bare-metal
APICv, I suspect that both the current code and this version are overly
complicated, and there is a nice and simple core waiting to be
extracted.  Let's fix the race condition in the simple way, and
separately focus on cleaning up pi_pending and everything that follows
from there.

Paolo



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux