Re: [PATCH v3 11/11] KVM: nVMX: Wake L2 from HLT when nested posted-interrupt pending

Oliver Upton <oupton@xxxxxxxxxx> · Mon, 23 Nov 2020 19:22:24 +0000

>On 27/12/2017 16:15, Liran Alon wrote:
>> I think I now follow what you mean regarding cleaning logic around
>> pi_pending. This is how I understand it:
>> 
>> 1. "vmx->nested.pi_pending" is a flag used to indicate "L1 has sent a
>> vmcs12->posted_intr_nv IPI". That's it.
>> 
>> 2. Currently code is a bit weird in the sense that instead of signal the
>> pending IPI in virtual LAPIC IRR, we set it in a special variable.
>> If we would have set it in virtual LAPIC IRR, we could in theory behave
>> very similar to a standard CPU. At interrupt injection point, we could:
>> (a) If vCPU is in root-mode: Just inject the pending interrupt normally.
>> (b) If vCPU is in non-root-mode and posted-interrupts feature is active,
>> then instead of injecting the pending interrupt, we should simulate
>> processing of posted-interrupts.
>> 
>> 3. The processing of the nested posted-interrupts itself can still be
>> done in self-IPI mechanism.
>> 
>> 4. Because not doing (2), there is still currently an issue that L1
>> doesn't receive a vmcs12->posted_intr_nv interrupt when target vCPU
>> thread has exited from L2 to L1 and pi_pending=true.
>> 
>> Do we agree on the above? Or am I still misunderstanding something?
>
> Yes, I think we agree.
>
> Paolo

Digging up this old thread to add discussions I had with Jim and Sean
recently.

We were looking at what was necessary in order to implement the
suggestions above (route the nested PINV through L1 IRR instead of using
the pi_pending bit), but now believe this change could never work
perfectly.

The pi_pending bit works rather well as it is only a hint to KVM that it
may owe the guest a posted-interrupt completion. However, if we were to
set the guest's nested PINV as pending in the L1 IRR it'd be challenging
to infer whether or not it should actually be injected in L1 or result
in posted-interrupt processing for L2.

A possible solution would be to install a real ISR in L0 for KVM's
nested PINV in concert with a per-CPU data structure containing a
linked-list of possible vCPUs that could've been meant to receive the
doorbell. But how would L0 know to remove a vCPU from this list? Since
posted interrupts is exitless, KVM never actually knows if a vCPU got
the intended doorbell.

Short of any brilliant ideas, it seems that the pi_pending bit
is probably here to say. I have a patch to serialize it in the
{GET,SET}_NESTED_STATE ioctls (live migration is busted for nested
posted interrupts, currently) but wanted to make sure we all agree
pi_pending isn't going anywhere.

--
Thanks,
Oliver