> On 27/12/2017 16:15, Liran Alon wrote:
>> I think I now follow what you mean regarding cleaning logic around
>> pi_pending. This is how I understand it:
>>
>> 1. "vmx->nested.pi_pending" is a flag used to indicate "L1 has sent a
>> vmcs12->posted_intr_nv IPI". That's it.
>>
>> 2. Currently code is a bit weird in the sense that instead of signal the
>> pending IPI in virtual LAPIC IRR, we set it in a special variable.
>> If we would have set it in virtual LAPIC IRR, we could in theory behave
>> very similar to a standard CPU. At interrupt injection point, we could:
>> (a) If vCPU is in root-mode: Just inject the pending interrupt normally.
>> (b) If vCPU is in non-root-mode and posted-interrupts feature is active,
>> then instead of injecting the pending interrupt, we should simulate
>> processing of posted-interrupts.
>>
>> 3. The processing of the nested posted-interrupts itself can still be
>> done in self-IPI mechanism.
>>
>> 4. Because not doing (2), there is still currently an issue that L1
>> doesn't receive a vmcs12->posted_intr_nv interrupt when target vCPU
>> thread has exited from L2 to L1 and pi_pending=true.
>>
>> Do we agree on the above? Or am I still misunderstanding something?
>
> Yes, I think we agree.
>
> Paolo

Digging up this old thread to add discussions I had with Jim and Sean
recently.

We were looking at what was necessary in order to implement the
suggestions above (route the nested PINV through L1 IRR instead of using
the pi_pending bit), but now believe this change could never work
perfectly.

The pi_pending bit works rather well as it is only a hint to KVM that it
may owe the guest a posted-interrupt completion. However, if we were to
set the guest's nested PINV as pending in the L1 IRR it'd be challenging
to infer whether or not it should actually be injected in L1 or result
in posted-interrupt processing for L2.
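
To make the "hint" semantics concrete, here is a toy model in C. It is
only a sketch and not kernel code: the toy_* types and functions are
hypothetical names made up for illustration, and the PIR is reduced to
a simple counter. The point it tries to show is that a set pi_pending
never obliges anything on its own; the descriptor's outstanding-
notification bit decides whether there is real work, so a stale hint is
harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or functions exist in KVM. */
struct toy_pi_desc {
	bool on;           /* outstanding-notification bit */
	unsigned pending;  /* vectors waiting in the PIR (simplified to a count) */
};

struct toy_vcpu {
	bool in_l2;          /* true while the vCPU runs the nested guest */
	bool pi_pending;     /* hint: a posted-interrupt completion may be owed */
	unsigned delivered;  /* vectors moved into L2's virtual IRR so far */
};

/* Sender side: a doorbell aimed at the nested PINV only records the hint. */
void toy_send_nested_pinv(struct toy_vcpu *v)
{
	assert(v != NULL);
	v->pi_pending = true;
}

/*
 * Receiver side: consume the hint, then let the descriptor decide.
 * Returns the number of vectors delivered to L2.
 */
unsigned toy_complete_nested_pi(struct toy_vcpu *v, struct toy_pi_desc *d)
{
	unsigned n;

	assert(v != NULL && d != NULL);

	/* Nothing to do without a hint; outside L2 the hint is left in place. */
	if (!v->in_l2 || !v->pi_pending)
		return 0;

	v->pi_pending = false;

	/* Stale hint: the notification was already handled elsewhere. */
	if (!d->on)
		return 0;

	d->on = false;
	n = d->pending;
	d->pending = 0;
	v->delivered += n;
	return n;
}
```

In this model a second doorbell after the descriptor has been drained
simply clears the hint and delivers nothing, which is the property that
would be hard to preserve if the notification were recorded as a
pending vector in the L1 IRR instead.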

A possible solution would be to install a real ISR in L0 for KVM's
nested PINV in concert with a per-CPU data structure containing a
linked list of possible vCPUs that could've been meant to receive the
doorbell. But how would L0 know to remove a vCPU from this list? Since
posted interrupts are exitless, KVM never actually knows if a vCPU got
the intended doorbell.

Short of any brilliant ideas, it seems that the pi_pending bit is
probably here to stay. I have a patch to serialize it in the
{GET,SET}_NESTED_STATE ioctls (live migration is busted for nested
posted interrupts, currently) but wanted to make sure we all agree
pi_pending isn't going anywhere.

--
Thanks,
Oliver