On Mon, Nov 23, 2020 at 4:10 PM Oliver Upton <oupton@xxxxxxxxxx> wrote:
>
> On Mon, Nov 23, 2020 at 2:42 PM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> >
> > On 23/11/20 20:22, Oliver Upton wrote:
> > > The pi_pending bit works rather well as it is only a hint to KVM
> > > that it may owe the guest a posted-interrupt completion. However,
> > > if we were to set the guest's nested PINV as pending in the L1 IRR
> > > it'd be challenging to infer whether it should actually be injected
> > > in L1 or result in posted-interrupt processing for L2.
> >
> > Stupid question: why does it matter? The behavior when the PINV is
> > delivered does not depend on the time it enters the IRR, only on the
> > time that it enters the ISR. If that happens while the vCPU is in L2,
> > it would trigger posted-interrupt processing; if the PINV moves to
> > the ISR while in L1, it would be delivered normally as an interrupt.
> >
> > There are various special cases, but they should fall into place.
> > For example, if the PINV is delivered during L1 vmentry (with IF=0),
> > it would be delivered at the next inject_pending_event, when the
> > VMRUN vmexit is processed and interrupts are unmasked.
> >
> > The tricky case is when L0 tries to deliver the PINV to L1 as a
> > posted interrupt, i.e. in vmx_deliver_nested_posted_interrupt.
> > Then the
> >
> >         if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true))
> >                 kvm_vcpu_kick(vcpu);
> >
> > needs a tweak to fall back to setting the PINV in L1's IRR:
> >
> >         if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
> >                 /* set PINV in L1's IRR */
> >                 kvm_vcpu_kick(vcpu);
> >         }
>
> Yeah, I think that's fair. Regardless, the pi_pending bit should've
> only been set if the IPI was actually sent. Though I suppose

Didn't finish my thought :-/

Though I suppose pi_pending was set unconditionally (and skipped the
IRR) because, until recently, KVM completely bungled the handling of a
PINV pending in the L1 IRR.

> > but you also have to do the same *in the PINV handler*
> > sysvec_kvm_posted_intr_nested_ipi too, to handle the case where the
> > L2->L0 vmexit races against sending the IPI.
>
> Indeed, there is a race, but are we assured that the target vCPU
> thread is scheduled on the target CPU when that IPI arrives?
>
> I believe there is a 1-to-many relationship here, which is why I said
> each CPU would need to maintain a linked list of possible vCPUs to
> iterate and find the intended recipient. Removing vCPUs from the list
> when we catch the IPI in L0 is straightforward, but it doesn't seem
> like we could ever know to remove a vCPU from the list when hardware
> catches that IPI.
>
> If the ISR thing can be figured out then that'd be great, though it
> seems infeasible because we are racing with scheduling on the target.
>
> Could we split the difference and do something like:
>
>         if (kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
>                 vmx->nested.pi_pending = true;
>         } else {
>                 /* set PINV in L1's IRR */
>                 kvm_vcpu_kick(vcpu);
>         }
>
> which ensures we only set the hint when KVM might actually have
> something to do. Otherwise, the PINV is delivered to L1 like a normal
> interrupt, or triggers posted-interrupt processing on nested VM-entry
> if IF=0.
>
> > What am I missing?
> >
> > Paolo
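
For concreteness, here is a rough sketch of what I'm proposing, folded
into the existing vmx_deliver_nested_posted_interrupt(). It's only a
sketch: kvm_lapic_set_irr() stands in for whatever the right plumbing
for "set PINV in L1's IRR" turns out to be, and I haven't thought hard
about ordering against KVM_REQ_EVENT.

static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu,
					       int vector)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	if (is_guest_mode(vcpu) &&
	    vector == vmx->nested.posted_intr_nv) {
		/* The PIR and ON bit have already been set by L1. */
		if (kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
			/*
			 * The notification IPI was sent while the vCPU
			 * was in non-root mode, so KVM may owe the guest
			 * a posted-interrupt completion; leave the hint.
			 */
			vmx->nested.pi_pending = true;
		} else {
			/*
			 * Couldn't send the IPI while the vCPU was in
			 * non-root mode: fall back to making the PINV
			 * pending in L1's IRR. From there it is either
			 * delivered to L1 as a normal interrupt or, if
			 * it reaches the ISR while in L2, triggers
			 * posted-interrupt processing.
			 */
			kvm_lapic_set_irr(vector, vcpu->arch.apic);
			kvm_make_request(KVM_REQ_EVENT, vcpu);
			kvm_vcpu_kick(vcpu);
		}
		return 0;
	}
	return -1;
}

Whether the fallback would also need mirroring in
sysvec_kvm_posted_intr_nested_ipi, per your point about racing with the
L2->L0 vmexit, is still the open question.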