On Thu, 08 Mar 2018 16:02:42 +0000, Christoffer Dall wrote: > > On Thu, Mar 08, 2018 at 10:19:49AM +0000, Marc Zyngier wrote: > > On 07/03/18 23:34, Christoffer Dall wrote: > > > On Wed, Mar 7, 2018 at 12:40 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote: > > >> The vgic code is trying to be clever when injecting GICv2 SGIs, > > >> and will happily populate LRs with the same interrupt number if > > >> they come from multiple vcpus (after all, they are distinct > > >> interrupt sources). > > >> > > >> Unfortunately, this is against the letter of the architecture, > > >> and the GICv2 architecture spec says "Each valid interrupt stored > > >> in the List registers must have a unique VirtualID for that > > >> virtual CPU interface.". GICv3 has similar (although slightly > > >> ambiguous) restrictions. > > >> > > >> This results in guests locking up when using GICv2-on-GICv3, for > > >> example. The obvious fix is to stop trying so hard, and inject > > >> a single vcpu per SGI per guest entry. After all, pending SGIs > > >> with multiple source vcpus are pretty rare, and are mostly seen > > >> in scenario where the physical CPUs are severely overcomitted. > > >> > > >> Cc: stable@xxxxxxxxxxxxxxx > > >> Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework") > > >> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx> > > >> --- > > >> virt/kvm/arm/vgic/vgic.c | 11 +---------- > > >> 1 file changed, 1 insertion(+), 10 deletions(-) > > >> > > >> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c > > >> index c7c5ef190afa..1f7ff175f47b 100644 > > >> --- a/virt/kvm/arm/vgic/vgic.c > > >> +++ b/virt/kvm/arm/vgic/vgic.c > > >> @@ -720,18 +720,9 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu) > > >> list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) { > > >> spin_lock(&irq->irq_lock); > > >> > > >> - if (unlikely(vgic_target_oracle(irq) != vcpu)) > > >> - goto next; > > >> - > > >> - /* > > >> - * If we get an SGI with multiple sources, try to get > > >> - * them in all at once. > > >> - */ > > >> - do { > > >> + if (likely(vgic_target_oracle(irq) == vcpu)) > > >> vgic_populate_lr(vcpu, irq, count++); > > > > > > I think we need to change vgic_populate_lr to set the EOI maintenance > > > interrupt flag so that when the interrupt is deactivated, if there are > > > additional pending sources, we exit the guest and pick up the > > > interrupt. > > > > Potentially. We need to be careful about about the semantics of EOI MI > > with non-level interrupts (see the other thread about EOI signalling). > > I'll have a look. > > > > > > An alternative would be to set the underflow interrupt, but I don't > > > think that would be correct for multiple priorities, because the SGI > > > could have a higher priority than other pending interrupts we put in > > > the LR. > > > > I don't think priorities are the issue (after all, we already sort the > > irqs in order of priority). > > Yes, but assume you have three pending interrupts, one SGI from two > sources, and one SPI, and assume that the SGI has priority 1 and SPI > priority 2 (lower means higher priority), then I think with underflow or > the no-pending interrupt flag, we'll deliver the SGI from the first > source, and then the SPI, and then exiting to pick up the last SGI from > the other source. That's not how I understand the GIC architecture is > supposed to work. Am I missing something? No, you do have a point here. I'm so glad this is a v2 only thing. > > > My worry is that underflow is allowed to > > fire if there is one interrupt pending, which implies that you could > > end-up in a livelock scenario if you only have one SGI pending with > > multiple sources. > > Yes, doesn't work, so I think it should be a maintenance interrupt on > EOI. > > > > > Another possibility would be to use ICH_HCR_EL2.NPIE (GICH_HCR.NPIE on > > GICv2), which delivers a a MI if no pending interrupts are present. Once > > the SGI has been activated, we're guaranteed to be able to inject a new > > pending one. > > > > I like the latter, because it doesn't overload the rest of the code with > > new semantics. Thoughts? > > > > I'm fine with that if I can be proven wrong about the multiple sources > and priorities. I'll have a look at respining this with MI on EOI. Thanks, M. -- Jazz is not dead, it just smell funny.