On Sun, 10 Nov 2019 14:29:14 +0000
Marc Zyngier <maz@xxxxxxxxxx> wrote:

Hi Marc,

> On Fri, 8 Nov 2019 17:49:51 +0000
> Andre Przywara <andre.przywara@xxxxxxx> wrote:
>
> > Our current VGIC emulation code treats the "EnableGrpX" bits in GICD_CTLR
> > as a single global interrupt delivery switch, where in fact the GIC
> > architecture asks for this being separate for the two interrupt groups.
> >
> > To implement this properly, we have to slightly adjust our design, to
> > *not* let IRQs from a disabled interrupt group be added to the ap_list.
> >
> > As a consequence, enabling one group requires us to re-evaluate every
> > pending IRQ and potentially add it to its respective ap_list. Similarly
> > disabling an interrupt group requires pending IRQs to be removed from
> > the ap_list (as long as they have not been activated yet).
> >
> > Implement a rather simple, yet not terribly efficient algorithm to
> > achieve this: For each VCPU we iterate over all IRQs, checking for
> > pending ones and adding them to the list. We hold the ap_list_lock
> > for this, to make this atomic from a VCPU's point of view.
> >
> > When an interrupt group gets disabled, we can't directly remove affected
> > IRQs from the ap_list, as a running VCPU might have already activated
> > them, which wouldn't be immediately visible to the host.
> > Instead simply kick all VCPUs, so that they clean their ap_list's
> > automatically when running vgic_prune_ap_list().
> >
> > Signed-off-by: Andre Przywara <andre.przywara@xxxxxxx>
> > ---
> >  virt/kvm/arm/vgic/vgic.c | 88 ++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 80 insertions(+), 8 deletions(-)
> >
> > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > index 3b88e14d239f..28d9ff282017 100644
> > --- a/virt/kvm/arm/vgic/vgic.c
> > +++ b/virt/kvm/arm/vgic/vgic.c
> > @@ -339,6 +339,38 @@ int vgic_dist_enable_group(struct kvm *kvm, int group, bool status)
> >  	return 0;
> >  }
> >
> > +/*
> > + * Check whether a given IRQs need to be queued to this ap_list, and do
> > + * so if that's the case.
> > + * Requires the ap_list_lock to be held (but not the irq lock).
> > + *
> > + * Returns 1 if that IRQ has been added to the ap_list, and 0 if not.
> > + */
> > +static int queue_enabled_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
> > +			     int intid)
>
> true/false seems better than 1/0.

Mmh, indeed. I think I had more in there in an earlier version.

> > +{
> > +	struct vgic_irq *irq = vgic_get_irq(kvm, vcpu, intid);
> > +	int ret = 0;
> > +
> > +	raw_spin_lock(&irq->irq_lock);
> > +	if (!irq->vcpu && vcpu == vgic_target_oracle(irq)) {
> > +		/*
> > +		 * Grab a reference to the irq to reflect the
> > +		 * fact that it is now in the ap_list.
> > +		 */
> > +		vgic_get_irq_kref(irq);
> > +		list_add_tail(&irq->ap_list,
> > +			      &vcpu->arch.vgic_cpu.ap_list_head);
>
> Two things:
> - This should be the job of vgic_queue_irq_unlock. Why are you
>   open-coding it?

I was *really* keen on reusing that, but couldn't, for two reasons:

a) The locking scheme of vgic_queue_irq_unlock() spoils that: it requires
   the irq_lock to be held, but not the ap_list_lock, then takes both
   locks and returns with both of them dropped. Here we need to hold the
   ap_list_lock the whole time, to prevent any VCPU returning to the
   hypervisor from interfering with this routine.
b) vgic_queue_irq_unlock() already kicks the VCPU, whereas I want to add
   all of them first, then kick the VCPU at the end.

So I decided to go with this stripped-down version, because I didn't dare
to touch the original function.

I could refactor the "actually add to the list" part of
vgic_queue_irq_unlock() into this new function, then call it from both
vgic_queue_irq_unlock() and from the new users.
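A minimal sketch of what such a factored-out core could look like. The
helper name vgic_add_to_ap_list_locked() and its exact contract are made
up for illustration and are not taken from the patch; it assumes the
caller already holds the target VCPU's ap_list_lock and the irq_lock, and
has checked vgic_target_oracle():

/*
 * Hypothetical helper: put an IRQ on a VCPU's ap_list.
 * Caller must hold vcpu->arch.vgic_cpu.ap_list_lock and irq->irq_lock,
 * and must have checked that vcpu == vgic_target_oracle(irq).
 */
static bool vgic_add_to_ap_list_locked(struct kvm_vcpu *vcpu,
                                       struct vgic_irq *irq)
{
        /* Already queued on some ap_list: nothing to do. */
        if (irq->vcpu)
                return false;

        /* Hold a reference for as long as the IRQ sits on the ap_list. */
        vgic_get_irq_kref(irq);
        list_add_tail(&irq->ap_list, &vcpu->arch.vgic_cpu.ap_list_head);
        irq->vcpu = vcpu;

        return true;
}

queue_enabled_irq() above and the tail end of vgic_queue_irq_unlock()
could then both call such a helper, with only the lock acquisition and
the VCPU kick differing between the two paths.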
> - What if the interrupt isn't pending? Non-pending, non-active
>   interrupts should not be on the AP list!

That should be covered by vgic_target_oracle() already, shouldn't it?

> > +		irq->vcpu = vcpu;
> > +
> > +		ret = 1;
> > +	}
> > +	raw_spin_unlock(&irq->irq_lock);
> > +	vgic_put_irq(kvm, irq);
> > +
> > +	return ret;
> > +}
> > +
> >  /*
> >   * The group enable status of at least one of the groups has changed.
> >   * If enabled is true, at least one of the groups got enabled.
> > @@ -346,17 +378,57 @@ int vgic_dist_enable_group(struct kvm *kvm, int group, bool status)
> >   */
> >  void vgic_rescan_pending_irqs(struct kvm *kvm, bool enabled)
> >  {
> > +	int cpuid;
> > +	struct kvm_vcpu *vcpu;
> > +
> >  	/*
> > -	 * TODO: actually scan *all* IRQs of the VM for pending IRQs.
> > -	 * If a pending IRQ's group is now enabled, add it to its ap_list.
> > -	 * If a pending IRQ's group is now disabled, kick the VCPU to
> > -	 * let it remove this IRQ from its ap_list. We have to let the
> > -	 * VCPU do it itself, because we can't know the exact state of an
> > -	 * IRQ pending on a running VCPU.
> > +	 * If no group got enabled, we only have to potentially remove
> > +	 * interrupts from ap_lists. We can't do this here, because a running
> > +	 * VCPU might have ACKed an IRQ already, which wouldn't immediately
> > +	 * be reflected in the ap_list.
> > +	 * So kick all VCPUs, which will let them re-evaluate their ap_lists
> > +	 * by running vgic_prune_ap_list(), removing no longer enabled
> > +	 * IRQs.
> > +	 */
> > +	if (!enabled) {
> > +		vgic_kick_vcpus(kvm);
> > +
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * At least one group went from disabled to enabled. Now we need
> > +	 * to scan *all* IRQs of the VM for newly group-enabled IRQs.
> > +	 * If a pending IRQ's group is now enabled, add it to the ap_list.
> > +	 *
> > +	 * For each VCPU this needs to be atomic, as we need *all* newly
> > +	 * enabled IRQs in be in the ap_list to determine the highest
> > +	 * priority one.
> > +	 * So grab the ap_list_lock, then iterate over all private IRQs and
> > +	 * all SPIs. Once the ap_list is updated, kick that VCPU to
> > +	 * forward any new IRQs to the guest.
> >  	 */
> > +	kvm_for_each_vcpu(cpuid, vcpu, kvm) {
> > +		unsigned long flags;
> > +		int i;
> >
> > -	/* For now just kick all VCPUs, as the old code did. */
> > -	vgic_kick_vcpus(kvm);
> > +		raw_spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
> > +
> > +		for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++)
> > +			queue_enabled_irq(kvm, vcpu, i);
> > +
> > +		for (i = VGIC_NR_PRIVATE_IRQS;
> > +		     i < kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS; i++)
> > +			queue_enabled_irq(kvm, vcpu, i);
>
> On top of my questions above, what happens to LPIs?

Oh dear. Looks like wishful thinking on my side ;-)
Iterating over all interrupts is probably not a good idea anymore.

Do you think having a separate list for group-disabled IRQs would be a
better approach? In vgic_queue_irq_unlock(), if a pending IRQ's group is
enabled, it goes into the ap_list; if not, it goes onto another list
instead. Then we would only need to consult this other list when a group
gets enabled. Both lists would be protected by the same ap_list_lock.

Does that make sense?
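As a rough illustration of that two-list idea only: everything below is a
sketch, the grp_disabled_list field and the flush helper are hypothetical,
and the signature assumed for vgic_irq_is_grp_enabled() (the helper this
series introduces) is guessed rather than taken from the patches:

/*
 * Hypothetical second per-VCPU list, protected by the same ap_list_lock:
 * pending IRQs whose group is currently disabled would park here instead
 * of on the ap_list, reusing the ap_list member as the list node. When a
 * group gets enabled, only this list needs walking.
 */
static void vgic_flush_group_disabled_list(struct kvm_vcpu *vcpu)
{
        struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
        struct vgic_irq *irq, *tmp;
        unsigned long flags;

        raw_spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);

        /* grp_disabled_list is a hypothetical field, not in the patch. */
        list_for_each_entry_safe(irq, tmp, &vgic_cpu->grp_disabled_list,
                                 ap_list) {
                raw_spin_lock(&irq->irq_lock);
                /* Move newly group-enabled IRQs over to the ap_list. */
                if (vgic_irq_is_grp_enabled(vcpu->kvm, irq))
                        list_move_tail(&irq->ap_list,
                                       &vgic_cpu->ap_list_head);
                raw_spin_unlock(&irq->irq_lock);
                /* kref and irq->vcpu bookkeeping elided in this sketch. */
        }

        raw_spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
}

vgic_queue_irq_unlock() would then pick the destination list based on the
same group check, and vgic_rescan_pending_irqs() would call a helper like
the one above for each VCPU (and kick it afterwards) instead of iterating
over every INTID, which would also cover LPIs naturally.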
> And if a group has
> been disabled, how do you retire these interrupts from the AP list?

This is done above: we kick the respective VCPU and rely on
vgic_prune_ap_list() to remove them (that uses vgic_target_oracle(),
which in turn checks vgic_irq_is_grp_enabled()).

Cheers,
Andre.

> > +
> > +		raw_spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock,
> > +					   flags);
> > +
> > +		if (kvm_vgic_vcpu_pending_irq(vcpu)) {
> > +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> > +			kvm_vcpu_kick(vcpu);
> > +		}
> > +	}
> >  }
> >
> >  bool vgic_dist_group_enabled(struct kvm *kvm, int group)
>
> Thanks,
>
> 	M.

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm