Re: [PATCH 2/3] kvm: arm: VGIC: Scan all IRQs when interrupt group gets enabled

Marc Zyngier <maz@xxxxxxxxxx> · Tue, 19 Nov 2019 09:40:40 +0000

Hi Andre,

On 2019-11-18 14:12, Andre Przywara wrote:
On Thu, 14 Nov 2019 11:16:55 +0000
Marc Zyngier <maz@xxxxxxxxxx> wrote:

Hi Marc,

On 2019-11-12 09:36, Andre Przywara wrote:
> On Sun, 10 Nov 2019 14:29:14 +0000
> Marc Zyngier <maz@xxxxxxxxxx> wrote:
>
> Hi Marc,
>
>> On Fri,  8 Nov 2019 17:49:51 +0000
>> Andre Przywara <andre.przywara@xxxxxxx> wrote:
>>
>> > Our current VGIC emulation code treats the "EnableGrpX" bits in
>> GICD_CTLR
>> > as a single global interrupt delivery switch, where in fact the
>> GIC
>> > architecture asks for this being separate for the two interrupt
>> groups.
>> >
>> > To implement this properly, we have to slightly adjust our 
design,
>> to
>> > *not* let IRQs from a disabled interrupt group be added to the
>> ap_list.
>> >
>> > As a consequence, enabling one group requires us to re-evaluate
>> every
>> > pending IRQ and potentially add it to its respective ap_list.
>> Similarly
>> > disabling an interrupt group requires pending IRQs to be 
removed
>> from
>> > the ap_list (as long as they have not been activated yet).
>> >
>> > Implement a rather simple, yet not terribly efficient algorithm 
to
>> > achieve this: For each VCPU we iterate over all IRQs, checking 
for
>> > pending ones and adding them to the list. We hold the 
ap_list_lock
>> > for this, to make this atomic from a VCPU's point of view.
>> >
>> > When an interrupt group gets disabled, we can't directly remove
>> affected
>> > IRQs from the ap_list, as a running VCPU might have already
>> activated
>> > them, which wouldn't be immediately visible to the host.
>> > Instead simply kick all VCPUs, so that they clean their 
ap_list's
>> > automatically when running vgic_prune_ap_list().
>> >
>> > Signed-off-by: Andre Przywara <andre.przywara@xxxxxxx>
>> > ---
>> >  virt/kvm/arm/vgic/vgic.c | 88
>> ++++++++++++++++++++++++++++++++++++----
>> >  1 file changed, 80 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/virt/kvm/arm/vgic/vgic.c 
b/virt/kvm/arm/vgic/vgic.c
>> > index 3b88e14d239f..28d9ff282017 100644
>> > --- a/virt/kvm/arm/vgic/vgic.c
>> > +++ b/virt/kvm/arm/vgic/vgic.c
>> > @@ -339,6 +339,38 @@ int vgic_dist_enable_group(struct kvm 
*kvm,
>> int group, bool status)
>> >  	return 0;
>> >  }
>> >
>> > +/*
>> > + * Check whether a given IRQs need to be queued to this 
ap_list,
>> and do
>> > + * so if that's the case.
>> > + * Requires the ap_list_lock to be held (but not the irq 
lock).
>> > + *
>> > + * Returns 1 if that IRQ has been added to the ap_list, and 0 
if
>> not.
>> > + */
>> > +static int queue_enabled_irq(struct kvm *kvm, struct kvm_vcpu
>> *vcpu,
>> > +			     int intid)
>>
>> true/false seems better than 1/0.
>
> Mmh, indeed. I think I had more in there in an earlier version.
>
>> > +{
>> > +	struct vgic_irq *irq = vgic_get_irq(kvm, vcpu, intid);
>> > +	int ret = 0;
>> > +
>> > +	raw_spin_lock(&irq->irq_lock);
>> > +	if (!irq->vcpu && vcpu == vgic_target_oracle(irq)) {
>> > +		/*
>> > +		 * Grab a reference to the irq to reflect the
>> > +		 * fact that it is now in the ap_list.
>> > +		 */
>> > +		vgic_get_irq_kref(irq);
>> > +		list_add_tail(&irq->ap_list,
>> > +			      &vcpu->arch.vgic_cpu.ap_list_head);
>>
>> Two things:
>> - This should be the job of vgic_queue_irq_unlock. Why are you
>>   open-coding it?
>
> I was *really* keen on reusing that, but couldn't  for two 
reasons:
> a) the locking code inside vgic_queue_irq_unlock spoils that: It
> requires the irq_lock to be held, but not the ap_list_lock. Then 
it
> takes both locks, but returns with both of them dropped. We need 
to
> hold the ap_list_lock all of the time, to prevent any VCPU 
returning
> to the HV to interfere with this routine.
> b) vgic_queue_irq_unlock() kicks the VCPU already, where I want to
> just add all of them first, then kick the VCPU at the end.

Indeed, and that is why you need to change the way you queue these
pending, enabled, group-disabled interrupts (see the LPI issue 
below).

>
> So I decided to go with the stripped-down version of it, because I
> didn't dare to touch the original function. I could refactor this
> "actually add to the list" part of vgic_queue_irq_unlock() into 
this
> new function, then call it from both vgic_queue_irq_unlock() and 
from
> the new users.
>
>> - What if the interrupt isn't pending? Non-pending, non-active
>>   interrupts should not be on the AP list!
>
> That should be covered by vgic_target_oracle() already, shouldn't 
it?

Ah, yes, you're right.

>
>> > +		irq->vcpu = vcpu;
>> > +
>> > +		ret = 1;
>> > +	}
>> > +	raw_spin_unlock(&irq->irq_lock);
>> > +	vgic_put_irq(kvm, irq);
>> > +
>> > +	return ret;
>> > +}
>> > +
>> >  /*
>> >   * The group enable status of at least one of the groups has
>> changed.
>> >   * If enabled is true, at least one of the groups got enabled.
>> > @@ -346,17 +378,57 @@ int vgic_dist_enable_group(struct kvm 
*kvm,
>> int group, bool status)
>> >   */
>> >  void vgic_rescan_pending_irqs(struct kvm *kvm, bool enabled)
>> >  {
>> > +	int cpuid;
>> > +	struct kvm_vcpu *vcpu;
>> > +
>> >  	/*
>> > -	 * TODO: actually scan *all* IRQs of the VM for pending IRQs.
>> > -	 * If a pending IRQ's group is now enabled, add it to its
>> ap_list.
>> > -	 * If a pending IRQ's group is now disabled, kick the VCPU to
>> > -	 * let it remove this IRQ from its ap_list. We have to let 
the
>> > -	 * VCPU do it itself, because we can't know the exact state 
of
>> an
>> > -	 * IRQ pending on a running VCPU.
>> > +	 * If no group got enabled, we only have to potentially 
remove
>> > +	 * interrupts from ap_lists. We can't do this here, because a
>> running
>> > +	 * VCPU might have ACKed an IRQ already, which wouldn't
>> immediately
>> > +	 * be reflected in the ap_list.
>> > +	 * So kick all VCPUs, which will let them re-evaluate their
>> ap_lists
>> > +	 * by running vgic_prune_ap_list(), removing no longer 
enabled
>> > +	 * IRQs.
>> > +	 */
>> > +	if (!enabled) {
>> > +		vgic_kick_vcpus(kvm);
>> > +
>> > +		return;
>> > +	}
>> > +
>> > +	/*
>> > +	 * At least one group went from disabled to enabled. Now we 
need
>> > +	 * to scan *all* IRQs of the VM for newly group-enabled IRQs.
>> > +	 * If a pending IRQ's group is now enabled, add it to the
>> ap_list.
>> > +	 *
>> > +	 * For each VCPU this needs to be atomic, as we need *all* 
newly
>> > +	 * enabled IRQs in be in the ap_list to determine the highest
>> > +	 * priority one.
>> > +	 * So grab the ap_list_lock, then iterate over all private 
IRQs
>> and
>> > +	 * all SPIs. Once the ap_list is updated, kick that VCPU to
>> > +	 * forward any new IRQs to the guest.
>> >  	 */
>> > +	kvm_for_each_vcpu(cpuid, vcpu, kvm) {
>> > +		unsigned long flags;
>> > +		int i;
>> >
>> > -	 /* For now just kick all VCPUs, as the old code did. */
>> > -	vgic_kick_vcpus(kvm);
>> > +		raw_spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock,
>> flags);
>> > +
>> > +		for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++)
>> > +			queue_enabled_irq(kvm, vcpu, i);
>> > +
>> > +		for (i = VGIC_NR_PRIVATE_IRQS;
>> > +		     i < kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS; i++)
>> > +			queue_enabled_irq(kvm, vcpu, i);
>>
>> On top of my questions above, what happens to LPIs?
>
> Oh dear. Looks like wishful thinking on my side ;-) Iterating over
> all interrupts is probably not a good idea anymore.
> Do you think this idea of having a list with group-disabled IRQs 
is a
> better approach: In vgic_queue_irq_unlock, if a pending IRQ's 
group
> is
> enabled, it goes into the ap_list, if not, it goes into another 
list
> instead. Then we would only need to consult this other list when a
> group gets enabled. Both lists protected by the same ap_list_lock.
> Does that make sense?

I think that could work. One queue for each group, holding pending,
enabled, group-disabled interrupts. Pending, disabled interrupts are
not queued anywhere, just like today.

The only snag is per-cpu interrupts. On which queue do they live?
Do you have per-CPU queues? or a global one?

Yes, the idea was to have a per-VCPU "grp_dis_list" in addition to
the ap_list, reusing the ap_list list_head in struct vgic_irq.
vgic_queue_irq_unlock() would put them into *one* of those two lists,
depending on their group-enabled status. When a group gets enabled, 
we
just have to transfer the IRQs from grp_dis_list to ap_list.

But fleshing this out I was wondering if it couldn't be much simpler:
We ignore the group-enabled status most of the time, except in
vgic_flush_lr_state(). So group-disabled IRQs *would go* to the
ap_list (when they are otherwise pending|active and enabled), but
would be skipped when eventually populating the LRs.
vgic_prune_ap_list would also not touch them, so they would stay in
the ap_list (unless removed for other reasons).

That might raise some eyebrows (because we keep IRQs in the ap_list
which are not ready), but would require only minimal changes and 
avoid
all kind of nasty/racy code to be added. The only downside I see is
that the ap_list could potentially be much longer, but we could 
change
the sorting algorithm if needed to keep group-disabled IRQs at the
end, at which point it wouldn't really matter.

Do you see any problem with that approach? Alex seemed to remember
that you had an objection against a very similar (if not identical)
idea before.

My main worry with this is that it causes overhead on the fast path.
Disabled interrupts (for whichever reason they are disabled) shouldn't
have to be evaluated on the fast path.

Take for example kvm_vgic_vcpu_pending_irq(), which we evaluate pretty
often (each time a vcpu wakes up). Do we want to scan a bunch of
group-disabled interrupts there? No.

At the end of the day, what we're looking at is a list of disabled,
pending interrupts. They can be disabled for multiple reasons
(group is disabled, or interrupt itself is disabled). But they should
*not* end-up on the AP list, because that list has a precise semantic.

Your suggestion to add the group-disabled interrupts to the AP list
may be a cool hack, but it is mostly a hack that opens the whole thing
to a bunch of corner cases. Let's not do that.

>> And if a group has
>> been disabled, how do you retire these interrupts from the AP 
list?
>
> This is done above: we kick the respective VCPU and rely on
> vgic_prune_ap_list() to remove them (that uses 
vgic_target_oracle(),
> which in turn checks vgic_irq_is_grp_enabled()).

But what if the CPU isn't running? Kicking it isn't going to do 
much,
is it?

Not directly, but in either approach that would be handled similar to
disabled interrupts: once the VCPU runs, they would *not* end up in
LRs (because we check the oracle before), and would be cleaned up in
prune() once the guest exits (at least for the original approach).

I lost track of the original approach already :-/

Try and build the above suggestion. It should follow the same flow as
the enabled, group-enabled interrupts, just with a different list.

Thanks,

        M.
--
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm