[Android-virt] [PATCH 06/15] ARM: KVM: VGIC interrupt injection

On Mon, Jun 25, 2012 at 2:58 PM, Marc Zyngier <marc.zyngier at arm.com> wrote:
> On 22/06/12 20:20, Christoffer Dall wrote:
>> On Mon, May 14, 2012 at 9:05 AM, Marc Zyngier <marc.zyngier at arm.com> wrote:
>>> Plug the interrupt injection code. Interrupts can now be generated
>>> from user space.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier at arm.com>
>>> ---
>>>  arch/arm/include/asm/kvm_vgic.h |    1 +
>>>  arch/arm/kvm/arm.c              |   32 +++++++++++++++++++++++++-
>>>  arch/arm/kvm/vgic.c             |   47 +++++++++++++++++++++++++++++++++++++++
>>>  3 files changed, 79 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
>>> index 939f3f8..e293f60 100644
>>> --- a/arch/arm/include/asm/kvm_vgic.h
>>> +++ b/arch/arm/include/asm/kvm_vgic.h
>>> @@ -140,6 +140,7 @@ struct kvm_run;
>>>  #ifdef CONFIG_KVM_ARM_VGIC
>>>  void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu);
>>>  void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu);
>>> +int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, const struct kvm_irq_level *irq);
>>>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>>  int vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run);
>>>
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index a573f16..c03e450 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -582,7 +582,28 @@ static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
>>>  long kvm_arch_vcpu_ioctl(struct file *filp,
>>>                          unsigned int ioctl, unsigned long arg)
>>>  {
>>> -       return -EINVAL;
>>> +       struct kvm_vcpu *vcpu = filp->private_data;
>>> +       void __user *argp = (void __user *)arg;
>>> +
>>> +       switch (ioctl) {
>>> +#ifdef CONFIG_KVM_ARM_VGIC
>>> +       case KVM_IRQ_LINE: {
>>> +               struct kvm_irq_level irq_event;
>>> +
>>> +               if (copy_from_user(&irq_event, argp, sizeof irq_event))
>>> +                       return -EFAULT;
>>> +
>>> +               if (!irqchip_in_kernel(vcpu->kvm))
>>> +                       return -EINVAL;
>>> +
>>> +               if (irq_event.irq < 16 || irq_event.irq >= 32)
>>> +                       return -EINVAL;
>>> +               return kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id, &irq_event);
>>> +       }
>>
>> This whole vcpu/kvm ioctl split feels a bit broken. I am worried about
>> what the upstream people will say. Can we not propose a change to the
>> irq_event struct, encode it in the irq field, or come up with a new VM
>> ioctl or something?
>
> Coming up with a whole new interface is something I've tried very hard
> to avoid. I agree about the hackish taste of it, but I'd rather not
> reinvent the wheel here, unless we can't properly fit the information.
>
> Now, we could encode the CPU number into the irq field (top 16 bits?),
> and get rid of the vcpu ioctl. Not to mention that we don't inject any
> PPI from QEMU when emulating an A15, so this code path is completely
> untested.
>

Let's use those top 16 bits to encode the CPU for both the in-kernel
GIC and user space GIC cases then. Seems much cleaner to me.
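For illustration, the packing being proposed could look like this; the shift, mask, and helper names here are assumptions for the sketch, not a settled ABI:

```c
#include <stdint.h>

/* Hypothetical encoding of the KVM_IRQ_LINE irq field:
 * target vcpu index in the top 16 bits, interrupt number in
 * the bottom 16. Names and layout are illustrative only. */
#define IRQ_FIELD_VCPU_SHIFT	16
#define IRQ_FIELD_NUM_MASK	0xffffU

static inline uint32_t irq_field_encode(uint32_t vcpu_idx, uint32_t irq_num)
{
	return (vcpu_idx << IRQ_FIELD_VCPU_SHIFT) |
	       (irq_num & IRQ_FIELD_NUM_MASK);
}

static inline uint32_t irq_field_vcpu(uint32_t field)
{
	return field >> IRQ_FIELD_VCPU_SHIFT;
}

static inline uint32_t irq_field_num(uint32_t field)
{
	return field & IRQ_FIELD_NUM_MASK;
}
```

With such an encoding the VM ioctl alone can target any vcpu, which is what makes the separate vcpu ioctl unnecessary.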

>>> +#endif
>>> +       default:
>>> +               return -EINVAL;
>>> +       }
>>>  }
>>>
>>>  int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>>> @@ -602,6 +623,15 @@ long kvm_arch_vm_ioctl(struct file *filp,
>>>
>>>                 if (copy_from_user(&irq_event, argp, sizeof irq_event))
>>>                         return -EFAULT;
>>> +
>>> +#ifdef CONFIG_KVM_ARM_VGIC
>>> +               if (irqchip_in_kernel(kvm)) {
>>> +                       if (irq_event.irq < 32)
>>> +                               return -EINVAL;
>>> +                       return kvm_vgic_inject_irq(kvm, 0, &irq_event);
>>> +               }
>>> +#endif
>>> +
>>>                 return kvm_arch_vm_ioctl_irq_line(kvm, &irq_event);
>>>         }
>>>         default:
>>> diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
>>> index ccd8b69..a48b436 100644
>>> --- a/arch/arm/kvm/vgic.c
>>> +++ b/arch/arm/kvm/vgic.c
>>> @@ -35,6 +35,7 @@
>>>  #define ACCESS_WRITE_MASK(x)   ((x) & (3 << 1))
>>>
>>>  static void vgic_update_state(struct kvm *kvm);
>>> +static void kvm_vgic_kick_vcpus(struct kvm *kvm);
>>>  static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 reg);
>>>
>>>  static void mmio_do_copy(struct kvm_run *run, u32 *reg, u32 offset, int mode)
>>> @@ -371,6 +372,8 @@ int vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>         kvm_handle_mmio_return(vcpu, run);
>>>         spin_unlock(&vcpu->kvm->arch.vgic.lock);
>>>
>>> +       kvm_vgic_kick_vcpus(vcpu->kvm);
>>> +
>>>         return KVM_EXIT_UNKNOWN;
>>>  }
>>>
>>> @@ -671,3 +674,47 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
>>>
>>>         return !!(atomic_read(&dist->irq_pending_on_cpu) & (1 << vcpu->vcpu_id));
>>>  }
>>> +
>>> +static void kvm_vgic_kick_vcpus(struct kvm *kvm)
>>> +{
>>> +       int nrcpus = atomic_read(&kvm->online_vcpus);
>>> +       int c;
>>> +
>>> +       /*
>>> +        * We've injected an interrupt, time to find out who deserves
>>> +        * a good kick...
>>> +        */
>>> +       for (c = 0; c < nrcpus; c++) {
>>> +               struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, c);
>>> +
>>> +               if (kvm_vgic_vcpu_pending_irq(vcpu)) {
>>> +                       vcpu->arch.wait_for_interrupts = 0;
>>> +                       kvm_vcpu_kick(vcpu);
>>
>> is it worth optimizing this to not do it if the vcpu is actually
>> running and there are no empty LRs? probably not...
>
> Unfortunately, it is quite hard to know which state the vcpu is in
> unless you stop it. You could start looking at the registers, but the
> vcpu might be removed from under your feet, and you end up making the
> wrong decision.
>
> I actually tried to solve the opposite problem (injecting interrupts
> without stopping the guest), and found out it is simply not possible
> without some heavyweight synchronization between host and hypervisor.
> Which is a shame, as the current code is pretty heavy on the IPI side...
>
> Maybe as a (much) later optimization.
>

fair enough.

>>> +               }
>>> +       }
>>> +}
>>> +
>>> +int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, const struct kvm_irq_level *irq)
>>> +{
>>> +       int nrcpus = atomic_read(&kvm->online_vcpus);
>>> +
>>> +       if (WARN_ON(cpuid >= nrcpus))
>>> +               return -EINVAL;
>>> +
>>> +       /* Only PPIs or SPIs */
>>> +       if (WARN_ON(irq->irq >= VGIC_NR_IRQS || irq->irq < 16))
>>> +               return -EINVAL;
>>> +
>>
>> we should probably get rid of these WARN_ON calls. You could add some
>> kvm_debug statements if you want, but really EINVAL should tell you
>> what you need.
>
> Sure.
>
>>> +       if (!irq->level)
>>> +               return 0;
>>
>> so, more general question, is the interrupt injected model for
>> IRQ_LINE that we are injecting edge-triggered interrupts? Otherwise,
>> this feels a bit weird as the GIC specs seem to say that the pending
>> signal remains asserted and ack reads go to pending and active until
>> the processor actually talks to the device, which would involve
>> telling user space emulation in this case to deassert the line.
>
> Yeah, this is one of the major brain damages in this patch series.
> Basically, it works by fluke, which I discovered today while reworking
> the whole irq_target madness. We really need to de-assert the interrupt
> on a level-triggered interrupt. I've fixed it in my tree, but will need
> to take the irq_cfg field into account.
>

OK :)
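For reference, the semantic difference under discussion can be modelled in a few lines. This is a toy sketch with made-up names, not the vgic code: for a level-triggered line, a level==0 injection must clear the pending state, whereas an edge-triggered line latches until acknowledged.

```c
#include <stdbool.h>

/* Toy model of one interrupt line. Whether it is level- or
 * edge-triggered would come from the GIC's irq_cfg state. */
struct toy_line {
	bool level_triggered;
	bool pending;
};

static void toy_inject(struct toy_line *l, bool level)
{
	if (level) {
		l->pending = true;	/* assert the line */
	} else if (l->level_triggered) {
		/* Level IRQ: pending tracks the line, so a de-assert
		 * must clear it rather than be ignored. */
		l->pending = false;
	}
	/* Edge IRQ: de-assert is a no-op; pending latches until ack. */
}
```

The current patch ignores level==0 entirely (`if (!irq->level) return 0;`), which is only correct for the edge-triggered case.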

>>> +
>>> +       pr_debug("Inject IRQ%d\n", irq->irq);
>>> +       spin_lock(&kvm->arch.vgic.lock);
>>> +       vgic_bitmap_set_irq_val(&kvm->arch.vgic.irq_pending, cpuid, irq->irq, 1);
>>> +       vgic_update_state(kvm);
>>> +       spin_unlock(&kvm->arch.vgic.lock);
>>
>> I'm not convinced about the necessity for this lock. Do we do any
>> read-modify-updates here that are not just serializable atomic bit
>> operations already? If so, where would it break (looking at
>> vgic_update_state, compute_pending_for_cpu)?
>
> The MMIO accesses are using read-modify-update. You certainly could
> convert these to use atomic operations, but you'd end up with the same
> kind of code you use for spinlocks (exclusive access).
>
> Frankly, before going down that road, I'd like to see if we actually
> have any form of actual contention on that lock.
>

That's ok with me. Although, it seems the KVM guys (Hi Avi!) are quite
unhappy with spinlocks on every world-switch even though there may be
very little contention on these, which is why I'm pushing these
thoughts early on.

However, if there's no way to avoid it, then there's no way to avoid it...
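To illustrate the trade-off Marc describes: setting a single pending bit is a lock-free atomic OR, but a wider read-modify-write such as the MMIO register emulation needs a compare-and-swap loop, which is essentially the same exclusive-access dance a spinlock performs. A sketch in C11 atomics, with invented names:

```c
#include <stdatomic.h>
#include <stdint.h>

/* One 32-bit word of hypothetical pending state. */
static _Atomic uint32_t pending_word;

/* Single-bit set: one lock-free atomic OR, no lock needed. */
static void set_pending(unsigned int irq)
{
	atomic_fetch_or(&pending_word, 1U << irq);
}

/* Whole-word read-modify-write, like an MMIO register store:
 * needs a CAS retry loop -- exclusive access either way. */
static void write_masked(uint32_t mask, uint32_t val)
{
	uint32_t old = atomic_load(&pending_word);
	uint32_t new;

	do {
		new = (old & ~mask) | (val & mask);
	} while (!atomic_compare_exchange_weak(&pending_word, &old, new));
}
```

So converting the vgic away from the spinlock mostly relocates the cost rather than removing it, which supports waiting for evidence of contention first.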

-Christoffer


