Re: [PATCH 0/7] Consolidate vcpu ioctl locking

Alexander Graf <agraf@xxxxxxx> · Sat, 15 May 2010 10:26:20 +0200

On 15.05.2010, at 09:59, Avi Kivity wrote:

> On 05/15/2010 09:21 AM, Alexander Graf wrote:
>> 
>>> On x86 eflags.if is freely changeable by the guest, so if we want to queue an interrupt we have to IPI the vcpu to force it out of guest mode, so we can inspect eflags.  This means the vcpu thread has to be interrupted one way or another.
>>> 
>>> The tpr (really ppr) is even more problematic as it is maintained in userspace, not in the kernel (for non-kernel-irqchip).  It could in theory be inspected by another thread, but we wouldn't gain anything by it due to the requirement to IPI.
>>>     
>> Hrm right. On PPC we trap on every MSR change, so we get notified when interrupts are enabled again. But isn't that what VINTR intercepts are there for? Just add the  lowest active TPR value to the KVM_INTERRUPT ioctl and then wait until the guest is ready to take it.
>>   
> 
> Yes, and we do that when using in-kernel-irqchip.  With userspace apic, the tpr is maintained in userspace and the kernel has no visibility into it.
> 
> We could have changed moved the tpr/ppr into the kernel for userspace irqchip, but there's not much point now.
> 
>>>> void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
>>>> {
>>>>    vcpu->stat.queue_intr++;
>>>> 
>>>>    set_bit(kvmppc_book3s_vec2irqprio(vec),
>>>> &vcpu->arch.pending_exceptions);
>>>> #ifdef EXIT_DEBUG
>>>>    printk(KERN_INFO "Queueing interrupt %x\n", vec);
>>>> #endif
>>>> }
>>>>       
>>> Isn't this missing an IPI if the vcpu is in guest mode?
>>>     
>> Yes, it is :). At least with qemu we're 100% sure we're not in VCPU_RUN when an interrupt gets injected, as the injection happens in kvm_arch_pre_run.
>>   
> 
> That means you never inject an interrupt from the iothread (or from a different vcpu thread)?
> 
> If that's the case we might make it part of the API and require the ioctl to be issued from the vcpu thread.  We'd still be left with the s390 exception.

Well I'm not sure that's guaranteed for MOL or Dolphin, but I guess the user base is small enough to ignore them.

Either way, I'm actually rather unhappy with the way interrupts work right now. We're only injecting interrupts when in the main loop, which is rare if we did our homework right. So what I'd really like to see is that the MPIC on ppc directly invokes KVM_INTERRUPT to pull (and losen) the interrupt line. That way we can't just accidently miss interrupts.

What happens now with ppc64 guests that have EE lazily disabled is:

* device in userspace wants to trigger an interrupt
* mpic pulls external interrupt line
* kvm_arch_pre_run sees that external interrupt line is pulled, issues ioctl(KVM_INTERRUPT), kernel space sets trigger_interrupt=1
* we enter the guest
* we inject an external interrupt, set trigger_interrupt=0
* guest gets the interrupt, sees it's lazily disabled, sets msr.ee=0
* guest moves on, sets msr.ee=1 again later
*** guest expects the interrupt to trigger again, but we don't know that it's still pending
-> we need to exit to userspace to realize that the interrupt is still active

This is fundamentally broken. What I'd like to see is:

* device in userspace wants to trigger an interrupt
* mpic pulls external interrupt line, automatically issues ioctl(KVM_INTERRUPT)
* we enter the guest
* we inject the external interrupt
* guest gets the interrupt, sees it's lazily disabled, sets msr.ee=0
* guest moves on, sets msr.ee=1 again later
* we inject the external interrupt again, since it's still active
* guest acknowledges the interrupt with the mpic, issues ioctl(KVM_INTERRUPT, disable)
-> all is great

For that to work we need to enable triggering KVM_INTERRUPT from a non-vcpu thread.

On S390, I'm also still sceptical if the implementation we have really works. A device injects an S390_INTERRUPT with its address and on the next vcpu_run, an according interrupt is issued. But what happens if two devices trigger an S390_INTERRUPT before the vcpu_run? We'd have lost an interrupt by then...

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ia64" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html