Am 02.11.2010 19:52, Michael S. Tsirkin wrote:
> On Tue, Nov 02, 2010 at 07:40:46PM +0100, Jan Kiszka wrote:
>> Am 02.11.2010 19:24, Michael S. Tsirkin wrote:
>>> On Tue, Nov 02, 2010 at 06:56:14PM +0100, Jan Kiszka wrote:
>>>>>>  		dev->host_irq_disabled = false;
>>>>>>  	}
>>>>>> -	spin_unlock(&dev->intx_lock);
>>>>>> +out:
>>>>>> +	spin_unlock_irq(&dev->intx_lock);
>>>>>> +
>>>>>> +	if (reassert)
>>>>>> +		kvm_set_irq(dev->kvm, dev->irq_source_id, dev->guest_irq, 1);
>>>>>
>>>>> Hmm, I think this still has more overhead than it needs to have.
>>>>> Instead of setting the level to 0 and then back to 1, can't we just
>>>>> avoid the set to 1 in the first place? This would need a different
>>>>> interface than pci_2_3_irq_check_and_unmask to avoid a race
>>>>> where an interrupt is received while we are acking another one:
>>>>>
>>>>> 	block userspace access
>>>>> 	check pending bit
>>>>> 	if (!pending)
>>>>> 		set irq (0)
>>>>> 	clear pending
>>>>> 	unblock userspace access
>>>>>
>>>>> Would be worth it for high-volume devices.
>>>>
>>>> The problem is that we can't reorder guest IRQ line clearing and host
>>>> IRQ line enabling without taking a lock across host IRQ disable + guest
>>>> IRQ raise - and that is now distributed across hard and threaded IRQ
>>>> handlers, and we don't want to hold an IRQ-safe lock during kvm_set_irq.
>>>
>>> Oh, I think I confused you.
>>> What I mean is:
>>>
>>> 	block userspace access
>>> 	check interrupt status bit
>>> 	if (!interrupt status bit set)
>>> 		set irq (0)
>>> 	clear interrupt disable bit
>>> 	unblock userspace access
>>>
>>> This way we enable the interrupt after the set irq, so there is no need
>>> for extra locks, I think.
>>
>> OK. That would require some serious refactoring again.
>
> That would also mean we can't just solve the nested block/unblock
> problem with a simple lock. Not sure this is worth the effort.

Can't follow: what can be nested, and how?

>
>> But what about edge IRQs? Don't we need to toggle the bit for them? And
>> as we do not differentiate between level and edge, we currently have to
>> do this unconditionally.
>
> AFAIK PCI IRQs are level, so I don't think we need to bother.

Ah, indeed.

>
>>>
>>> Hmm, one thing I noticed is that pci_block_user_cfg_access
>>> will BUG_ON if it was already blocked. So I think we have
>>> a bug here when the interrupt handler kicks in right after
>>> we unmask interrupts.
>>>
>>> Probably need some kind of lock to protect against this.
>>
>> Or an atomic counter.
>
> BTW, block userspace access uses a global spinlock which will likely hurt
> us on multi-CPU. Switching that to something more SMP friendly, e.g. a
> per-device spinlock, might be a good idea: I don't see why that lock and
> queue are global.

I've been through that code recently - hairy stuff. pci_lock also protects
the bus operations, which can be overloaded (e.g. for error injection, if
that is not their only use case). So we would need a per-bus lock, but that
could still cause contention if devices on the same bus are handled on
different CPUs. I think the whole PCI config interface is simply not
designed for performance. It's considered a slow path, which it normally is.

>
>> Will have a look.
>
> Need to also consider an interrupt running in parallel
> with unmasking in the thread.
>
>> Alex, does VFIO take care of this already?
>
> VFIO does this all under spin_lock_irq.

Everything has its pros and cons...
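For illustration, below is a minimal sketch of the ordering Michael proposes
above for the PCI 2.3 INTx path: while INTx is still disabled in the command
register, deassert the guest line if the device is not signalling, and only
then re-enable INTx, so no extra lock around kvm_set_irq should be needed.
This is not the actual in-tree code: the function name is made up, the
dev->dev/guest_irq/host_irq_disabled fields just follow the quoted patch, it
uses the pci_block_user_cfg_access() interface of that kernel generation, and
it deliberately leaves out the serialization against the hard IRQ handler and
the nested block/unblock issue discussed above.

/* Hedged sketch only - ordering per the discussion, locking omitted. */
#include <linux/pci.h>
#include <linux/kvm_host.h>

static void assigned_dev_ack_intx_sketch(struct kvm_assigned_dev_kernel *dev)
{
	struct pci_dev *pdev = dev->dev;
	u16 status, cmd;

	/* Keep userspace config space accesses out of the way. */
	pci_block_user_cfg_access(pdev);

	pci_read_config_word(pdev, PCI_STATUS, &status);
	if (!(status & PCI_STATUS_INTERRUPT))
		/* Device is quiet: lower the guest line first ... */
		kvm_set_irq(dev->kvm, dev->irq_source_id, dev->guest_irq, 0);

	/* ... and only then clear the interrupt disable bit again. */
	pci_read_config_word(pdev, PCI_COMMAND, &cmd);
	pci_write_config_word(pdev, PCI_COMMAND,
			      cmd & ~PCI_COMMAND_INTX_DISABLE);
	dev->host_irq_disabled = false;

	pci_unblock_user_cfg_access(pdev);
}

If the status bit is still set, the guest line stays asserted and re-enabling
INTx simply lets the host handler mask the device again, so the unconditional
clear of the disable bit matches the pseudocode above.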
Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux