On 09.11.2010 14:41, Avi Kivity wrote:
> On 11/09/2010 03:35 PM, Jan Kiszka wrote:
>> On 09.11.2010 14:27, Avi Kivity wrote:
>>> On 11/08/2010 01:21 PM, Jan Kiszka wrote:
>>>> PCI 2.3 allows to generically disable IRQ sources at device level. This
>>>> enables us to share IRQs of such devices on the host side when passing
>>>> them to a guest. This feature is optional, user space has to request it
>>>> explicitly. Moreover, user space can inform us about its view of
>>>> PCI_COMMAND_INTX_DISABLE so that we can avoid unmasking the interrupt
>>>> and signaling it if the guest masked it via the PCI config space.
>>>>
>>>
>>> It's a pity this cannot be done transparently. We could detect multiple
>>> devices sharing the line,
>>
>> Even that is not possible. Assigned or host devices may be activated
>> after we registered exclusively, pushing the breakage from VM start-up
>> to a different operation.
>
> We could detect that and switch the interrupt mode. Or we could always
> do IRQF_SHARED and fake something in the immediate callback.
>
>>> but what about PCI_COMMAND_INTX_DISABLE?
>>>
>>> Perhaps we can hook the kernel's handler for this bit?
>>
>> Some IRQ registration notifier that would allow us to reregister our
>> handler with IRQ sharing support? Maybe.
>
> Adding an internal API is preferable to an external one (it may be a
> pain for kvm-kmod users though).

The primary concern should be a clean and robust API. From that POV, I
would prefer an official hook with the genirq maintainer's blessing over
fragile detection heuristics in kvm. Given that VFIO should benefit from
any transparent pattern we develop here as well, it's probably worth going
that path - if it is really preferred over the manual control this patch
proposes.

For kvm-kmod, we could simply enforce IRQ sharing measures unconditionally.
Not optimal from a performance POV, but people that concerned about
performance should rather use KVM on the corresponding kernel anyway.
>
>>>
>>>> +
>>>> +Capability: KVM_CAP_PCI_2_3
>>>> +Architectures: x86
>>>> +Type: vm ioctl
>>>> +Parameters: struct kvm_assigned_pci_dev (in)
>>>> +Returns: 0 on success, -1 on error
>>>> +
>>>> +Informs the kernel about the guest's view of the INTx mask. As long as the
>>>> +guest masks the legacy INTx, the kernel will refrain from unmasking it at
>>>> +hardware level and will not assert the guest's IRQ line. User space is still
>>>> +responsible for applying this state to the assigned device's real config space.
>>>
>>> What if userspace lies?
>>
>> User space problem. We will at worst receive one IRQ, mask it, and then
>> user space needs to react again.
>
> Ok.
>
>>>
>>> I saw no reason this can't be a spinlock, but perhaps I missed
>>> something. This would allow us to avoid srcu, which is slightly more
>>> expensive than rcu. Since PCI 2.3 assigned devices are not a major use
>>> case, I'd like not to penalize the mainstream users for this.
>>
>> The lock has to be held across kvm_set_irq, which is the potentially
>> expensive (O(n), n == number of VCPUs) operation.
>
> What we should probably do is have broadcast interrupts deferred to a
> thread. I agree it isn't pretty.

Not sure we could defer it that easily (if at all). However, I think such
improvements should be done on top of this already complex change.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux