Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors

Jan Kiszka <jan.kiszka@xxxxxxxxxxx> · Wed, 19 Oct 2011 13:17:03 +0200

On 2011-10-19 11:03, Michael S. Tsirkin wrote:
>>> I thought we need to match APIC ID. That needs a table lookup, no?
>>
>> Yes. But that's completely independent of a concrete MSI message. In
>> fact, this is the same thing we need when interpreting an IOAPIC
>> redirection table entry. So let's create an APIC ID lookup table for the
>> destination ID field, maybe multiple of them to match different modes,
>> but not a MSI message table.
>>>
>>>> Or are you thinking about some cluster mode?
>>>
>>> That too.
> 
> Hmm, might be a good idea. APIC IDs are 8 bit, right?

Yep (more generally: destination IDs). So even if we have to create
multiple lookup tables for the various modes, that won't consume
megabytes of RAM.

> 
> 
>>>>>
>>>>>
>>>>>>>
>>>>>>> An analogy would be if read/write operated on file paths.
>>>>>>> fd makes it easy to do permission checks and slow lookups
>>>>>>> in one place. GSI happens to work like this (maybe, by accident).
>>>>>>
>>>>>> Think of an opaque file handle as a MSIRoutingCache object. And it
>>>>>> encodes not only the routing handle but also other useful associated
>>>>>> information we need from time to time - internally, not in the device
>>>>>> models.
>>>>>
>>>>> Forget qemu abstractions, I am talking about data path
>>>>> optimizations in kernel in kvm. From that POV the point of an fd is not
>>>>> that it is opaque. It is that it's an index in an array that
>>>>> can be used for fast lookups.
>>>>>
>>>>>>>>>
>>>>>>>>> Another concern is mask bit emulation. We currently
>>>>>>>>> handle mask bit in userspace but patches
>>>>>>>>> to do them in kernel for assigned devices where seen
>>>>>>>>> and IMO we might want to do that for virtio as well.
>>>>>>>>>
>>>>>>>>> For that to work the mask bit needs to be tied to
>>>>>>>>> a specific gsi or specific device, which does not
>>>>>>>>> work if we just inject arbitrary writes.
>>>>>>>>
>>>>>>>> Yes, but I do not see those valuable plans being negatively affected.
>>>>>>>>
>>>>>>>> Jan
>>>>>>>>
>>>>>>>
>>>>>>> I do.
>>>>>>> How would we maintain a mask/pending bit in kernel if we are not
>>>>>>> supplied info on all available vectors even?
>>>>>>
>>>>>> It's tricky to discuss an undefined interface (there only exists an
>>>>>> outdated proposal for kvm device assignment). But I suppose that user
>>>>>> space will have to define the maximum number of vectors when creating an
>>>>>> in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.
>>>>>>
>>>>>> The number of used vectors will correlate with the number of registered
>>>>>> irqfds (in the case of vhost or vfio, device assignment still has
>>>>>> SET_MSIX_NR). As kernel space would then be responsible for mask
>>>>>> processing, user space would keep vectors registered with irqfds, even
>>>>>> if they are masked. It could just continue to play the trick and drop
>>>>>> data=0 vectors.
>>>>>
>>>>> Which trick?  We don't play any tricks except for device assignment.
>>>>>
>>>>>> The point here is: All those steps have _nothing_ to do with the generic
>>>>>> MSI-X core. They are KVM-specific "side channels" for which KVM provides
>>>>>> an API. In contrast, msix_vector_use/unuse were generic services that
>>>>>> were actually created to please KVM requirements. But if we split that
>>>>>> up, we can address the generic MSI-X requirements in a way that makes
>>>>>> more sense for emulated devices (and particularly msix_vector_use makes
>>>>>> no sense for emulation).
>>>>>>
>>>>>> Jan
>>>>>>
>>>>>
>>>>> We need at least msix_vector_unuse
>>>>
>>>> Not at all. We rather need some qemu_irq_set(level) for MSI.
>>>> The spec
>>>> requires that the device clears pending when the reason for that is
>>>> removed. And any removal that is device model-originated should simply
>>>> be signaled like an irq de-assert.
>>>
>>> OK, this is a good argument.
>>> In particular virtio ISR read could clear msix pending bit
>>> (note: it would also need to clear irqfd as that is where
>>>  we get the pending bit).
>>>
>>> I would prefer not to use qemu_irq_set for this though.
>>> We can add a level flag to msix_notify.
>>
>> No concerns.
>>
>>>
>>>> Vector "unusage" is just one reason here.
>>>
>>> I don't see removing the use/unuse functions as a priority though,
>>> but if we add an API that also lets devices say
>>> 'reason for interrupt is removed', that would be nice.
>>>
>>> Removing extra code can then be done separately, and on qemu.git
>>> not on qemu-kvm.
>>
>> If we refrain from hacking KVM logic into the use/unuse services
>> upstream, we can do this later on. For me it is important that those
>> obsolete services do not block or complicate further cleanups of the MSI
>> layer nor bother device model creators with tasks they should not worry
>> about.
> 
> My assumption is devices shall keep calling use/unuse until we drop it.
> Does not seem like a major bother. If you like, use all vectors
> or just those with message != 0.

What about letting only those devices call use/unuse that sometimes need
less than the maximum amount? All other would benefit for an use_all
executed on enable and a unuse_all on disable/reset/uninit.

> 
>>>
>>>>> - IMO it makes more sense than "clear
>>>>> pending vector". msix_vector_use is good to keep around for symmetry:
>>>>> who knows whether we'll need to allocate resources per vector
>>>>> in the future.
>>>>
>>>> For MSI[-X], the spec is already there, and we know that there no need
>>>> for further resources when emulating it.
>>>> Only KVM has special needs.
>>>>
>>>> Jan
>>>>
>>>
>>> It's not hard to speculate.  Imagine an out of process device that
>>> shares guest memory and sends interrupts to qemu using eventfd. Suddenly
>>> we need an fd per vector, and this without KVM.
>>
>> That's what irqfd was invented for. Already works for vhost, and there
>> is nothing that prevents communicating the irqfd fd between two
>> processes. But note: irqfd handle, not a KVM-internal GSI.
>>
>> Jan
>>
> 
> Yes. But this still makes an API for acquiring per-vector resources a requirement.

Yes, but a different one than current use/unuse. And it will be an
optional one, only for those devices that need to establish irq/eventfd
channels.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html