Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors

Jan Kiszka <jan.kiszka@xxxxxx> · Tue, 18 Oct 2011 21:37:14 +0200

On 2011-10-18 20:40, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 08:24:39PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 19:06, Michael S. Tsirkin wrote:
>>> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
>>>> On 2011-10-18 17:22, Jan Kiszka wrote:
>>>>> What KVM has to do is just mapping an arbitrary MSI message
>>>>> (theoretically 64+32 bits, in practice it's of course much less) to
>>>>
>>>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
>>>> only a current interpretation of one specific arch. )
>>>
>>> Confused. vector mask is 8 bits. the rest is destination id etc.
>>
>> Right, but those additional bits like the destination make different
>> messages. We have to encode those 24 bits into a unique GSI number and
>> restore them (by table lookup) on APIC injection inside the kernel. If
>> we only had to encode 256 different vectors, we would be done already.
> 
> Right. But in practice guests always use distinct vectors (from the
> 256 available) for distinct messages. This is because
> the vector seems to be the only thing that gets communicated by the APIC
> to the software.
> 
> So e.g. a table with 256 entries, with extra 1024-256
> used for spill-over for guests that do something unexpected,
> would work really well.

Linux already manages vectors on a per-CPU basis. For efficiency
reasons, it does not exploit the full range of 256 vectors but actually
allocates them in - IIRC - steps of 16. So I would not be surprised to
find lots of vector number "collisions" when looking over a full set of
CPUs in a system.

Really, these considerations do not help us. We must store all 96 bits,
if only for the sake of other KVM architectures that want MSI routing.
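
To make the collision argument concrete, here is a minimal sketch of
what mapping complete messages to GSIs by table lookup could look like.
All names (MsiMessageSketch, MsiRouteTable, msi_route_lookup_or_add) and
the limit of 1024 entries are assumptions for illustration, not the
actual KVM or qemu-kvm code:

```c
#include <stdint.h>

/* Hypothetical types, for illustration only. */
typedef struct {
    uint64_t address;   /* 64-bit MSI address */
    uint32_t data;      /* 32-bit MSI data (contains the vector on x86) */
} MsiMessageSketch;

#define SKETCH_MAX_ROUTES 1024   /* assumed number of available GSIs */

typedef struct {
    MsiMessageSketch msg[SKETCH_MAX_ROUTES];
    uint8_t used[SKETCH_MAX_ROUTES];
} MsiRouteTable;

/*
 * Return the GSI already assigned to the full message m, or assign the
 * first free one. Returns -1 when all GSIs are in use.
 */
static int msi_route_lookup_or_add(MsiRouteTable *t, MsiMessageSketch m)
{
    int free_gsi = -1;

    for (int gsi = 0; gsi < SKETCH_MAX_ROUTES; gsi++) {
        if (t->used[gsi]) {
            /* Compare all 96 bits, not just the 8-bit vector. */
            if (t->msg[gsi].address == m.address &&
                t->msg[gsi].data == m.data) {
                return gsi;
            }
        } else if (free_gsi < 0) {
            free_gsi = gsi;
        }
    }
    if (free_gsi >= 0) {
        t->msg[free_gsi] = m;
        t->used[free_gsi] = 1;
    }
    return free_gsi;
}
```

Two messages that carry the same vector but different addresses (say,
different destination CPUs) get different GSIs here, which a table
indexed by the vector alone could not express.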

> 
> 
>>>
>>>>> a single GSI and vice versa. As there are fewer GSIs than possible MSI
>>>>> messages, we could run out of them when creating routes, statically or
>>>>> lazily.
>>>>>
>>>>> What would probably address your concerns regarding lazy routing in
>>>>> the long term is to bypass that redundant GSI translation for dynamic
>>>>> messages, i.e. those that are not associated with an irqfd number or an
>>>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
>>>>> address and data directly.
>>>>
>>>> This would be a trivial extension in fact. Given its beneficial impact
>>>> on our GSI limitation issue, I think I will hack up something like that.
>>>>
>>>> And maybe this makes a transparent cache more reasonable. Then only old
>>>> host kernels would force us to do searches for already cached messages.
>>>>
>>>> Jan
>>>
>>> Hmm, I'm not all that sure. Existing design really allows
>>> caching the route in various smart ways. We currently do
>>> this for irqfd but this can be extended to ioctls.
>>> If we just let the guest inject arbitrary messages,
>>> that becomes much more complex.
>>
>> irqfd and kvm device assignment do not allow us to inject arbitrary
>> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
>> kvm_device_msix_set_vector (etc.) for those scenarios to set static
>> routes from an MSI message to a GSI number (+they configure the related
>> backends).
> 
> Yes, it's a very flexible API but it would be very hard to optimize.
> GSIs let us do the slow path setup, but they make it easy
> to optimize target lookup in kernel.

Users of the API above have no need to know anything about GSIs. They
are an artifact of the KVM-internal interface between user space and
kernel now - thanks to the MSIRoutingCache encapsulation.

> 
> An analogy would be if read/write operated on file paths.
> fd makes it easy to do permission checks and slow lookups
> in one place. GSI happens to work like this (maybe, by accident).

Think of an MSIRoutingCache object as an opaque file handle. And it
encodes not only the routing handle but also other useful associated
information we need from time to time - internally, not in the device
models.
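
As a rough sketch of that idea, assuming a layout that is NOT the actual
MSIRoutingCache definition (the struct, its fields and the helper names
below are all hypothetical), a device model would only hold the handle,
while the KVM layer keeps the GSI and related state inside it:

```c
#include <stdint.h>

/* Hypothetical message type, for illustration only. */
typedef struct {
    uint64_t address;
    uint32_t data;
} MsiMsgSketch;

/* Hypothetical handle; device models treat it as opaque. */
typedef struct {
    MsiMsgSketch msg;   /* message the current route was created for */
    int gsi;            /* KVM-internal routing handle, -1 if no route */
    int irqfd;          /* associated eventfd, -1 if none */
} RoutingCacheSketch;

static void routing_cache_init(RoutingCacheSketch *c)
{
    c->msg.address = 0;
    c->msg.data = 0;
    c->gsi = -1;
    c->irqfd = -1;
}

/* Nonzero if the cached route can be reused for message m as is. */
static int routing_cache_valid(const RoutingCacheSketch *c, MsiMsgSketch m)
{
    return c->gsi >= 0 &&
           c->msg.address == m.address &&
           c->msg.data == m.data;
}

/* Record that gsi now routes message m. */
static void routing_cache_update(RoutingCacheSketch *c, MsiMsgSketch m,
                                 int gsi)
{
    c->msg = m;
    c->gsi = gsi;
}
```

The caller passes the handle around much like an fd; whether a GSI sits
behind it, and which one, stays an internal detail.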

>>>
>>> Another concern is mask bit emulation. We currently
>>> handle the mask bit in userspace, but patches
>>> to do this in kernel for assigned devices were seen
>>> and IMO we might want to do that for virtio as well.
>>>
>>> For that to work the mask bit needs to be tied to
>>> a specific gsi or specific device, which does not
>>> work if we just inject arbitrary writes.
>>
>> Yes, but I do not see those valuable plans being negatively affected.
>>
>> Jan
>>
> 
> I do.
> How would we maintain a mask/pending bit in kernel if we are not
> even supplied info on all available vectors?

It's tricky to discuss an undefined interface (there only exists an
outdated proposal for kvm device assignment). But I suppose that user
space will have to define the maximum number of vectors when creating an
in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.

The number of used vectors will correlate with the number of registered
irqfds (in the case of vhost or vfio; device assignment still has
SET_MSIX_NR). As kernel space would then be responsible for mask
processing, user space would keep vectors registered with irqfds, even
if they are masked. It could just continue to play the trick and drop
data=0 vectors.
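
For reference, the mask processing that would move into the kernel is
basically the usual MSI-X mask/pending logic. A minimal sketch, with the
vector count fixed at creation time as suggested above; every name here
is made up for illustration and the "delivered" counter merely stands in
for the real injection path:

```c
#include <stdint.h>

#define SKETCH_MAX_VECTORS 64   /* assumed upper bound for this sketch */

/* Hypothetical per-device state, for illustration only. */
typedef struct {
    unsigned nr_vectors;                     /* set once at creation */
    uint8_t masked[SKETCH_MAX_VECTORS];
    uint8_t pending[SKETCH_MAX_VECTORS];
    unsigned delivered[SKETCH_MAX_VECTORS];  /* stand-in for injection */
} MsixMaskSketch;

/* An interrupt source (e.g. an irqfd) fires for vector v. */
static void sketch_notify(MsixMaskSketch *s, unsigned v)
{
    if (v >= s->nr_vectors) {
        return;
    }
    if (s->masked[v]) {
        s->pending[v] = 1;      /* remember it, deliver on unmask */
    } else {
        s->delivered[v]++;
    }
}

/* The guest writes the mask bit of vector v. */
static void sketch_set_mask(MsixMaskSketch *s, unsigned v, int masked)
{
    if (v >= s->nr_vectors) {
        return;
    }
    s->masked[v] = masked ? 1 : 0;
    if (!masked && s->pending[v]) {
        s->pending[v] = 0;
        s->delivered[v]++;      /* one delivery, however often it fired */
    }
}
```

Note that nothing in there needs to know which vectors are "used" in the
msix_vector_use sense; it only needs the maximum number of vectors.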

The point here is: All those steps have _nothing_ to do with the generic
MSI-X core. They are KVM-specific "side channels" for which KVM provides
an API. In contrast, msix_vector_use/unuse were generic services that
were actually created to please KVM requirements. But if we split that
up, we can address the generic MSI-X requirements in a way that makes
more sense for emulated devices (and particularly msix_vector_use makes
no sense for emulation).

Jan
