Re: [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors

"Michael S. Tsirkin" <mst@xxxxxxxxxx> · Tue, 18 Oct 2011 23:40:06 +0200

On Tue, Oct 18, 2011 at 09:37:14PM +0200, Jan Kiszka wrote:
> On 2011-10-18 20:40, Michael S. Tsirkin wrote:
> > On Tue, Oct 18, 2011 at 08:24:39PM +0200, Jan Kiszka wrote:
> >> On 2011-10-18 19:06, Michael S. Tsirkin wrote:
> >>> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
> >>>> On 2011-10-18 17:22, Jan Kiszka wrote:
> >>>>> What KVM has to do is just mapping an arbitrary MSI message
> >>>>> (theoretically 64+32 bits, in practice it's of course much less) to
> >>>>
> >>>> ( There are 24 distinguishing bits in an MSI message on x86, but that's
> >>>> only a current interpretation of one specific arch. )
> >>>
> >>> Confused. vector mask is 8 bits. the rest is destination id etc.
> >>
> >> Right, but those additional bits like the destination make different
> >> messages. We have to encode those 24 bits into a unique GSI number and
> >> restore them (by table lookup) on APIC injection inside the kernel. If
> >> we only had to encode 256 different vectors, we would be done already.
> > 
> > Right. But in practice guests always use distinct vectors (from the
> > 256 available) for distinct messages. This is because
> > the vector seems to be the only thing that gets communicated by the APIC
> > to the software.
> > 
> > So e.g. a table with 256 entries, with extra 1024-256
> > used for spill-over for guests that do something unexpected,
> > would work really well.
> 
> Already Linux manages vectors on a per-CPU basis. For efficiency
> reasons, it does not exploit the full range of 256 vectors but actually
> allocates them in - IIRC - steps of 16. So I would not be surprised to
> find lots of vector number "collisions" when looking over a full set of
> CPUs in a system.
> 
> Really, these considerations do not help us. We must store all 96 bits,
> already for the sake of other KVM architectures that want MSI routing.
> > 
> > 
> >>>
> >>>>> a single GSI and vice versa. As there are fewer GSIs than possible MSI
> >>>>> messages, we could run out of them when creating routes, statically or
> >>>>> lazily.
> >>>>>
> >>>>> What would probably help us long-term out of your concerns regarding
> >>>>> lazy routing is to bypass that redundant GSI translation for dynamic
> >>>>> messages, i.e. those that are not associated with an irqfd number or an
> >>>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
> >>>>> address and data directly.
> >>>>
> >>>> This would be a trivial extension in fact. Given its beneficial impact
> >>>> on our GSI limitation issue, I think I will hack up something like that.
> >>>>
> >>>> And maybe this makes a transparent cache more reasonable. Then only old
> >>>> host kernels would force us to do searches for already cached messages.
> >>>>
> >>>> Jan
> >>>
> >>> Hmm, I'm not all that sure. Existing design really allows
> >>> caching the route in various smart ways. We currently do
> >>> this for irqfd but this can be extended to ioctls.
> >>> If we just let the guest inject arbitrary messages,
> >>> that becomes much more complex.
> >>
> >> irqfd and kvm device assignment do not allow us to inject arbitrary
> >> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
> >> kvm_device_msix_set_vector (etc.) for those scenarios to set static
> >> routes from an MSI message to a GSI number (+they configure the related
> >> backends).
> > 
> > Yes, it's a very flexible API but it would be very hard to optimize.
> > GSIs let us do the slow path setup, but they make it easy
> > to optimize target lookup in kernel.
> 
> Users of the API above have no need to know anything about GSIs. They
> are an artifact of the KVM-internal interface between user space and
> kernel now - thanks to the MSIRoutingCache encapsulation.

Yes but I am saying that the API above can't be implemented
more efficiently than now: you will have to scan all apics on each MSI.
The GSI implementation can be optimized: decode the vector once,
if it matches a single vcpu, store that vcpu and use when sending
interrupts.
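
A minimal sketch of that optimization, under loud assumptions: every
name below (struct vcpu, struct msi_route, route_set, route_inject,
deliver_msi_direct, scan_apics) is hypothetical and none of it is the
actual KVM code. It models only a physical destination ID that matches
at most one vcpu; logical destination mode and broadcast are ignored.
The APIC scan happens once at route setup, and the GSI-indexed fast
path reuses the cached vcpu, while the direct address/data path has to
scan on every message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VCPUS 8
#define MAX_GSI   1024

/* Hypothetical, simplified types; not the kernel's real structures. */
struct vcpu {
    uint8_t  apic_id;
    unsigned injected[256];   /* per-vector injection counters */
};

struct msi_route {
    uint8_t      dest_id;     /* destination ID taken from the MSI address */
    uint8_t      vector;      /* vector taken from the MSI data */
    struct vcpu *cached;      /* resolved at setup, NULL if no single match */
};

struct vcpu      vcpus[MAX_VCPUS];
struct msi_route routes[MAX_GSI];
unsigned         apic_scans;  /* counts how often the slow scan runs */

void vcpus_init(void)
{
    for (size_t i = 0; i < MAX_VCPUS; i++)
        vcpus[i].apic_id = (uint8_t)i;
}

/* Slow lookup: walk all APICs to find the one matching dest_id. */
struct vcpu *scan_apics(uint8_t dest_id)
{
    apic_scans++;
    for (size_t i = 0; i < MAX_VCPUS; i++)
        if (vcpus[i].apic_id == dest_id)
            return &vcpus[i];
    return NULL;
}

/* Route setup (slow path): decode once and remember the target vcpu. */
void route_set(unsigned gsi, uint8_t dest_id, uint8_t vector)
{
    assert(gsi < MAX_GSI);
    routes[gsi].dest_id = dest_id;
    routes[gsi].vector  = vector;
    routes[gsi].cached  = scan_apics(dest_id);
}

/* Injection by GSI (fast path): array index plus cached vcpu. */
int route_inject(unsigned gsi)
{
    assert(gsi < MAX_GSI);
    struct msi_route *r = &routes[gsi];
    struct vcpu *v = r->cached ? r->cached : scan_apics(r->dest_id);
    if (!v)
        return -1;
    v->injected[r->vector]++;
    return 0;
}

/* Direct address/data delivery: nothing to cache, so scan every time. */
int deliver_msi_direct(uint8_t dest_id, uint8_t vector)
{
    struct vcpu *v = scan_apics(dest_id);
    if (!v)
        return -1;
    v->injected[vector]++;
    return 0;
}
```

With this model, calling vcpus_init(), then route_set(5, 3, 0x41) and
any number of route_inject(5) calls scans the APICs exactly once, while
each deliver_msi_direct(3, 0x41) call adds another scan.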

> > 
> > An analogy would be if read/write operated on file paths.
> > fd makes it easy to do permission checks and slow lookups
> > in one place. GSI happens to work like this (maybe, by accident).
> 
> Think of an opaque file handle as a MSIRoutingCache object. And it
> encodes not only the routing handle but also other useful associated
> information we need from time to time - internally, not in the device
> models.

Forget qemu abstractions, I am talking about data path
optimizations in kernel in kvm. From that POV the point of an fd is not
that it is opaque. It is that it's an index in an array that
can be used for fast lookups.

> >>>
> >>> Another concern is mask bit emulation. We currently
> >>> handle mask bit in userspace but patches
> >>> to do them in kernel for assigned devices were seen
> >>> and IMO we might want to do that for virtio as well.
> >>>
> >>> For that to work the mask bit needs to be tied to
> >>> a specific gsi or specific device, which does not
> >>> work if we just inject arbitrary writes.
> >>
> >> Yes, but I do not see those valuable plans being negatively affected.
> >>
> >> Jan
> >>
> > 
> > I do.
> > How would we maintain a mask/pending bit in kernel if we are not
> > supplied info on all available vectors even?
> 
> It's tricky to discuss an undefined interface (there only exists an
> outdated proposal for kvm device assignment). But I suppose that user
> space will have to define the maximum number of vectors when creating an
> in-kernel MSI-X MMIO area. The device already has to tell this to msix_init.
> 
> The number of used vectors will correlate with the number of registered
> irqfds (in the case of vhost or vfio, device assignment still has
> SET_MSIX_NR). As kernel space would then be responsible for mask
> processing, user space would keep vectors registered with irqfds, even
> if they are masked. It could just continue to play the trick and drop
> data=0 vectors.

Which trick?  We don't play any tricks except for device assignment.

> The point here is: All those steps have _nothing_ to do with the generic
> MSI-X core. They are KVM-specific "side channels" for which KVM provides
> an API. In contrast, msix_vector_use/unuse were generic services that
> were actually created to please KVM requirements. But if we split that
> up, we can address the generic MSI-X requirements in a way that makes
> more sense for emulated devices (and particularly msix_vector_use makes
> no sense for emulation).
> 
> Jan
> 

We need at least msix_vector_unuse - IMO it makes more sense than "clear
pending vector". msix_vector_use is good to keep around for symmetry:
who knows whether we'll need to allocate resources per vector
in the future.
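
A rough sketch of what the symmetric pair could look like, assuming a
simple per-vector use count. struct msix_state, vector_use and
vector_unuse are hypothetical stand-ins for illustration, not the
actual msix_vector_use/msix_vector_unuse code. The last unuse of a
vector drops its pending bit, and the use side is the natural hook for
any per-vector resources we might need later.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_VECTORS 32

/* Hypothetical per-device state; not QEMU's PCIDevice. */
struct msix_state {
    uint16_t used[NR_VECTORS];     /* use count per vector */
    bool     pending[NR_VECTORS];  /* pending bit per vector */
};

/* Mark a vector as in use. Per-vector resources would be allocated here. */
int vector_use(struct msix_state *s, unsigned vector)
{
    assert(s != NULL);
    if (vector >= NR_VECTORS)
        return -EINVAL;
    s->used[vector]++;
    return 0;
}

/* Drop one use. When the last user goes away, clear the pending bit. */
void vector_unuse(struct msix_state *s, unsigned vector)
{
    assert(s != NULL);
    if (vector >= NR_VECTORS || s->used[vector] == 0)
        return;
    if (--s->used[vector] == 0)
        s->pending[vector] = false;
}
```

In this sketch a caller never clears pending bits directly: two
vector_use() calls on a vector need two vector_unuse() calls before its
pending bit is dropped.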

-- 
MST