On Wed, Oct 19, 2011 at 08:41:48AM +0200, Jan Kiszka wrote:
> >>>>>>>>> a single GSI and vice versa. As there are fewer GSIs than possible MSI
> >>>>>>>>> messages, we could run out of them when creating routes, statically or
> >>>>>>>>> lazily.
> >>>>>>>>>
> >>>>>>>>> What would probably help us long-term out of your concerns regarding
> >>>>>>>>> lazy routing is to bypass that redundant GSI translation for dynamic
> >>>>>>>>> messages, i.e. those that are not associated with an irqfd number or an
> >>>>>>>>> assigned device irq. Something like a KVM_DELIVER_MSI IOCTL that accepts
> >>>>>>>>> address and data directly.
> >>>>>>>>
> >>>>>>>> This would be a trivial extension in fact. Given its beneficial impact
> >>>>>>>> on our GSI limitation issue, I think I will hack up something like that.
> >>>>>>>>
> >>>>>>>> And maybe this makes a transparent cache more reasonable. Then only old
> >>>>>>>> host kernels would force us to do searches for already cached messages.
> >>>>>>>>
> >>>>>>>> Jan
> >>>>>>>
> >>>>>>> Hmm, I'm not all that sure. The existing design really allows
> >>>>>>> caching the route in various smart ways. We currently do
> >>>>>>> this for irqfd, but this can be extended to ioctls.
> >>>>>>> If we just let the guest inject arbitrary messages,
> >>>>>>> that becomes much more complex.
> >>>>>>
> >>>>>> irqfd and kvm device assignment do not allow us to inject arbitrary
> >>>>>> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
> >>>>>> kvm_device_msix_set_vector (etc.) for those scenarios to set static
> >>>>>> routes from an MSI message to a GSI number (+ they configure the related
> >>>>>> backends).
> >>>>>
> >>>>> Yes, it's a very flexible API, but it would be very hard to optimize.
> >>>>> GSIs let us do the slow path setup, but they make it easy
> >>>>> to optimize target lookup in the kernel.
> >>>>
> >>>> Users of the API above have no need to know anything about GSIs. They
> >>>> are an artifact of the KVM-internal interface between user space and
> >>>> kernel now - thanks to the MSIRoutingCache encapsulation.
> >>>
> >>> Yes, but I am saying that the API above can't be implemented
> >>> more efficiently than now: you will have to scan all APICs on each MSI.
> >>> The GSI implementation can be optimized: decode the vector once,
> >>> if it matches a single vcpu, store that vcpu and use it when sending
> >>> interrupts.
> >>
> >> Sorry, missed that you switched to the kernel.
> >>
> >> What information do you want to cache there that cannot be easily
> >> obtained by looking at a concrete message? I do not see any. Once you
> >> checked that the delivery mode targets a specific cpu, you could address
> >> it directly.
> >
> > I thought we need to match the APIC ID. That needs a table lookup, no?
>
> Yes. But that's completely independent of a concrete MSI message. In
> fact, this is the same thing we need when interpreting an IOAPIC
> redirection table entry. So let's create an APIC ID lookup table for the
> destination ID field, maybe multiple of them to match different modes,
> but not an MSI message table.
>
> >> Or are you thinking about some cluster mode?
> >
> > That too. Hmm, might be a good idea. APIC IDs are 8 bit, right?
> >>>>>
> >>>>> An analogy would be if read/write operated on file paths.
> >>>>> An fd makes it easy to do permission checks and slow lookups
> >>>>> in one place. GSI happens to work like this (maybe by accident).
> >>>>
> >>>> Think of an opaque file handle as an MSIRoutingCache object. And it
> >>>> encodes not only the routing handle but also other useful associated
> >>>> information we need from time to time - internally, not in the device
> >>>> models.
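To make the fast-lookup point concrete: even on a route-less path where the
kernel is handed address/data directly (the KVM_DELIVER_MSI idea quoted at
the top), the destination ID field of the message itself can index a per-VM
table along the lines of the lookup table suggested above, instead of
forcing a scan over all APICs. A rough sketch; all structure and function
names here are invented for illustration, not taken from existing kvm code:

    struct msi_fast_map {
        struct kvm_vcpu *phys_map[256];     /* 8-bit APIC ID -> vcpu */
    };

    /* Return the single target vcpu for a physical-mode, fixed-delivery
     * message, or NULL to fall back to the full APIC scan. */
    static struct kvm_vcpu *msi_fast_target(struct msi_fast_map *map,
                                            u32 address, u32 data)
    {
        u8 dest_id   = (address >> 12) & 0xff;  /* MSI address bits 19:12 */
        bool logical = address & (1 << 2);      /* destination mode bit   */
        u8 delivery  = (data >> 8) & 0x7;       /* delivery mode field    */

        if (logical || delivery != 0 /* != fixed */)
            return NULL;
        return map->phys_map[dest_id];
    }

Logical and cluster modes would need their own table(s), which is presumably
what "maybe multiple of them to match different modes" refers to.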
> >>> Forget qemu abstractions, I am talking about data path
> >>> optimizations in the kernel, in kvm. From that POV the point of an fd is
> >>> not that it is opaque. It is that it's an index into an array that
> >>> can be used for fast lookups.
> >>>
> >>>>>>> Another concern is mask bit emulation. We currently
> >>>>>>> handle the mask bit in userspace, but patches
> >>>>>>> to do it in the kernel for assigned devices were seen,
> >>>>>>> and IMO we might want to do that for virtio as well.
> >>>>>>>
> >>>>>>> For that to work the mask bit needs to be tied to
> >>>>>>> a specific gsi or specific device, which does not
> >>>>>>> work if we just inject arbitrary writes.
> >>>>>>
> >>>>>> Yes, but I do not see those valuable plans being negatively affected.
> >>>>>>
> >>>>>> Jan
> >>>>>
> >>>>> I do.
> >>>>> How would we maintain a mask/pending bit in the kernel if we are not
> >>>>> even supplied info on all available vectors?
> >>>>
> >>>> It's tricky to discuss an undefined interface (there only exists an
> >>>> outdated proposal for kvm device assignment). But I suppose that user
> >>>> space will have to define the maximum number of vectors when creating
> >>>> an in-kernel MSI-X MMIO area. The device already has to tell this to
> >>>> msix_init.
> >>>>
> >>>> The number of used vectors will correlate with the number of registered
> >>>> irqfds (in the case of vhost or vfio; device assignment still has
> >>>> SET_MSIX_NR). As kernel space would then be responsible for mask
> >>>> processing, user space would keep vectors registered with irqfds, even
> >>>> if they are masked. It could just continue to play the trick of dropping
> >>>> data=0 vectors.
> >>>
> >>> Which trick? We don't play any tricks except for device assignment.
> >>>
> >>>> The point here is: all those steps have _nothing_ to do with the generic
> >>>> MSI-X core. They are KVM-specific "side channels" for which KVM provides
> >>>> an API. In contrast, msix_vector_use/unuse were generic services that
> >>>> were actually created to please KVM requirements. But if we split that
> >>>> up, we can address the generic MSI-X requirements in a way that makes
> >>>> more sense for emulated devices (and particularly, msix_vector_use makes
> >>>> no sense for emulation).
> >>>>
> >>>> Jan
> >>>
> >>> We need at least msix_vector_unuse
> >>
> >> Not at all. We rather need some qemu_irq_set(level) equivalent for MSI.
> >> The spec requires that the device clears pending when the reason for it
> >> is removed. And any removal that is device-model-originated should simply
> >> be signaled like an irq de-assert.
> >
> > OK, this is a good argument.
> > In particular, a virtio ISR read could clear the msix pending bit
> > (note: it would also need to clear the irqfd, as that is where
> > we get the pending bit).
> >
> > I would prefer not to use qemu_irq_set for this though.
> > We can add a level flag to msix_notify.
>
> No concerns.
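To illustrate the direction agreed on here, a level-aware msix_notify could
look roughly like the sketch below. Helper names follow the style of
hw/msix.c, but treat this as pseudocode for the intended semantics
(level == 0 withdraws the interrupt reason), not a tested patch:

    /* sketch: msix_notify() grows a level argument; level == 0 clears the
     * pending bit of a still-masked vector instead of raising it */
    void msix_notify(PCIDevice *dev, unsigned vector, int level)
    {
        if (vector >= dev->msix_entries_nr)
            return;

        if (msix_is_masked(dev, vector)) {
            if (level)
                msix_set_pending(dev, vector);  /* latch while masked */
            else
                msix_clr_pending(dev, vector);  /* reason gone, clear  */
            return;
        }

        if (level)
            msix_deliver(dev, vector);  /* stand-in for today's address/data write */
    }

A virtio ISR read would then boil down to msix_notify(dev, vector, 0), plus
clearing the irqfd as noted above when one is attached.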
> >> Vector "unusage" is just one reason here.
> >
> > I don't see removing the use/unuse functions as a priority though,
> > but if we add an API that also lets devices say
> > 'the reason for the interrupt is removed', that would be nice.
> >
> > Removing extra code can then be done separately, and on qemu.git,
> > not on qemu-kvm.
>
> If we refrain from hacking KVM logic into the use/unuse services
> upstream, we can do this later on. For me it is important that those
> obsolete services do not block or complicate further cleanups of the MSI
> layer nor bother device model creators with tasks they should not worry
> about.

My assumption is that devices shall keep calling use/unuse until we drop
it. That does not seem like a major bother. If you like, use all vectors
or just those with message != 0.

> >>> - IMO it makes more sense than "clear
> >>> pending vector". msix_vector_use is good to keep around for symmetry:
> >>> who knows whether we'll need to allocate resources per vector
> >>> in the future.
> >>
> >> For MSI[-X], the spec is already there, and we know that there is no
> >> need for further resources when emulating it. Only KVM has special
> >> needs.
> >>
> >> Jan
> >
> > It's not hard to speculate. Imagine an out-of-process device that
> > shares guest memory and sends interrupts to qemu using eventfd. Suddenly
> > we need an fd per vector, and this without KVM.
>
> That's what irqfd was invented for. It already works for vhost, and there
> is nothing that prevents communicating the irqfd fd between two
> processes. But note: irqfd handle, not a KVM-internal GSI.
>
> Jan

Yes. But this still makes an API for acquiring per-vector resources a
requirement.

--
MST
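As a footnote on the non-KVM, out-of-process scenario above: the per-vector
resource amounts to one eventfd per vector, created in qemu and handed to
the device process, which raises a vector by writing to it. A minimal
sketch; the vector count, names and error handling are purely illustrative:

    #include <stdint.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    #define NUM_VECTORS 4               /* illustrative */

    static int vector_fd[NUM_VECTORS];  /* one eventfd per MSI-X vector */

    static void alloc_vector_fds(void)
    {
        for (int i = 0; i < NUM_VECTORS; i++)
            vector_fd[i] = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
        /* fds would be passed to the device process over a unix socket;
         * qemu polls them and calls msix_notify(), or hands them to the
         * kernel as irqfds when KVM is in use */
    }

    /* device process side: raise the MSI-X vector backed by fd */
    static void raise_vector(int fd)
    {
        const uint64_t inc = 1;

        if (write(fd, &inc, sizeof(inc)) < 0)
            perror("eventfd write");
    }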