On 2011-10-18 23:40, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 09:37:14PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 20:40, Michael S. Tsirkin wrote:
>>> On Tue, Oct 18, 2011 at 08:24:39PM +0200, Jan Kiszka wrote:
>>>> On 2011-10-18 19:06, Michael S. Tsirkin wrote:
>>>>> On Tue, Oct 18, 2011 at 05:55:54PM +0200, Jan Kiszka wrote:
>>>>>> On 2011-10-18 17:22, Jan Kiszka wrote:
>>>>>>> What KVM has to do is just mapping an arbitrary MSI message
>>>>>>> (theoretically 64+32 bits, in practice it's of course much less) to
>>>>>>
>>>>>> ( There are 24 distinguishing bits in an MSI message on x86, but
>>>>>> that's only a current interpretation of one specific arch. )
>>>>>
>>>>> Confused. The vector mask is 8 bits, the rest is destination id etc.
>>>>
>>>> Right, but those additional bits like the destination make different
>>>> messages. We have to encode those 24 bits into a unique GSI number and
>>>> restore them (by table lookup) on APIC injection inside the kernel. If
>>>> we only had to encode 256 different vectors, we would be done already.
>>>
>>> Right. But in practice guests always use distinct vectors (from the 256
>>> available) for distinct messages. This is because the vector seems to
>>> be the only thing that gets communicated by the APIC to the software.
>>>
>>> So e.g. a table with 256 entries, with the extra 1024-256 used for
>>> spill-over for guests that do something unexpected, would work really
>>> well.
>>
>> Linux already manages vectors on a per-CPU basis. For efficiency
>> reasons, it does not exploit the full range of 256 vectors but actually
>> allocates them in - IIRC - steps of 16. So I would not be surprised to
>> find lots of vector number "collisions" when looking over a full set of
>> CPUs in a system.
>>
>> Really, these considerations do not help us. We must store all 96 bits,
>> already for the sake of other KVM architectures that want MSI routing.
>>
>>>>>>> a single GSI and vice versa. As there are fewer GSIs than possible
>>>>>>> MSI messages, we could run out of them when creating routes,
>>>>>>> statically or lazily.
>>>>>>>
>>>>>>> What would probably help us long-term, given your concerns
>>>>>>> regarding lazy routing, is to bypass that redundant GSI translation
>>>>>>> for dynamic messages, i.e. those that are not associated with an
>>>>>>> irqfd number or an assigned device irq. Something like a
>>>>>>> KVM_DELIVER_MSI IOCTL that accepts address and data directly.
>>>>>>
>>>>>> This would be a trivial extension in fact. Given its beneficial
>>>>>> impact on our GSI limitation issue, I think I will hack up something
>>>>>> like that.
>>>>>>
>>>>>> And maybe this makes a transparent cache more reasonable. Then only
>>>>>> old host kernels would force us to do searches for already cached
>>>>>> messages.
>>>>>>
>>>>>> Jan
>>>>>
>>>>> Hmm, I'm not all that sure. The existing design really allows caching
>>>>> the route in various smart ways. We currently do this for irqfd, but
>>>>> it can be extended to ioctls. If we just let the guest inject
>>>>> arbitrary messages, that becomes much more complex.
>>>>
>>>> irqfd and kvm device assignment do not allow us to inject arbitrary
>>>> messages at arbitrary points. The new API offers kvm_msi_irqfd_set and
>>>> kvm_device_msix_set_vector (etc.) for those scenarios to set static
>>>> routes from an MSI message to a GSI number (+ they configure the
>>>> related backends).
>>>
>>> Yes, it's a very flexible API, but it would be very hard to optimize.
>>> GSIs let us do the slow path setup, but they make it easy to optimize
>>> target lookup in kernel.
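For reference, the GSI-based "slow path" looks roughly like this from user
space. This is only a sketch against the kvm_irq_routing_* definitions in
linux/kvm.h of that era; error handling is omitted, and it glosses over
the fact that KVM_SET_GSI_ROUTING replaces the whole table, so a real
caller has to re-submit all existing routes along with the new entry.

#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Bind one MSI message (address/data as programmed by the guest) to a
 * GSI number chosen by user space. */
static int route_msi_to_gsi(int vm_fd, uint32_t gsi, uint64_t addr,
                            uint32_t data)
{
    struct kvm_irq_routing *table;
    int ret;

    /* One entry only for brevity; the real table must also contain the
     * pre-existing irqchip routes. */
    table = calloc(1, sizeof(*table) + sizeof(table->entries[0]));
    if (!table)
        return -1;

    table->nr = 1;
    table->entries[0].gsi = gsi;
    table->entries[0].type = KVM_IRQ_ROUTING_MSI;
    table->entries[0].u.msi.address_lo = (uint32_t)addr;
    table->entries[0].u.msi.address_hi = (uint32_t)(addr >> 32);
    table->entries[0].u.msi.data = data;

    ret = ioctl(vm_fd, KVM_SET_GSI_ROUTING, table);
    free(table);
    return ret;
}

/* Injection then works on the GSI alone, e.g. via KVM_IRQ_LINE (or by
 * attaching an eventfd to the GSI with KVM_IRQFD); the kernel looks up
 * the route for the GSI and delivers the stored message. */
static int inject_gsi(int vm_fd, uint32_t gsi)
{
    struct kvm_irq_level irq = { .irq = gsi, .level = 1 };
    return ioctl(vm_fd, KVM_IRQ_LINE, &irq);
}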
>>
>> Users of the API above have no need to know anything about GSIs. They
>> are an artifact of the KVM-internal interface between user space and
>> kernel now - thanks to the MSIRoutingCache encapsulation.
>
> Yes, but I am saying that the API above can't be implemented more
> efficiently than now: you will have to scan all APICs on each MSI. The
> GSI implementation can be optimized: decode the vector once, and if it
> matches a single vcpu, store that vcpu and use it when sending
> interrupts.

Sorry, I missed that you switched to the kernel side. What information do
you want to cache there that cannot be easily obtained by looking at a
concrete message? I do not see any. Once you have checked that the
delivery mode targets a specific cpu, you can address it directly. Or are
you thinking about some cluster mode?

>>> An analogy would be if read/write operated on file paths. An fd makes
>>> it easy to do permission checks and slow lookups in one place. GSI
>>> happens to work like this (maybe by accident).
>>
>> Think of an opaque file handle as an MSIRoutingCache object. And it
>> encodes not only the routing handle but also other useful associated
>> information we need from time to time - internally, not in the device
>> models.
>
> Forget qemu abstractions, I am talking about data path optimizations in
> the kernel, in kvm. From that POV the point of an fd is not that it is
> opaque. It is that it's an index into an array that can be used for
> fast lookups.
>
>>>>> Another concern is mask bit emulation. We currently handle the mask
>>>>> bit in userspace, but patches to do that in the kernel for assigned
>>>>> devices were seen, and IMO we might want to do that for virtio as
>>>>> well.
>>>>>
>>>>> For that to work, the mask bit needs to be tied to a specific gsi or
>>>>> a specific device, which does not work if we just inject arbitrary
>>>>> writes.
>>>>
>>>> Yes, but I do not see those valuable plans being negatively affected.
>>>>
>>>> Jan
>>>
>>> I do. How would we maintain a mask/pending bit in kernel if we are not
>>> even supplied info on all available vectors?
>>
>> It's tricky to discuss an undefined interface (there only exists an
>> outdated proposal for kvm device assignment). But I suppose that user
>> space will have to define the maximum number of vectors when creating
>> an in-kernel MSI-X MMIO area. The device already has to tell this to
>> msix_init.
>>
>> The number of used vectors will correlate with the number of registered
>> irqfds (in the case of vhost or vfio; device assignment still has
>> SET_MSIX_NR). As kernel space would then be responsible for mask
>> processing, user space would keep vectors registered with irqfds, even
>> if they are masked. It could just continue to play the trick of
>> dropping data=0 vectors.
>
> Which trick? We don't play any tricks except for device assignment.
>
>> The point here is: all those steps have _nothing_ to do with the
>> generic MSI-X core. They are KVM-specific "side channels" for which KVM
>> provides an API. In contrast, msix_vector_use/unuse were generic
>> services that were actually created to please KVM requirements. But if
>> we split that up, we can address the generic MSI-X requirements in a
>> way that makes more sense for emulated devices (and particularly
>> msix_vector_use makes no sense for emulation).
>>
>> Jan
>
> We need at least msix_vector_unuse

Not at all. We rather need some qemu_irq_set(level) equivalent for MSI.
The spec requires that the device clears the pending bit when the reason
for the interrupt is removed, and any removal that originates in the
device model should simply be signaled like an irq de-assert. Vector
"unusage" is just one reason here.
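A minimal sketch of what such a level-style notification could look like
in the device model. All names below (struct msix_vec, msix_notify_level,
msix_send_message) are made up for illustration and are not existing QEMU
interfaces; the point is only that "unuse" becomes one caller of the
de-assert path.

#include <stdbool.h>

/* Minimal stand-in for per-vector MSI-X state; the real device model
 * keeps this in its MSI-X table and pending bit array (PBA). */
struct msix_vec {
    bool masked;    /* per-vector mask bit from the MSI-X table */
    bool pending;   /* corresponding bit in the PBA */
};

/* Placeholder for actually sending the message (hypothetical). */
static void msix_send_message(struct msix_vec *vec)
{
    (void)vec;
}

/* Treat an MSI-X vector like a level-triggered line as far as the
 * pending bit is concerned: asserting it sends or latches the message,
 * de-asserting it clears a still-latched pending bit because the reason
 * for the interrupt has gone away. */
static void msix_notify_level(struct msix_vec *vec, bool level)
{
    if (level) {
        if (vec->masked)
            vec->pending = true;    /* latch until unmasked */
        else
            msix_send_message(vec);
    } else {
        vec->pending = false;       /* reason removed -> clear pending */
    }
}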
> - IMO it makes more sense than "clear pending vector". msix_vector_use
> is good to keep around for symmetry: who knows whether we'll need to
> allocate resources per vector in the future.

For MSI[-X], the spec is already there, and we know that there is no need
for further resources when emulating it. Only KVM has special needs.

Jan
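As a closing illustration: the KVM_DELIVER_MSI idea proposed earlier in
the thread would boil down to handing the kernel the raw address/data
pair with no GSI route at all. This is purely hypothetical - neither the
request number nor the payload struct below exist in linux/kvm.h at this
point - and only shows the shape such an interface could take.

#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical payload: just the raw 64-bit address and 32-bit data of
 * the MSI message, as written into the device's MSI/MSI-X registers. */
struct kvm_msi_delivery {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
    uint32_t flags;     /* reserved, must be zero */
};

/* Made-up request number (0xAE is the KVM ioctl type); the real value
 * would have to come from linux/kvm.h once such an ioctl exists. */
#define KVM_DELIVER_MSI _IOW(0xAE, 0xff, struct kvm_msi_delivery)

/* Inject one MSI directly on the VM fd, bypassing the GSI routing
 * table and its limited number space. */
static int deliver_msi(int vm_fd, uint64_t addr, uint32_t data)
{
    struct kvm_msi_delivery msi = {
        .address_lo = (uint32_t)addr,
        .address_hi = (uint32_t)(addr >> 32),
        .data       = data,
    };

    return ioctl(vm_fd, KVM_DELIVER_MSI, &msi);
}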