On Tue, Oct 18, 2011 at 03:46:06PM +0200, Jan Kiszka wrote:
> On 2011-10-18 15:37, Michael S. Tsirkin wrote:
> > On Tue, Oct 18, 2011 at 03:00:29PM +0200, Jan Kiszka wrote:
> >> On 2011-10-18 14:48, Michael S. Tsirkin wrote:
> >>>> To my understanding, virtio will be the exception, as no other device
> >>>> will have a chance to react on resource shortage while sending(!) an
> >>>> MSI message.
> >>>
> >>> Hmm, are you familiar with that spec?
> >>
> >> Not by heart.
> >>
> >>> This is not what virtio does;
> >>> resource shortage is detected during setup.
> >>> This is exactly the problem with lazy registration: you don't
> >>> allocate until it's too late.
> >>
> >> When is that setup phase? Does it actually come after every change to
> >> an MSI vector? I doubt so.
> >
> > No. During setup, the driver requests vectors from the OS, and then
> > tells the device which vector each VQ should use. It then checks that
> > the assignment was successful. If not, it retries with fewer vectors.
> >
> > Other devices can do this during initialization, and signal
> > resource availability to the guest using the MSI-X vector number field.
> >
> >> Thus virtio can only estimate the guest usage as
> >> well
> >
> > At some level, this is fundamental: some guest operations
> > have no failure mode. So we must preallocate
> > some resources to make sure they won't fail.
>
> We can still track the expected maximum number of active vectors at core
> level, collect them from the KVM layer, and warn if we expect conflicts.
> Anxious MSI users could then refrain from using this feature; others
> might be fine with risking a slow-down on conflicts.

It seems like a nice feature until you have to debug it in the field :).
If you really think it's worthwhile, let's add a 'force' flag so that
advanced users at least can declare that they know what they are doing.

> >
> >> (a guest may or may not actually write non-null data into a
> >> vector and unmask it).
> >
> > Please, forget the non-NULL thing. The virtio driver knows exactly
> > how many vectors we use and communicates this info to the device.
> > This is not uncommon at all.
> >
> >>>
> >>>>>
> >>>>> I actually would not mind preallocating everything upfront, which
> >>>>> is much easier. But with your patch we get a silent failure or a
> >>>>> drastic slowdown, which is much more painful IMO.
> >>>>
> >>>> Again: did we already see that limit? And where does it come from,
> >>>> if not from KVM?
> >>>
> >>> It's a hardware limitation of Intel APICs: the interrupt vector is
> >>> encoded in an 8-bit field of the MSI message. So you can have at most
> >>> 256 of these.
> >>
> >> There should be no such limitation with the pseudo GSIs we use for MSI
> >> injection. They end up as MSI messages again, so actually 256 (minus
> >> reserved vectors) * number-of-cpus (on x86).
> >
> > This limits which CPUs can get the interrupt, though.
> > Linux seems to have a global pool, as it wants to be able to freely
> > balance vectors between CPUs. Or, consider a guest with a single CPU :)
> >
> > Anyway, why argue - there is a limitation, and it's not coming from KVM,
> > right?
>
> No, the limit we hit with MSI message routing is first of all KVM GSIs,
> and there only pseudo GSIs that do not go to any interrupt controller
> with limited pins.

I see KVM_MAX_IRQ_ROUTES is 1024. This is > 256, so KVM does not seem to
be the problem.

> That could easily be lifted in the kernel if we run
> into shortages in practice.

What I was saying is that resources are limited even without KVM.
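
For concreteness, each of those pseudo GSIs is just one entry in the
per-VM routing table that userspace hands to KVM via KVM_SET_GSI_ROUTING,
so KVM_MAX_IRQ_ROUTES is the only thing capping them. Roughly along the
lines of the sketch below (hand-written for illustration, not the
qemu-kvm code; note that the ioctl replaces the whole table, so real code
resubmits every route it knows about, and vm_fd is assumed to be an open
VM file descriptor):

    #include <linux/kvm.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>

    /* Route one MSI message through pseudo GSI 'gsi'.  For brevity this
     * installs a single-entry table; KVM_SET_GSI_ROUTING replaces the
     * whole per-VM table, so real code passes all routes at once. */
    static int set_msi_route(int vm_fd, uint32_t gsi,
                             uint64_t addr, uint32_t data)
    {
        struct kvm_irq_routing *table;
        int ret;

        table = calloc(1, sizeof(*table) + sizeof(table->entries[0]));
        if (!table)
            return -1;

        table->nr = 1;
        table->entries[0].gsi = gsi;
        table->entries[0].type = KVM_IRQ_ROUTING_MSI;
        table->entries[0].u.msi.address_lo = (uint32_t)addr;
        table->entries[0].u.msi.address_hi = (uint32_t)(addr >> 32);
        table->entries[0].u.msi.data = data;

        ret = ioctl(vm_fd, KVM_SET_GSI_ROUTING, table);
        free(table);
        return ret;
    }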
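
And to spell out the virtio setup sequence I described further up: the
driver selects a queue, writes the MSI-X vector it wants for it, and
reads the register back; the device answers VIRTIO_MSI_NO_VECTOR if it
could not allocate resources for that vector, which is what lets the
driver fall back to fewer vectors. A rough sketch of the guest side for
legacy virtio-pci (vp_read16/vp_write16 are made-up stand-ins for the
driver's real I/O accessors, not actual Linux functions):

    #include <stdbool.h>
    #include <stdint.h>

    #define VIRTIO_PCI_QUEUE_SEL     14      /* queue selector           */
    #define VIRTIO_MSI_QUEUE_VECTOR  22      /* per-queue MSI-X vector   */
    #define VIRTIO_MSI_NO_VECTOR     0xffff  /* device: "no vector"      */

    /* Stand-ins for the driver's real device register accessors. */
    uint16_t vp_read16(unsigned offset);
    void vp_write16(unsigned offset, uint16_t val);

    /* Tell the device which MSI-X vector queue 'qidx' should use and
     * check whether it accepted it: a NO_VECTOR readback means the
     * device is out of resources and the driver should retry the whole
     * setup with fewer vectors. */
    static bool vp_assign_queue_vector(uint16_t qidx, uint16_t vector)
    {
        vp_write16(VIRTIO_PCI_QUEUE_SEL, qidx);
        vp_write16(VIRTIO_MSI_QUEUE_VECTOR, vector);
        return vp_read16(VIRTIO_MSI_QUEUE_VECTOR) != VIRTIO_MSI_NO_VECTOR;
    }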
> >
> >>>
> >>>>>
> >>>>>> That's also why we do those data == 0
> >>>>>> checks to skip used but unconfigured vectors.
> >>>>>>
> >>>>>> Jan
> >>>>>
> >>>>> These checks work more or less by luck, BTW. It's
> >>>>> a hack which I hope lazy allocation will replace.
> >>>>
> >>>> The check is still valid (for x86) when we have to use static routes
> >>>> (device assignment, vhost).
> >>>
> >>> It's not valid at all - we are just lucky that Linux and
> >>> Windows guests seem to zero out the vector when it's not in use.
> >>> They do not have to do that.
> >>
> >> It is valid, as it is just an optimization. If an unused vector has a
> >> non-null data field, we just redundantly register a route where we do
> >> not actually have to.
> >
> > Well, the only reason we even have this code is that
> > it was claimed that some devices declare support for a huge number
> > of vectors which then go unused. So if the guest does not
> > do this, we'll run out of vectors ...
> >
> >> But we do need to be prepared
> >
> > And ATM we aren't, and probably can't be without kernel
> > changes, right?
> >
> >> for potentially
> >> arriving messages on that virtual GSI, either via irqfd or KVM device
> >> assignment.
> >>
> >> Jan
> >
> > Why irqfd? Device assignment is ATM the only place where we use these
> > ugly hacks.
>
> vfio will use irqfds. And virtio is partly out of the picture only
> because we know much more about virtio internals (specifically:
> "will not advertise more vectors than guests will want to use").
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
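
Just to spell out what those data == 0 checks buy us when routes have to
be registered statically: the idea is to walk the MSI-X table and only
burn a route on vectors the guest appears to have programmed. A rough
sketch of the heuristic (hand-written for illustration, not the actual
device assignment code; msix_entry_state is a made-up type):

    #include <stdint.h>

    /* Hypothetical snapshot of one MSI-X table entry as the guest wrote it. */
    struct msix_entry_state {
        uint32_t addr_lo;
        uint32_t addr_hi;
        uint32_t data;
        uint32_t ctrl;              /* bit 0 is the per-vector mask */
    };

    /* Count how many routes we would actually have to register up front.
     * The data == 0 heuristic assumes the guest never programmed a vector
     * whose data field is still zero.  Linux and Windows guests happen to
     * behave that way, but nothing requires it - hence "works by luck". */
    static unsigned routes_needed(const struct msix_entry_state *tbl,
                                  unsigned n)
    {
        unsigned needed = 0;
        unsigned i;

        for (i = 0; i < n; i++) {
            if (tbl[i].data == 0)
                continue;           /* looks unconfigured: skip the route */
            needed++;
        }
        return needed;
    }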