On 2011-10-18 15:37, Michael S. Tsirkin wrote:
> On Tue, Oct 18, 2011 at 03:00:29PM +0200, Jan Kiszka wrote:
>> On 2011-10-18 14:48, Michael S. Tsirkin wrote:
>>>> To my understanding, virtio will be the exception as no other device
>>>> will have a chance to react on resource shortage while sending(!) an MSI
>>>> message.
>>>
>>> Hmm, are you familiar with that spec?
>>
>> Not by heart.
>>
>>> This is not what virtio does,
>>> resource shortage is detected during setup.
>>> This is exactly the problem with lazy registration as you don't
>>> allocate until it's too late.
>>
>> When is that setup phase? Does it actually come after every change to an
>> MSI vector? I doubt so.
>
> No. During setup, driver requests vectors from the OS, and then tells
> the device which vector each VQ should use. It then checks that the
> assignment was successful. If not, it retries with fewer vectors.
>
> Other devices can do this during initialization, and signal
> resource availability to the guest using the msix vector number field.
>
>> Thus virtio can only estimate the guest usage as
>> well
>
> At some level, this is fundamental: some guest operations
> have no failure mode. So we must preallocate
> some resources to make sure they won't fail.

We can still track the expected maximum number of active vectors at core
level, collect them from the KVM layer, and warn if we expect conflicts.
Anxious MSI users could then refrain from using this feature; others
might be fine with risking a slowdown on conflicts.

>
>> (a guest may or may not actually write a non-null data into a
>> vector and unmask it).
>
> Please, forget the non-NULL thing. The virtio driver knows exactly
> how many vectors we use and communicates this info to the device.
> This is not uncommon at all.
>
>>>
>>>>>
>>>>> I actually would not mind preallocating everything upfront which is much
>>>>> easier. But with your patch we get a silent failure or a drastic
>>>>> slowdown which is much more painful IMO.
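The setup sequence Michael describes (the driver assigns a vector per virtqueue, reads back whether the device accepted it, and retries with fewer vectors otherwise) can be sketched as a small simulation. This is a hypothetical model rather than QEMU or Linux code: `SimDevice`, `set_queue_vector` and `setup_queues` are invented names, the device's `capacity` stands in for whatever host resource backs a vector, and the separate configuration-change vector a real virtio driver also sets up is left out. Only the `VIRTIO_MSI_NO_VECTOR` value (0xffff), which a device reports back when it cannot accept a vector, is taken from the virtio PCI specification; the shared-vector fallback roughly follows the strategy of Linux's virtio_pci driver.

```python
# Value a virtio device reports back when it could not accept a vector
# (VIRTIO_MSI_NO_VECTOR in the virtio PCI specification).
VIRTIO_MSI_NO_VECTOR = 0xFFFF


class SimDevice:
    """Hypothetical device model: it can back at most `capacity`
    distinct MSI-X vectors with host resources (e.g. KVM MSI routes)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.backed = set()

    def reset(self):
        self.backed.clear()

    def set_queue_vector(self, vector):
        """Driver writes `vector` for a queue and reads the value back:
        the vector on success, VIRTIO_MSI_NO_VECTOR on shortage."""
        if vector in self.backed:
            return vector  # sharing an already-backed vector costs nothing
        if len(self.backed) >= self.capacity:
            return VIRTIO_MSI_NO_VECTOR  # shortage is detected during setup
        self.backed.add(vector)
        return vector


def setup_queues(dev, nqueues):
    """Try one vector per queue; on any failure, reset and retry with a
    single shared vector. Returns the number of vectors in use, or 0 if
    the driver would have to fall back to INTx."""
    if all(dev.set_queue_vector(q) == q for q in range(nqueues)):
        return nqueues
    dev.reset()
    if all(dev.set_queue_vector(0) == 0 for _ in range(nqueues)):
        return 1
    dev.reset()
    return 0


print(setup_queues(SimDevice(capacity=8), 4))  # 4
print(setup_queues(SimDevice(capacity=2), 4))  # 1
print(setup_queues(SimDevice(capacity=0), 4))  # 0
```

The point the thread turns on is visible here: the shortage is reported while the driver is still configuring the device and can still fall back, rather than later, when an MSI message is actually sent.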
>>>>
>>>> Again: have we already seen that limit? And where does it come from if not
>>>> from KVM?
>>>
>>> It's a hardware limitation of Intel APICs. The interrupt vector is encoded
>>> in an 8 bit field in msi address. So you can have at most 256 of these.
>>
>> There should be no such limitation with pseudo GSIs we use for MSI
>> injection. They end up as MSI messages again, so actually (256 - reserved
>> vectors) * number-of-cpus (on x86).
>
> This limits which CPUs can get the interrupt though.
> Linux seems to have a global pool as it wants to be able to freely
> balance vectors between CPUs. Or, consider a guest with a single CPU :)
>
> Anyway, why argue - there is a limitation, and it's not coming from KVM,
> right?

No, the limit we hit with MSI message routing is first of all the number
of KVM GSIs, and only the pseudo GSIs at that, which do not go to any
interrupt controller with limited pins. That limit could easily be lifted
in the kernel if we run into shortages in practice.

>
>>>
>>>>>
>>>>>> That's also why we do those data == 0
>>>>>> checks to skip used but unconfigured vectors.
>>>>>>
>>>>>> Jan
>>>>>
>>>>> These checks work more or less by luck BTW. It's
>>>>> a hack which I hope lazy allocation will replace.
>>>>
>>>> The check is still valid (for x86) when we have to use static routes
>>>> (device assignment, vhost).
>>>
>>> It's not valid at all - we are just lucky that linux and
>>> windows guests seem to zero out the vector when it's not in use.
>>> They do not have to do that.
>>
>> It is valid as it is just an optimization. If an unused vector has a
>> non-null data field, we just redundantly register a route where we do
>> not actually have to.
>
> Well, the only reason we even have this code is because
> it was claimed that some devices declare support for a huge number
> of vectors which then go unused. So if the guest does not
> do this we'll run out of vectors ...
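The trade-off around the data == 0 check, combined with Jan's earlier suggestion of tracking the expected maximum number of active vectors and warning on expected conflicts, can be illustrated with a small model. This is a hypothetical sketch, not QEMU's code: the (address, data) table layout, `routes_needed`, `check_route_budget` and the `gsi_budget` value are all assumptions made for illustration. It shows Michael's concern directly: a guest that leaves stale, non-zero data in unused entries makes the heuristic register a route for every entry.

```python
# Base address of the x86 MSI address window; only used for sample data.
MSI_ADDR_BASE = 0xFEE00000


def routes_needed(msix_table, skip_zero_data=True):
    """Count the static MSI routes a device's MSI-X table would consume.

    Each entry is a hypothetical (address, data) pair. With the x86
    heuristic from the thread, entries whose data field is 0 are treated
    as unconfigured and get no route.
    """
    return sum(
        1 for _address, data in msix_table
        if not (skip_zero_data and data == 0)
    )


def check_route_budget(devices, gsi_budget):
    """Sum the expected route usage over all devices and warn if it
    exceeds the (hypothetical) number of GSIs available for MSI routes.
    Returns (total, fits)."""
    total = sum(routes_needed(table) for table in devices)
    fits = total <= gsi_budget
    if not fits:
        print(f"warning: {total} MSI routes expected, {gsi_budget} available")
    return total, fits


# A guest that zeroes unused entries vs. one that leaves stale data behind.
zeroing_guest = [(MSI_ADDR_BASE, 0x41), (MSI_ADDR_BASE, 0x42)] + [(0, 0)] * 30
stale_guest = [(MSI_ADDR_BASE, 0x41 + i) for i in range(32)]

print(routes_needed(zeroing_guest))  # 2
print(routes_needed(stale_guest))    # 32
print(check_route_budget([zeroing_guest, stale_guest], gsi_budget=24))
```

With these sample tables the zeroing guest needs 2 routes and the stale one 32; together the 34 routes exceed the assumed budget of 24, so the warning fires and the call returns `(34, False)`.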
>
>> But we do need to be prepared
>
> And ATM, we aren't, and probably can't be without kernel
> changes, right?
>
>> for potentially
>> arriving messages on that virtual GSI, either via irqfd or kvm device
>> assignment.
>>
>> Jan
>
> Why irqfd? Device assignment is ATM the only place where we use these
> ugly hacks.

vfio will use irqfds. And virtio is partly out of the picture only
because we know much more about its internals (specifically: "will not
advertise more vectors than guests will want to use").

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux