On Fri, 2012-06-08 at 16:43 +0200, Jan Kiszka wrote:
> On 2012-06-08 16:00, Alexey Kardashevskiy wrote:
> > 08.06.2012 21:30, Jan Kiszka wrote:
> >> On 2012-06-08 13:16, Alexey Kardashevskiy wrote:
> >>> 08.06.2012 20:56, Jan Kiszka wrote:
> >>>> On 2012-06-08 10:47, Alexey Kardashevskiy wrote:
> >>>>> Yet another try :)
> >>>>>
> >>>>> Normally pci_add_capability is called on devices to add a new
> >>>>> capability. This is OK for emulated devices, whose capability
> >>>>> list is built by QEMU.
> >>>>>
> >>>>> In the case of VFIO the capability may already exist and adding new
> >>>>
> >>>> Why does it exist? VFIO should build the virtual capability list from
> >>>> scratch (just like classic device assignment does), recreating the
> >>>> layout of the physical device (except for masked out caps). In that
> >>>> case, this conflict should become impossible, no?
> >>>
> >>> Normally capabilities in emulated devices are created by calling
> >>> msi_init or msix_init, just when an emulated device wants to
> >>> advertise them to the guest.
> >>>
> >>> In the case of VFIO, there are a lot of capabilities which QEMU does
> >>> not know and does not want to know about. They are read from the host
> >>> kernel as is. And we definitely want to pass these capabilities to the
> >>> guest as is, i.e. at the same positions and in the same number. Only
> >>> for some of them do we call pci_add_capability (indirectly!), if we
> >>> want QEMU to support them somehow.
> >>>
> >>> If we invent some function which "re-adds" all the capabilities we got
> >>> from the host to keep QEMU's internal PCIDevice data in sync, then
> >>> we'll need to change every piece of code which adds capabilities.
> >>
> >> I can't follow. What is different in VFIO from device-assignment.c,
> >> assigned_device_pci_cap_init (except that it already uses msi[x]_init,
> >> something we need to fix in device-assignment.c)?
> >
> > What are device-assignment.c and assigned_device_pci_cap_init? Cannot
> > find them in QEMU tree.
>
> "Old-style" KVM device assignment is not yet upstream. You can find it
> in qemu-kvm, hopefully in upstream soon as well.
>
> >
> > Ah, anyway. The main difference is QEMU does not emulate VFIO devices,
> > it is just a proxy to the host system. Or I do not understand the
> > question.
> >
> >>> I noticed,
> >>> this is a very common approach here, to change a lot for a very small
> >>> thing or rare case, but I'd like to avoid this :)
> >>>
> >>>> But if pci_*add*_capability should actually be used like this (I doubt
> >>>> this),
> >>>
> >>> MSI/MSIX use it. To enable MSI/MSIX on a VFIO PCIDevice, we call
> >>> msi_init/msix_init and they call pci_add_capability.
> >>
> >> You can't blame msi_init/msix_init for the fact that VFIO creates a
> >> capability list with an existing MSI/MSI-X entry beforehand.
> >
> > VFIO does not create any capability. It gets them all from the host
> > kernel and passes them to the guest as is. VFIO only needs MSIX to be
> > enabled in VFIO.
>
> Just like any device in QEMU, VFIO also needs to set up a virtual config
> space when it registers with the PCI core layer. Even if the virtual one
> is modeled after the real one, it is still _created_ by the VFIO
> userspace part. And this creation process is obviously a bit messed up
> so far. Fix this, but not by adding workarounds in the MSI or PCI layer.
> Rather add all capabilities you want to expose to the guest via
> pci_add_capability or, indirectly, via msi[x]_init at the right
> position. Do not just copy the real config space over, that breaks the
> core layer as we see.

The difference between VFIO and kvm device assignment is that VFIO
emulates a lot of config space for us, so most things are passed
through. MSI and MSIX are unique in that we actually do want the qemu
support for helping us to manage them.
So we're basically not telling qemu about anything other than these,
and for the most part, that works since qemu never handles access to
the other capabilities. However, I think you're probably right, VFIO
should just walk the capabilities list, registering each with qemu.
It's a little "unnecessary" overhead from the VFIO perspective, but it
makes the VFIO device less unique. I'll work on adding this. Thanks,

Alex
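
For illustration only, here is a rough sketch of what "walk the
capabilities list, registering each" could look like over a raw copy of
the 256-byte config space. It is not the actual VFIO or QEMU code. The
names walk_capabilities, cap_visitor, cap_log and record_cap are
hypothetical; the register offsets (status at 0x06, capability pointer
at 0x34) and the MSI/MSI-X capability IDs come from the standard PCI
config space layout. In QEMU the visitor would presumably end up in
pci_add_capability or msi[x]_init, as discussed above; here it only
records what it sees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI config space offsets and bits. */
#define CFG_SIZE            256
#define CFG_STATUS          0x06  /* low byte of the 16-bit status register */
#define CFG_STATUS_CAP_LIST 0x10  /* bit 4: device has a capability list */
#define CFG_CAP_PTR         0x34  /* offset of the first capability */
#define CAP_ID_MSI          0x05
#define CAP_ID_MSIX         0x11
#define MAX_CAPS            ((CFG_SIZE - 0x40) / 4)

/* Hypothetical hook: stands in for whatever registers one capability. */
typedef void (*cap_visitor)(uint8_t id, uint8_t offset, void *opaque);

/* Hypothetical visitor state: remembers each capability's ID and offset. */
struct cap_log {
    int n;
    uint8_t id[MAX_CAPS];
    uint8_t offset[MAX_CAPS];
};

static void record_cap(uint8_t id, uint8_t offset, void *opaque)
{
    struct cap_log *log = opaque;

    if (log->n < MAX_CAPS) {
        log->id[log->n] = id;
        log->offset[log->n] = offset;
        log->n++;
    }
}

/*
 * Follow the linked list of capabilities in cfg (CFG_SIZE bytes) and
 * call visit() for each entry, keeping the position found in the
 * physical layout. Returns the number of capabilities visited.
 */
static int walk_capabilities(const uint8_t *cfg, cap_visitor visit,
                             void *opaque)
{
    int count = 0;
    int ttl = MAX_CAPS;  /* bounds the walk if the list loops */
    uint8_t pos;

    assert(cfg != NULL);

    if (!(cfg[CFG_STATUS] & CFG_STATUS_CAP_LIST)) {
        return 0;  /* no capability list advertised */
    }

    /* The low two bits of each pointer are reserved, so mask them. */
    pos = cfg[CFG_CAP_PTR] & 0xfc;
    while (pos >= 0x40 && ttl-- > 0) {
        uint8_t id = cfg[pos];

        if (id == 0xff) {
            break;  /* reads as all-ones: treat as end of list */
        }
        if (visit) {
            visit(id, pos, opaque);
        }
        count++;
        pos = cfg[pos + 1] & 0xfc;  /* next pointer; 0 ends the list */
    }
    return count;
}
```

Passing the visitor in as a callback keeps the walk itself independent
of how each capability gets registered, so MSI and MSI-X could be
routed to their own init paths while everything else is registered
generically at its original offset.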