On Wed, 2011-08-24 at 09:51 +1000, Benjamin Herrenschmidt wrote:
> > > For us the most simple and logical approach (which is also what pHyp
> > > uses and what Linux handles well) is really to expose a given PCI host
> > > bridge per group to the guest. Believe it or not, it makes things
> > > easier :-)
> >
> > I'm all for easier. Why does exposing the bridge use fewer bus numbers
> > than emulating a bridge?
>
> Because a host bridge doesn't look like a PCI to PCI bridge at all for
> us. It's an entire separate domain with its own bus number space
> (unlike most x86 setups).

Ok, I missed the "host" bridge.

> In fact we have some problems afaik in qemu today with the concept of
> PCI domains, for example, I think qemu has assumptions about a single
> shared IO space domain which isn't true for us (each PCI host bridge
> provides a distinct IO space domain starting at 0). We'll have to fix
> that, but it's not a huge deal.

Yep, I've seen similar on ia64 systems.

> So for each "group" we'd expose in the guest an entire separate PCI
> domain space with its own IO, MMIO etc... spaces, handed off from a
> single device-tree "host bridge" which doesn't itself appear in the
> config space, doesn't need any emulation of any config space etc...
>
> > On x86, I want to maintain that our default assignment is at the device
> > level. A user should be able to pick single or multiple devices from
> > across several groups and have them all show up as individual,
> > hotpluggable devices on bus 0 in the guest. Not surprisingly, we've
> > also seen cases where users try to attach a bridge to the guest,
> > assuming they'll get all the devices below the bridge, so I'd be in
> > favor of making this "just work" if possible too, though we may have to
> > prevent hotplug of those.
> >
> > Given the device requirement on x86 and since everything is a PCI device
> > on x86, I'd like to keep a qemu command line something like -device
> > vfio,host=00:19.0. I assume that some of the iommu properties, such as
> > dma window size/address, will be query-able through an architecture
> > specific (or general if possible) ioctl on the vfio group fd. I hope
> > that will help the specification, but I don't fully understand what all
> > remains. Thanks,
>
> Well, for iommu there's a couple of different issues here but yes,
> basically on one side we'll have some kind of ioctl to know what segment
> of the device(s) DMA address space is assigned to the group and we'll
> need to represent that to the guest via a device-tree property in some
> kind of "parent" node of all the devices in that group.
>
> We -might- be able to implement some kind of hotplug of individual
> devices of a group under such a PHB (PCI Host Bridge), I don't know for
> sure yet, some of that PAPR stuff is pretty arcane, but basically, for
> all intents and purposes, we really want a group to be represented as a
> PHB in the guest.
>
> We cannot arbitrarily have individual devices of separate groups be
> represented in the guest as siblings on a single simulated PCI bus.

I think the vfio kernel layer we're describing easily supports both.
This is just a matter of adding qemu-vfio code to expose different
topologies based on group iommu capabilities and mapping mode. Thanks,

Alex
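
Editorial illustration (not part of the original message, and not actual qemu or vfio code): the two guest topologies discussed in this thread can be sketched in a few lines of Python. Everything here is hypothetical, including the names `HostDevice`, `parse_host` and `guest_topology`, the mode strings "flat" and "phb", and the group numbers in the example. The only detail taken from the thread is the `host=00:19.0` address form, read as hexadecimal bus:slot.function.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HostDevice:
    """A host PCI function plus the iommu group it belongs to (hypothetical model)."""
    bus: int
    slot: int
    func: int
    group: int


def parse_host(addr: str, group: int) -> HostDevice:
    """Parse a 'bus:slot.func' string such as '00:19.0' (fields are hex)."""
    bus, rest = addr.split(":")
    slot, func = rest.split(".")
    return HostDevice(int(bus, 16), int(slot, 16), int(func, 16), group)


def guest_topology(devices, mode):
    """Map devices to guest PCI domains for an assumed mapping mode.

    "flat": all devices are siblings in a single guest domain 0,
            regardless of group (the x86 default described in the thread).
    "phb":  each group gets its own guest domain, standing in for one
            PCI host bridge per group.
    """
    topo = defaultdict(list)
    if mode == "flat":
        topo[0].extend(devices)
    elif mode == "phb":
        domain_of = {}  # group id -> guest domain number, in first-seen order
        for dev in devices:
            domain = domain_of.setdefault(dev.group, len(domain_of))
            topo[domain].append(dev)
    else:
        raise ValueError(f"unknown mapping mode: {mode!r}")
    return dict(topo)


# Made-up example: one device in group 1, two functions sharing group 2.
devices = [
    parse_host("00:19.0", group=1),
    parse_host("01:00.0", group=2),
    parse_host("01:00.1", group=2),
]
flat = guest_topology(devices, "flat")
phb = guest_topology(devices, "phb")
```

In this sketch `flat` holds all three devices under domain 0, while `phb` keeps the two functions of group 2 together in their own domain. A real implementation would pick the mode from the group's iommu capabilities rather than from a string argument.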