> > For us the most simple and logical approach (which is also what pHyp
> > uses and what Linux handles well) is really to expose a given PCI host
> > bridge per group to the guest. Believe it or not, it makes things
> > easier :-)
>
> I'm all for easier. Why does exposing the bridge use less bus numbers
> than emulating a bridge?

Because a host bridge doesn't look like a PCI to PCI bridge at all for
us. It's an entirely separate domain with its own bus number space
(unlike most x86 setups).

In fact we have some problems afaik in qemu today with the concept of
PCI domains. For example, I think qemu has assumptions about a single
shared IO space domain, which isn't true for us (each PCI host bridge
provides a distinct IO space domain starting at 0). We'll have to fix
that, but it's not a huge deal.

So for each "group" we'd expose in the guest an entirely separate PCI
domain space with its own IO, MMIO etc... spaces, handed off from a
single device-tree "host bridge" which doesn't itself appear in the
config space, doesn't need any emulation of any config space etc...

> On x86, I want to maintain that our default assignment is at the device
> level. A user should be able to pick single or multiple devices from
> across several groups and have them all show up as individual,
> hotpluggable devices on bus 0 in the guest. Not surprisingly, we've
> also seen cases where users try to attach a bridge to the guest,
> assuming they'll get all the devices below the bridge, so I'd be in
> favor of making this "just work" if possible too, though we may have to
> prevent hotplug of those.
>
> Given the device requirement on x86 and since everything is a PCI device
> on x86, I'd like to keep a qemu command line something like -device
> vfio,host=00:19.0. I assume that some of the iommu properties, such as
> dma window size/address, will be query-able through an architecture
> specific (or general if possible) ioctl on the vfio group fd. I hope
> that will help the specification, but I don't fully understand what all
> remains. Thanks,

Well, for iommu there's a couple of different issues here but yes,
basically on one side we'll have some kind of ioctl to know what
segment of the device(s) DMA address space is assigned to the group,
and we'll need to represent that to the guest via a device-tree
property in some kind of "parent" node of all the devices in that
group.

We -might- be able to implement some kind of hotplug of individual
devices of a group under such a PHB (PCI Host Bridge), I don't know for
sure yet, some of that PAPR stuff is pretty arcane, but basically, for
all intents and purposes, we really want a group to be represented as a
PHB in the guest.

We cannot arbitrarily have individual devices of separate groups be
represented in the guest as siblings on a single simulated PCI bus.

Cheers,
Ben.