On Wed, 2011-08-24 at 10:43 +0200, Joerg Roedel wrote:
> On Tue, Aug 23, 2011 at 03:30:06PM -0400, Alex Williamson wrote:
> > On Tue, 2011-08-23 at 07:01 +1000, Benjamin Herrenschmidt wrote:
> > > Could be, though.  In what form?  Returning sysfs paths?  I'm at a
> > > loss there, please suggest.
> > 
> > I think we need an ioctl that returns some kind of array of devices
> > within the group and another that maybe takes an index from that
> > array and returns an fd for that device.  A sysfs path string might
> > be a reasonable array element, but it sounds like a pain to work
> > with.
> 
> Limiting this to PCI, we can just pass the BDF as the argument to
> obtain the device-fd.  For a more generic solution we need an
> identifier that is unique across all 'struct device' instances in the
> system.  As far as I know we don't have that yet (besides the sysfs
> path), so we either add one or stick with bus-specific solutions.
> 
> > > A 1:1 process association has the advantage of linking to an -mm,
> > > which makes the whole mmu notifier business doable.  How do you
> > > want to track down mappings and do the second-level translation in
> > > the case of explicit map/unmap (like on Power) if you are not tied
> > > to an mm_struct?
> > 
> > Right, I threw away the mmu notifier code that was originally part
> > of vfio because we can't do anything useful with it yet on x86.  I
> > definitely don't want to prevent it where it makes sense, though.
> > Maybe we just record current->mm on open and restrict subsequent
> > opens to the same.
> 
> Hmm, I think we need io-page-fault support in the iommu-api then.

Yeah, when we can handle iommu page faults, this gets more interesting.

> > > Another aspect I don't see discussed is how we represent these
> > > things to the guest.
> > > 
> > > On Power, for example, I have a requirement that a given iommu
> > > domain is represented by a single dma window property in the
> > > device-tree.  What that means is that the property needs to be
> > > either in the node of the device itself, if there's only one
> > > device in the group, or in a parent node (i.e. a bridge or host
> > > bridge) if there are multiple devices.
> > > 
> > > Now I do -not- want to go down the path of simulating P2P bridges;
> > > besides, we'll quickly run out of bus numbers if we go there.
> > > 
> > > For us the simplest and most logical approach (which is also what
> > > pHyp uses and what Linux handles well) is really to expose a given
> > > PCI host bridge per group to the guest.  Believe it or not, it
> > > makes things easier :-)
> > 
> > I'm all for easier.  Why does exposing the bridge use fewer bus
> > numbers than emulating a bridge?
> > 
> > On x86, I want to maintain that our default assignment is at the
> > device level.  A user should be able to pick single or multiple
> > devices from across several groups and have them all show up as
> > individual, hotpluggable devices on bus 0 in the guest.  Not
> > surprisingly, we've also seen cases where users try to attach a
> > bridge to the guest, assuming they'll get all the devices below the
> > bridge, so I'd be in favor of making this "just work" if possible
> > too, though we may have to prevent hotplug of those.
> 
> A side note: might it be better to expose assigned devices in a guest
> on a separate bus?  This will make it easier to emulate an IOMMU for
> the guest inside qemu.

I think we want that option, sure.  A lot of guests aren't going to
support hot-plugging buses though, so I think our default "map the
entire guest" model should still be using bus 0.  The ACPI side gets a
lot more complicated for that model too; dynamic SSDTs?
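Just to make the enumeration discussion above a bit more concrete,
here's a rough userspace-side sketch of what a group interface along
those lines might look like, combining the index-based enumeration with
the name/BDF-based device-fd lookup Joerg suggested.  Every ioctl name
and number, the struct layout, and the /dev path below are invented
purely for illustration; none of this is an existing or agreed-upon
ABI.

/*
 * Illustrative sketch only: all ioctl names/numbers, the struct layout
 * and the /dev path here are made up for the example, not a real ABI.
 */
#include <fcntl.h>
#include <linux/ioctl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct group_device_info {
	uint32_t argsz;     /* sizeof(struct), leaves room to extend later */
	uint32_t index;     /* in: which device within the group */
	char     name[64];  /* out: bus-specific name, e.g. a PCI "0000:06:0d.0" */
};

/* Hypothetical ioctl numbers, placeholders that aren't reserved anywhere. */
#define GROUP_GET_NUM_DEVICES	_IO(';', 100)	/* returns device count */
#define GROUP_GET_DEVICE_INFO	_IOWR(';', 101, struct group_device_info)
#define GROUP_GET_DEVICE_FD	_IOW(';', 102, char[64])  /* name in, fd out */

int main(void)
{
	/* The group chardev layout is hypothetical as well. */
	int group = open("/dev/vfio-group0", O_RDWR);
	if (group < 0)
		return 1;

	int ndev = ioctl(group, GROUP_GET_NUM_DEVICES);
	for (int i = 0; i < ndev; i++) {
		struct group_device_info info = {
			.argsz = sizeof(info),
			.index = (uint32_t)i,
		};
		if (ioctl(group, GROUP_GET_DEVICE_INFO, &info) < 0)
			continue;
		printf("group member %d: %s\n", i, info.name);

		/* Joerg's variant: pass the bus-specific name (a PCI BDF
		 * here) back in to get an fd for that one device. */
		int dev = ioctl(group, GROUP_GET_DEVICE_FD, info.name);
		if (dev >= 0)
			close(dev);
	}

	close(group);
	return 0;
}

Whether the device fd is keyed on an index or on a bus-specific name
like the BDF is exactly the open question above; the argsz field is
only there to show one way such a struct could stay extensible.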
Thanks,

Alex