On Sat, Sep 04, 2010 at 09:01:06AM +0000, Blue Swirl wrote:
> On Thu, Sep 2, 2010 at 9:49 AM, Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> > On Thu, Sep 02, 2010 at 11:40:58AM +0300, Eduard - Gabriel Munteanu wrote:
> >> On Thu, Sep 02, 2010 at 08:28:26AM +0300, Michael S. Tsirkin wrote:
> >> > On Sat, Aug 28, 2010 at 05:54:53PM +0300, Eduard - Gabriel Munteanu wrote:
> >> > > PCI devices should access memory through pci_memory_*() instead of
> >> > > cpu_physical_memory_*(). This also provides support for translation
> >> > > and access checking in case an IOMMU is emulated.
> >> > >
> >> > > Memory maps are treated as remote IOTLBs (that is, translation caches
> >> > > belonging to the IOMMU-aware device itself). Clients (devices) must
> >> > > provide callbacks for map invalidation in case these maps are
> >> > > persistent beyond the current I/O context, e.g. AIO DMA transfers.
> >> > >
> >> > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@xxxxxxxxxxx>
> >> >
> >> > I am concerned about adding more pointer chasing on the data path.
> >> > Could we have
> >> > 1. an iommu pointer in a device, inherited by secondary buses
> >> >    when they are created and by devices from buses when they are attached,
> >> > 2. a translation pointer in the iommu instead of the bus,
> >>
> >> The first solution I proposed was based on qdev, that is, each
> >> DeviceState had an 'iommu' field. Translation would be done by
> >> recursively looking in the parent buses/devices for an IOMMU.
> >>
> >> But Anthony said we're better off with bus-specific APIs, mostly because
> >> (IIRC) there may be different types of addresses and it might be
> >> difficult to abstract those properly.
> >
> > Well, we ended up casting away types to make PCI callbacks fit in the
> > IDE structure, and silently assuming that all addresses are in fact
> > 64 bit. So maybe it's hard to abstract addresses properly, but it
> > appears we'll have to, to avoid even worse problems.
> >
> >> I suppose I could revisit the idea by integrating the IOMMU into a
> >> PCIDevice as opposed to a DeviceState.
> >>
> >> Anthony, Paul, any thoughts on this?
> >
> > Just to clarify: this is an optimization idea: instead of a bus walk
> > on each access, do the walk when the device is attached to the bus,
> > and copy the iommu pointer from the root to the device itself.
> >
> > This will also make it possible to create a DMADeviceState structure
> > which would have this iommu field, and we'd use this structure instead
> > of the void pointers all over.
> >
> >> > 3. pci_memory_XX functions inline, doing a fast path for the
> >> >    non-iommu case:
> >> >
> >> >    if (__builtin_expect(!dev->iommu, 1))
> >> >        return cpu_memory_rw
> >>
> >> But isn't this some sort of 'if (likely(!dev->iommu))' from the Linux
> >> kernel? If so, it puts the IOMMU-enabled case at a disadvantage.
> >
> > The IOMMU path has a ton of indirections anyway.
> >
> >> I suppose most emulated systems would have at least some theoretical
> >> reasons to enable the IOMMU, e.g. as a GART replacement (say for 32-bit
> >> PCI devices) or for userspace drivers. So there are reasons to enable
> >> the IOMMU even when you don't have a real host IOMMU and you're not
> >> using nested guests.
> >
> > The time when most people enable the IOMMU for all devices, in both
> > real and virtualized systems, appears distant; one of the reasons is
> > that it has a lot of overhead. Let's start with not adding overhead
> > for existing users, makes sense?
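
For the record, the fast path I suggested in point 3 above would look
roughly like the sketch below. It is untested and the names are made up
(the iommu field and the pci_memory_rw_iommu slow-path helper are not
taken from the posted patch); it just shows the shape of an inline
wrapper that avoids the indirections when no IOMMU is present:

    /* Sketch only: would live next to the other pci_memory_* helpers.
     * 'dev->iommu' and 'pci_memory_rw_iommu' are illustrative names. */
    static inline void pci_memory_rw(PCIDevice *dev, pcibus_t addr,
                                     uint8_t *buf, pcibus_t len,
                                     int is_write)
    {
        /* Common case: no IOMMU, go straight to guest physical memory. */
        if (__builtin_expect(!dev->iommu, 1)) {
            cpu_physical_memory_rw(addr, buf, len, is_write);
            return;
        }
        /* Slow path: translate through the IOMMU, then access. */
        pci_memory_rw_iommu(dev, addr, buf, len, is_write);
    }
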
> I think the goal architecture (not for IOMMU, but in general) is one
> with zero copy DMA. This means we have stage one where the addresses
> are translated to host pointers and stage two where the read/write
> operation happens. The devices need to be converted.
>
> Now, let's consider the IOMMU in this zero copy architecture. It's one
> stage of address translation; for the access operation it will not
> matter. We can add translation caching at the device level (or even at
> any intermediate level), but that needs a cache invalidation callback
> system as discussed earlier. This can be implemented later; we need
> the zero copy stuff first.
>
> Given this overall picture, I think eliminating some pointer
> dereference overheads in the non-zero-copy architecture is a very
> premature optimization, and it may even steer the architecture in the
> wrong direction.
>
> If the performance degradation at this point is not acceptable, we
> could also postpone merging the IOMMU until the zero copy conversion
> has happened, or make the IOMMU a compile-time option. But it would be
> nice to back the decision with performance figures.

I agree: a minimal benchmark showing no performance impact when the
IOMMU is disabled would put these concerns to rest.

-- 
MST
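
P.S. To put something concrete behind the "two stage" description above,
the PCI side of a zero copy interface with invalidation callbacks might
look roughly like the declarations below. This is only a sketch to show
the shape of the thing; none of these names exist in the tree, and the
real API may well differ:

    /* Stage 1: translate a bus address and map it to a host pointer.
     * A mapping that outlives the current I/O context (e.g. AIO DMA)
     * registers a callback so the IOMMU can invalidate it when the
     * guest changes the translation.  All names here are made up. */
    typedef void PCIInvalidateMapFunc(void *opaque);

    void *pci_memory_map(PCIDevice *dev, pcibus_t addr, pcibus_t len,
                         int is_write, PCIInvalidateMapFunc *invalidate,
                         void *invalidate_opaque);

    /* Stage 2: the device reads or writes through the returned host
     * pointer, then unmaps, reporting how much was actually accessed. */
    void pci_memory_unmap(PCIDevice *dev, void *buffer, pcibus_t len,
                          int is_write, pcibus_t access_len);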