On Thu, Sep 2, 2010 at 9:49 AM, Michael S. Tsirkin <mst@xxxxxxxxxx> wrote: > On Thu, Sep 02, 2010 at 11:40:58AM +0300, Eduard - Gabriel Munteanu wrote: >> On Thu, Sep 02, 2010 at 08:28:26AM +0300, Michael S. Tsirkin wrote: >> > On Sat, Aug 28, 2010 at 05:54:53PM +0300, Eduard - Gabriel Munteanu wrote: >> > > PCI devices should access memory through pci_memory_*() instead of >> > > cpu_physical_memory_*(). This also provides support for translation and >> > > access checking in case an IOMMU is emulated. >> > > >> > > Memory maps are treated as remote IOTLBs (that is, translation caches >> > > belonging to the IOMMU-aware device itself). Clients (devices) must >> > > provide callbacks for map invalidation in case these maps are >> > > persistent beyond the current I/O context, e.g. AIO DMA transfers. >> > > >> > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@xxxxxxxxxxx> >> > >> > >> > I am concerned about adding more pointer chaising on data path. >> > Could we have >> > 1. an iommu pointer in a device, inherited by secondary buses >> > when they are created and by devices from buses when they are attached. >> > 2. translation pointer in the iommu instead of the bus >> >> The first solution I proposed was based on qdev, that is, each >> DeviceState had an 'iommu' field. Translation would be done by >> recursively looking in the parent bus/devs for an IOMMU. >> >> But Anthony said we're better off with bus-specific APIs, mostly because >> (IIRC) there may be different types of addresses and it might be >> difficult to abstract those properly. > > Well we ended up with casting > away types to make pci callbacks fit in ide structure, > and silently assuming that all addresses are in fact 64 bit. > So maybe it's hard to abstract addresses properly, but > it appears we'll have to, to avoid even worse problems. > >> I suppose I could revisit the idea by integrating the IOMMU in a >> PCIDevice as opposed to a DeviceState. >> >> Anthony, Paul, any thoughts on this? > > Just to clarify: this is an optimization idea: > instead of a bus walk on each access, do the walk > when device is attached to the bus, and copy the iommu > from the root to the device itself. > > This will also make it possible to create > DMADeviceState structure which would have this iommu field, > and we'd use this structure instead of the void pointers all over. > > >> > 3. pci_memory_XX functions inline, doing fast path for non-iommu case: >> > >> > if (__builtin_expect(!dev->iommu, 1) >> > return cpu_memory_rw >> >> But isn't this some sort of 'if (likely(!dev->iommu))' from the Linux >> kernel? If so, it puts the IOMMU-enabled case at disadvantage. > > IOMMU has a ton of indirections anyway. > >> I suppose most emulated systems would have at least some theoretical >> reasons to enable the IOMMU, e.g. as a GART replacement (say for 32-bit >> PCI devices) or for userspace drivers. >> So there are reasons to enable >> the IOMMU even when you don't have a real host IOMMU and you're not >> using nested guests. > > The time most people enable iommu for all devices in both real and virtualized > systems appears distant, one of the reasons is because it has a lot of overhead. > Let's start with not adding overhead for existing users, makes sense? I think the goal architecture (not for IOMMU, but in general) is one with zero copy DMA. This means we have stage one where the addresses are translated to host pointers and stage two where the read/write operation happens. The devices need to be converted. Now, let's consider the IOMMU in this zero copy architecture. It's one stage of address translation, for the access operation it will not matter. We can add translation caching at device level (or even at any intermediate levels), but that needs a cache invalidation callback system as discussed earlier. This can be implemented later, we need the zero copy stuff first. Given this overall picture, I think eliminating some pointer dereference overheads in non-zero copy architecture is a very premature optimization and it may even direct the architecture to wrong direction. If the performance degradation at this point is not acceptable, we could also postpone merging IOMMU until zero copy conversion has happened, or make IOMMU a compile time option. But it would be nice to back the decision by performance figures. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html