Re: [PATCH 2/7] pci: memory access API and IOMMU support

On Sat, Sep 04, 2010 at 09:01:06AM +0000, Blue Swirl wrote:
> On Thu, Sep 2, 2010 at 9:49 AM, Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> > On Thu, Sep 02, 2010 at 11:40:58AM +0300, Eduard - Gabriel Munteanu wrote:
> >> On Thu, Sep 02, 2010 at 08:28:26AM +0300, Michael S. Tsirkin wrote:
> >> > On Sat, Aug 28, 2010 at 05:54:53PM +0300, Eduard - Gabriel Munteanu wrote:
> >> > > PCI devices should access memory through pci_memory_*() instead of
> >> > > cpu_physical_memory_*(). This also provides support for translation and
> >> > > access checking in case an IOMMU is emulated.
> >> > >
> >> > > Memory maps are treated as remote IOTLBs (that is, translation caches
> >> > > belonging to the IOMMU-aware device itself). Clients (devices) must
> >> > > provide callbacks for map invalidation in case these maps are
> >> > > persistent beyond the current I/O context, e.g. AIO DMA transfers.
> >> > >
> >> > > Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@xxxxxxxxxxx>
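
For readers not following the patch itself, the shape of the accessor
being described is roughly the following. This is only a sketch: the
names and signatures are assumptions made for illustration, not the
code in the posted patch.

    #include <stdint.h>

    typedef uint64_t pci_addr_t;      /* bus address as seen by the device */
    typedef uint64_t phys_addr_t;     /* guest physical address */
    typedef struct PCIDevice PCIDevice;

    /* stand-in declaration for QEMU's cpu_physical_memory_rw() */
    void cpu_physical_memory_rw(phys_addr_t addr, uint8_t *buf,
                                int len, int is_write);

    /* hypothetical hook that walks the emulated IOMMU's tables;
     * returns non-zero if the access is not permitted */
    int pci_iommu_translate(PCIDevice *dev, pci_addr_t addr, int len,
                            int is_write, phys_addr_t *out);

    void pci_memory_rw(PCIDevice *dev, pci_addr_t addr,
                       uint8_t *buf, int len, int is_write)
    {
        phys_addr_t paddr;

        /* translation and access checking happen here, instead of the
         * device touching guest physical memory directly */
        if (pci_iommu_translate(dev, addr, len, is_write, &paddr)) {
            return;                   /* access blocked by the IOMMU */
        }
        cpu_physical_memory_rw(paddr, buf, len, is_write);
    }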
> >> >
> >> >
> >> > I am concerned about adding more pointer chasing on the data path.
> >> > Could we have
> >> > 1. an iommu pointer in a device, inherited by secondary buses
> >> >    when they are created and by devices from buses when they are attached.
> >> > 2. translation pointer in the iommu instead of the bus
> >>
> >> The first solution I proposed was based on qdev, that is, each
> >> DeviceState had an 'iommu' field. Translation would be done by
> >> recursively looking in the parent bus/devs for an IOMMU.
> >>
> >> But Anthony said we're better off with bus-specific APIs, mostly because
> >> (IIRC) there may be different types of addresses and it might be
> >> difficult to abstract those properly.
> >
> > Well, we ended up casting away types to make the PCI callbacks
> > fit the IDE structure, and silently assuming that all addresses
> > are in fact 64 bit. So maybe it is hard to abstract addresses
> > properly, but it appears we'll have to, to avoid even worse
> > problems.
> >
> >> I suppose I could revisit the idea by integrating the IOMMU in a
> >> PCIDevice as opposed to a DeviceState.
> >>
> >> Anthony, Paul, any thoughts on this?
> >
> > Just to clarify: this is an optimization idea. Instead of a bus
> > walk on each access, do the walk once, when the device is attached
> > to the bus, and copy the iommu pointer from the root to the device
> > itself.
> >
> > This would also make it possible to create a DMADeviceState
> > structure holding this iommu field, and we'd use that structure
> > instead of the void pointers passed around all over the place.
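
Spelled out as a sketch (the structure and field names below are
invented for illustration and are not from the patch):

    /* Attach-time IOMMU resolution: resolve the IOMMU once when the
     * device is attached, instead of walking the bus hierarchy on
     * every access. */
    typedef struct IOMMUState IOMMUState;

    typedef struct DMABus {
        IOMMUState *iommu;     /* set on the root bus, inherited downwards */
    } DMABus;

    typedef struct DMADeviceState {
        IOMMUState *iommu;     /* copied from the bus at attach time */
    } DMADeviceState;

    static void dma_device_attach(DMADeviceState *dev, DMABus *bus)
    {
        /* one-time walk/copy; afterwards the data path only ever
         * dereferences dev->iommu */
        dev->iommu = bus->iommu;
    }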
> >
> >
> >> > 3. pci_memory_XX functions inline, doing a fast path for the non-iommu case:
> >> >
> >> >     if (__builtin_expect(!dev->iommu, 1))
> >> >             return cpu_memory_rw(...);
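
Expanded slightly, a variant of that wrapper could live in a header as
a static inline, with the IOMMU case pushed out of line (again only a
sketch; pci_memory_rw_iommu and the iommu field are assumed names):

    #include <stdint.h>

    typedef struct IOMMUState IOMMUState;

    /* assumes the device carries the iommu pointer copied at attach time */
    typedef struct PCIDevice {
        IOMMUState *iommu;            /* NULL when no IOMMU is present */
        /* ... */
    } PCIDevice;

    void cpu_physical_memory_rw(uint64_t addr, uint8_t *buf,
                                int len, int is_write);
    void pci_memory_rw_iommu(PCIDevice *dev, uint64_t addr,
                             uint8_t *buf, int len, int is_write);

    static inline void pci_memory_rw(PCIDevice *dev, uint64_t addr,
                                     uint8_t *buf, int len, int is_write)
    {
        /* fast path: no IOMMU attached, behave exactly as today */
        if (__builtin_expect(!dev->iommu, 1)) {
            cpu_physical_memory_rw(addr, buf, len, is_write);
            return;
        }
        /* slow path: go through translation and access checks */
        pci_memory_rw_iommu(dev, addr, buf, len, is_write);
    }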
> >>
> >> But isn't this some sort of 'if (likely(!dev->iommu))', like in the
> >> Linux kernel? If so, it puts the IOMMU-enabled case at a disadvantage.
> >
> > IOMMU has a ton of indirections anyway.
> >
> >> I suppose most emulated systems would have at least some theoretical
> >> reasons to enable the IOMMU, e.g. as a GART replacement (say for 32-bit
> >> PCI devices) or for userspace drivers.
> >> So there are reasons to enable
> >> the IOMMU even when you don't have a real host IOMMU and you're not
> >> using nested guests.
> >
> > The time when most people enable the IOMMU for all devices, in both
> > real and virtualized systems, appears distant; one of the reasons is
> > that it has a lot of overhead. Let's start by not adding overhead for
> > existing users. Makes sense?
> 
> I think the goal architecture (not for IOMMU, but in general) is one
> with zero copy DMA. This means we have stage one where the addresses
> are translated to host pointers and stage two where the read/write
> operation happens. The devices need to be converted.
> 
> Now, let's consider the IOMMU in this zero-copy architecture. It's one
> stage of address translation; for the access operation itself it will
> not matter. We can add translation caching at the device level (or
> even at any intermediate level), but that needs a cache invalidation
> callback system, as discussed earlier. This can be implemented later;
> we need the zero-copy stuff first.
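
To make the two stages concrete, the API could have roughly this shape
(purely a sketch, all names invented):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct PCIDevice PCIDevice;

    /* stage one produces a host pointer (possibly after IOMMU
     * translation); stage two is the device reading or writing through
     * that pointer directly, with no intermediate bounce buffer */
    typedef struct DMAMap {
        void  *host;                  /* host pointer into guest RAM */
        size_t len;
        /* invalidation callback for maps that outlive the current I/O
         * context (e.g. AIO), as discussed earlier in the thread */
        void (*invalidate)(struct DMAMap *map, void *opaque);
        void  *opaque;
    } DMAMap;

    /* stage one: translate and map; returns non-zero on failure */
    int dma_map(PCIDevice *dev, uint64_t addr, size_t len,
                int is_write, DMAMap *map);

    /* stage two happens in the device itself, e.g. by reading or
     * writing map->host directly */

    /* paired release once the transfer completes */
    void dma_unmap(PCIDevice *dev, DMAMap *map);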
> 
> Given this overall picture, I think eliminating some pointer
> dereference overhead in the non-zero-copy architecture is a very
> premature optimization, and it may even steer the architecture in the
> wrong direction.
> 
> If the performance degradation at this point is not acceptable, we
> could also postpone merging the IOMMU until the zero-copy conversion
> has happened, or make the IOMMU a compile-time option. But it would be
> nice to back the decision with performance figures.

I agree; a minimal benchmark showing no performance impact when the
IOMMU is disabled would put these concerns to rest.

-- 
MST