> >>> Depending on how we decide to handle IOMMU invalidation, it may
> >>> also be necessary to augment the memory_map API to allow the system
> >>> to request a mapping be revoked. However this issue is not specific
> >>> to the IOMMU implementation. Such bugs are already present on any
> >>> system that allows dynamic reconfiguration of the address space,
> >>> e.g. by changing PCI BARs.
> >>
> >> That's why the memory_map API today does not allow mappings to
> >> persist after trips back to the main loop.
> >
> > Sure it does. If you can't combine zero-copy memory access with
> > asynchronous IO then IMO it's fairly useless. See e.g. dma-helpers.c
>
> DMA's a very special case.

Special compared to what? The whole purpose of this API is to provide
DMA. (A sketch of the map/AIO/unmap pattern dma-helpers.c uses is at
the end of this mail.)

> DMA is performed asynchronously to the execution of the CPU so you
> generally can't make any guarantees about what state the transaction
> is in until it's completed. That gives us a fair bit of wiggle room
> when dealing with a DMA operation to a region of physical memory where
> the physical memory mapping is altered in some way during the
> transaction.

You do have ordering constraints though. While it may not be possible
to directly determine whether the DMA completed before or after the
remapping, and you might not be able to make any assumptions about the
atomicity of the transaction as a whole, it is reasonable to assume
that any writes to the old mapping will occur before the remapping
operation completes. While things like store buffers potentially allow
reordering and deferral of accesses, there are generally fairly tight
constraints on this. For example a PCI host bridge may buffer CPU
writes. However it will guarantee that those writes have been flushed
out before a subsequent read operation completes.

Consider the case where the hypervisor allows passthrough of a device,
using the IOMMU to support DMA from that device into virtual machine
RAM. When that virtual machine is destroyed the IOMMU mapping for that
device will be invalidated. Once the invalidation has completed that
RAM can be reused by the hypervisor for other purposes. This may happen
before the device is reset. We probably don't really care what happens
to the device in this case, but we do need to prevent the device
stomping on RAM it no longer owns.

There are two ways this can be handled:

If your address translation mechanism allows updates to be deferred
indefinitely then we can stall until all relevant DMA transactions have
completed. This is probably sufficient for well-behaved guests, but
potentially opens up a significant window for DoS attacks (see the
deferred-invalidation sketch at the end of this mail).

If you need the remapping to occur in a finite timeframe (in the PCI
BAR case this is probably before the next CPU access to that bus) then
you need some mechanism for revoking the host mapping provided by
cpu_physical_memory_map (see the revocation sketch at the end of this
mail).

Note that a QEMU DMA transaction typically encompasses a whole block of
data. The transaction is started when the AIO request is issued, and
remains live until the transfer completes. This includes the time taken
to fetch the data from external media/devices. On real hardware a DMA
transaction typically only covers a single burst memory write (maybe 16
bytes). This will generally not start until the device has buffered
sufficient data to satisfy the burst (or has sufficient buffer space to
receive the whole burst).
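To make the zero-copy point concrete, here is a minimal sketch of the
pattern dma-helpers.c relies on. It is simplified to a single
contiguous region (the real code walks a scatter-gather list) and
omits error handling and the bounce-buffer case where the map call
returns NULL or a shortened length. The point is that the mapping is
created when the AIO request is issued and only torn down in the
completion callback, i.e. it persists across trips through the main
loop:

/* Simplified zero-copy DMA read: guest RAM stays mapped for the
 * lifetime of the AIO request, not just for one main loop pass. */

typedef struct DMARequest {
    void *host;               /* host pointer from the map call */
    target_phys_addr_t len;
    int is_write;
    QEMUIOVector qiov;
} DMARequest;

static void dma_complete(void *opaque, int ret)
{
    DMARequest *req = opaque;

    /* The mapping is only released here, after the transfer has
     * finished.  access_len tells the core how much of the buffer
     * was actually written. */
    cpu_physical_memory_unmap(req->host, req->len, req->is_write,
                              ret < 0 ? 0 : req->len);
    qemu_iovec_destroy(&req->qiov);
    qemu_free(req);
}

/* Read nb_sectors from bs straight into guest RAM at addr. */
static void dma_start_read(BlockDriverState *bs, int64_t sector,
                           target_phys_addr_t addr, int nb_sectors)
{
    DMARequest *req = qemu_mallocz(sizeof(*req));

    req->len = nb_sectors * 512;
    req->is_write = 1;        /* device-to-memory transfer */
    req->host = cpu_physical_memory_map(addr, &req->len, req->is_write);

    qemu_iovec_init(&req->qiov, 1);
    qemu_iovec_add(&req->qiov, req->host, req->len);

    /* The request, and hence the mapping, outlives this function. */
    bdrv_aio_readv(bs, sector, &req->qiov, nb_sectors,
                   dma_complete, req);
}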
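For the first option (deferring the update) the obvious shape is a
reference count on each translation entry. This is purely hypothetical
(none of the IOMMUEntry/iommu_* names below exist in the tree), but it
shows where the DoS window comes from:

/* Hypothetical deferred invalidation: the entry stays pinned while
 * host mappings derived from it are live. */

typedef struct IOMMUEntry {
    target_phys_addr_t iova;
    target_phys_addr_t phys;
    int refcount;             /* live map users for this entry */
    int invalidate_pending;
} IOMMUEntry;

static void iommu_entry_destroy(IOMMUEntry *e); /* tear down entry */

/* Called when a mapping obtained through this entry is unmapped. */
static void iommu_entry_unref(IOMMUEntry *e)
{
    if (--e->refcount == 0 && e->invalidate_pending) {
        iommu_entry_destroy(e);
    }
}

/* Guest requested invalidation: defer while DMA is in flight.
 * This is the DoS window -- a malicious guest keeps the entry
 * pinned for as long as it can keep AIO outstanding. */
static void iommu_invalidate(IOMMUEntry *e)
{
    if (e->refcount > 0) {
        e->invalidate_pending = 1;
    } else {
        iommu_entry_destroy(e);
    }
}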
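For the second option, a revocation hook might look something like
this. Again entirely hypothetical (cpu_physical_memory_map_revocable
and the MyDeviceState fields are made up); the idea is that whoever
holds a long-lived mapping registers a callback, and the remap
operation does not complete until every such callback has returned.
That gives exactly the writes-complete-before-remap ordering guarantee
described above:

/* Hypothetical API: as cpu_physical_memory_map(), but the caller
 * supplies a callback the core may invoke to demand the mapping
 * back (IOMMU invalidation, PCI BAR move, ...). */
typedef void MapRevokeFunc(void *opaque);

void *cpu_physical_memory_map_revocable(target_phys_addr_t addr,
                                        target_phys_addr_t *plen,
                                        int is_write,
                                        MapRevokeFunc *revoke,
                                        void *opaque);

typedef struct MyDeviceState {
    BlockDriverAIOCB *aiocb;  /* in-flight AIO, if any */
    void *host;               /* revocable mapping */
    target_phys_addr_t len;
    target_phys_addr_t bytes_done;
} MyDeviceState;

/* Device-side revoke handler: cancel (or drain) the outstanding
 * request so all writes to the old mapping are flushed, then
 * unmap.  The remapping only completes after this returns. */
static void mydev_revoke_mapping(void *opaque)
{
    MyDeviceState *s = opaque;

    if (s->aiocb) {
        bdrv_aio_cancel(s->aiocb);
        s->aiocb = NULL;
    }
    cpu_physical_memory_unmap(s->host, s->len, 1, s->bytes_done);
    s->host = NULL;
}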
Paul