On Tue, Sep 29, 2015 at 02:25:23PM +0900, Tomasz Figa wrote:
> Currently the IOMMU subsystem provides 3 basic operations: iommu_map(),
> iommu_map_sg() and iommu_unmap(). iommu_map() can be used to map memory
> page by page, however it involves flushing the caches (CPU and IOMMU)
> for every mapped page separately, which is unsuitable for use cases
> that require low mapping latency. Similarly iommu_unmap(), even though
> it takes a full IOVA range as its argument, performs unmapping in a
> page by page manner.
>
> To make the mapping operation more suitable for such use cases,
> iommu_map_sg() and the .map_sg() callback in the iommu_ops struct were
> introduced, which allowed particular IOMMU drivers to directly iterate
> over SG entries, create the necessary mappings and flush everything in
> one go.
>
> This approach, however, has two drawbacks:
> 1) it does not do anything about unmap performance,
> 2) it requires each driver willing to have fast map to implement its
>    own SG iteration code, even though this is a mostly generic
>    operation.
>
> This series tries to mitigate the two issues above, while acknowledging
> the fact that the .map_sg() callback might still be necessary for some
> specific platforms, which could have the need to iterate over SG
> elements inside driver code. The proposed solution introduces a new
> .flush() callback, which takes an IOVA range as its argument and is
> expected to flush all respective caches (be it CPU, IOMMU TLB or
> whatever) to make the given IOVA area mapping change visible to IOMMU
> clients. All 3 basic map/unmap operations are then modified to call the
> .flush() callback at the end of the operation.
>
> Advantages of the proposed approach include:
> 1) the ability to use the default_iommu_map_sg() helper if all the
>    driver needs for performance optimization is batching the flush,
> 2) no effect at all on existing code - the .flush() callback is made
>    optional and if it isn't implemented drivers are expected to do the
>    necessary flushes on a page by page basis in the respective
>    (un)mapping callbacks,
> 3) the possibility of exporting the iommu_flush() operation and
>    providing unsynchronized map/unmap operations for subsystems with
>    even higher performance requirements (e.g. drivers/gpu/drm).

That would require passing in some sort of flag that the core shouldn't
be flushing itself, right? Currently it would flush on every map/unmap.
Roughly what I have in mind is shown in the sketch after the quoted
text below.

> The series includes a generic patch implementing the necessary changes
> in the IOMMU API and two Tegra-specific patches that demonstrate the
> implementation on the driver side and which can be used for further
> testing.
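To spell out what I mean, here's a rough sketch. Only .flush() and the
exported iommu_flush() are from your series; iommu_map_nosync(), the
helper and the exact prototypes are made up for illustration:

#include <linux/io.h>
#include <linux/iommu.h>
#include <linux/mm.h>

/*
 * How I read the proposal: at the end of each map/unmap operation the
 * core does a single batched flush over the affected IOVA range (the
 * prototype of .flush() is guessed from the cover letter):
 */
static void iommu_flush_iova_range(struct iommu_domain *domain,
				   unsigned long iova, size_t size)
{
	/*
	 * .flush() is optional; drivers that don't implement it keep
	 * flushing page by page in their .map()/.unmap() callbacks.
	 */
	if (domain->ops->flush)
		domain->ops->flush(domain, iova, size);
}

/*
 * For the drivers/gpu/drm case in 3) above, the opt-out flag I'm asking
 * about would let a driver batch the flush itself, e.g.
 * (iommu_map_nosync() is hypothetical):
 */
static int gpu_map_pages(struct iommu_domain *domain, unsigned long iova,
			 struct page **pages, unsigned int count)
{
	unsigned int i;
	int ret;

	for (i = 0; i < count; i++) {
		/* map without the implicit per-call flush */
		ret = iommu_map_nosync(domain, iova + i * PAGE_SIZE,
				       page_to_phys(pages[i]), PAGE_SIZE,
				       IOMMU_READ | IOMMU_WRITE);
		if (ret)
			return ret;
	}

	/* one explicit flush for the whole mapped range */
	iommu_flush(domain, iova, (size_t)count * PAGE_SIZE);

	return 0;
}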
> Last, but not least, some performance numbers on Tegra210:
>
> +-----------+--------------+-------------+------------+
> | Operation | Size [bytes] | Before [us] | After [us] |
> +-----------+--------------+-------------+------------+
> | Map       | 128K         |         139 |         40 |
> |           |              |         136 |         34 |
> |           |              |         137 |         38 |
> |           |              |         136 |         36 |
> |           | 4M           |        3939 |       1163 |
> |           |              |        3730 |       2389 |
> |           |              |        3613 |        997 |
> |           |              |        3622 |       1620 |
> |           | ~18M         |       18635 |       4741 |
> |           |              |       19261 |       6550 |
> |           |              |       18473 |       9304 |
> |           |              |       18125 |       5120 |
> | Unmap     | 128K         |         128 |          7 |
> |           |              |         122 |          8 |
> |           |              |         119 |         10 |
> |           |              |         123 |         12 |
> |           | 4M           |        3829 |        151 |
> |           |              |        3964 |        150 |
> |           |              |        3908 |        145 |
> |           |              |        3875 |        155 |
> |           | ~18M         |       18570 |        683 |
> |           |              |       18473 |        806 |
> |           |              |       21020 |        643 |
> |           |              |       21764 |        652 |
> +-----------+--------------+-------------+------------+
>
> The values were obtained by surrounding the calls to iommu_map_sg()
> (with the default_iommu_map_sg() helper used as the .map_sg() callback)
> and iommu_unmap() with ktime-based time measurement code. Four samples
> were taken for every buffer size. ~18M means around 17-19M due to the
> variance in requested buffer sizes.

Those are pretty impressive numbers.
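Out of curiosity, I assume the measurement harness looks roughly like
the below (a sketch only; the function name and the domain/IOVA/buffer
setup are mine, only the measured call is from your description):

#include <linux/iommu.h>
#include <linux/ktime.h>
#include <linux/printk.h>
#include <linux/scatterlist.h>

/* Time a single iommu_map_sg() call over a prepared sg_table. */
static void measure_map_sg(struct iommu_domain *domain, unsigned long iova,
			   struct sg_table *sgt)
{
	ktime_t start = ktime_get();
	size_t mapped = iommu_map_sg(domain, iova, sgt->sgl, sgt->nents,
				     IOMMU_READ | IOMMU_WRITE);
	s64 elapsed = ktime_to_us(ktime_sub(ktime_get(), start));

	pr_info("mapped %zu bytes in %lld us\n", mapped, elapsed);
}

Thierry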