Currently the IOMMU subsystem provides 3 basic operations: iommu_map(),
iommu_map_sg() and iommu_unmap(). iommu_map() can be used to map memory
page by page, but it flushes the caches (CPU and IOMMU) separately for
every mapped page, which makes it unsuitable for use cases that require
low mapping latency. Similarly, iommu_unmap() takes a full IOVA range as
its argument, yet performs the unmapping page by page.

To make the mapping operation more suitable for such use cases,
iommu_map_sg() and the .map_sg() callback in struct iommu_ops were
introduced. They allow a particular IOMMU driver to iterate over the SG
entries directly, create the necessary mappings and flush everything in
one go.

This approach, however, has two drawbacks:
 1) it does nothing for unmap performance,
 2) it requires each driver that wants fast mapping to implement its
    own SG iteration code, even though this is a mostly generic
    operation.

This series tries to mitigate both issues, while acknowledging that the
.map_sg() callback might still be necessary for some specific platforms
that need to iterate over SG elements inside driver code. The proposed
solution introduces a new .flush() callback, which takes an IOVA range
as its argument and is expected to flush all respective caches (be it
CPU, IOMMU TLB or whatever) so that the mapping change in the given IOVA
area becomes visible to IOMMU clients. All 3 basic map/unmap operations
are then modified to call the .flush() callback at the end of the
operation.
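To illustrate the idea of an optional range flush, here is a small
userspace model. It is only a sketch and not the code from the patches:
all names (model_ops, model_domain, model_map() and so on) are
hypothetical, and the "flush" is reduced to a counter. It assumes a
fixed 4 KiB page size and shows that a driver without .flush() keeps
flushing per page, while a driver that provides it gets a single flush
per operation.

```c
/* Userspace model of the batched-flush idea; NOT kernel code. */
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages only */

struct model_domain;

/* Hypothetical stand-in for the per-driver callbacks. */
struct model_ops {
	/* Maps one page; has to flush on its own if .flush is NULL. */
	void (*map_page)(struct model_domain *d, unsigned long iova);
	/* Optional: flush caches for a whole IOVA range in one go. */
	void (*flush)(struct model_domain *d, unsigned long iova,
		      size_t size);
};

struct model_domain {
	const struct model_ops *ops;
	unsigned long pages_mapped;
	unsigned long flush_count;
};

/* Driver without .flush(): flushes once for every mapped page. */
void legacy_map_page(struct model_domain *d, unsigned long iova)
{
	(void)iova;
	d->pages_mapped++;
	d->flush_count++;
}

/* Driver with .flush(): the per-page callback only updates state. */
void batched_map_page(struct model_domain *d, unsigned long iova)
{
	(void)iova;
	d->pages_mapped++;
}

void batched_flush(struct model_domain *d, unsigned long iova,
		   size_t size)
{
	(void)iova;
	(void)size;
	d->flush_count++;
}

const struct model_ops legacy_ops = {
	.map_page = legacy_map_page,
};

const struct model_ops batched_ops = {
	.map_page = batched_map_page,
	.flush = batched_flush,
};

/* Generic map path: iterate page by page, then flush the range once. */
void model_map(struct model_domain *d, unsigned long iova, size_t size)
{
	size_t off;

	assert(size % MODEL_PAGE_SIZE == 0);

	for (off = 0; off < size; off += MODEL_PAGE_SIZE)
		d->ops->map_page(d, iova + off);

	/* The callback is optional, so existing drivers are unaffected. */
	if (d->ops->flush)
		d->ops->flush(d, iova, size);
}
```

In this model, mapping a 128 KiB range (32 pages) through legacy_ops
counts 32 flushes, while the same range through batched_ops counts 1.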
Advantages of the proposed approach include:
 1) the ability to use the default_iommu_map_sg() helper if all the
    driver needs for performance optimization is batching the flush,
 2) no effect at all on existing code: the .flush() callback is
    optional, and if it isn't implemented, drivers are expected to do
    the necessary flushes on a page by page basis in their respective
    (un)mapping callbacks,
 3) the possibility of exporting the iommu_flush() operation and
    providing unsynchronized map/unmap operations for subsystems with
    even higher performance requirements (e.g. drivers/gpu/drm).

The series includes a generic patch implementing the necessary changes
in the IOMMU API and two Tegra-specific patches that demonstrate the
implementation on the driver side and can be used for further testing.

Last, but not least, some performance numbers on Tegra210:

+-----------+--------------+-------------+------------+
| Operation | Size [bytes] | Before [us] | After [us] |
+-----------+--------------+-------------+------------+
| Map       | 128K         |         139 |         40 |
|           |              |         136 |         34 |
|           |              |         137 |         38 |
|           |              |         136 |         36 |
|           | 4M           |        3939 |       1163 |
|           |              |        3730 |       2389 |
|           |              |        3613 |        997 |
|           |              |        3622 |       1620 |
|           | ~18M         |       18635 |       4741 |
|           |              |       19261 |       6550 |
|           |              |       18473 |       9304 |
|           |              |       18125 |       5120 |
| Unmap     | 128K         |         128 |          7 |
|           |              |         122 |          8 |
|           |              |         119 |         10 |
|           |              |         123 |         12 |
|           | 4M           |        3829 |        151 |
|           |              |        3964 |        150 |
|           |              |        3908 |        145 |
|           |              |        3875 |        155 |
|           | ~18M         |       18570 |        683 |
|           |              |       18473 |        806 |
|           |              |       21020 |        643 |
|           |              |       21764 |        652 |
+-----------+--------------+-------------+------------+

The values were obtained by surrounding the calls to iommu_map_sg()
(with the default_iommu_map_sg() helper used as the .map_sg() callback)
and iommu_unmap() with ktime-based time measurement code. 4 samples
were taken for every buffer size. ~18M means around 17-19M, due to the
variance in requested buffer sizes.

Tomasz Figa (2):
  iommu: Add support for out of band flushing
  iommu/tegra-smmu: Make the driver use out of band flushing

Vince Hsu (1):
  memory: tegra: add TLB cache line size

 drivers/iommu/iommu.c           | 33 +++++++++++++--
 drivers/iommu/tegra-smmu.c      | 91 +++++++++++++++++++++++++++++++++++++----
 drivers/memory/tegra/tegra114.c |  1 +
 drivers/memory/tegra/tegra124.c |  3 ++
 drivers/memory/tegra/tegra210.c |  1 +
 drivers/memory/tegra/tegra30.c  |  1 +
 include/linux/iommu.h           |  2 +
 include/soc/tegra/mc.h          |  1 +
 8 files changed, 122 insertions(+), 11 deletions(-)

-- 
2.6.0.rc2.230.g3dd15c0