For streaming DMA mappings which involve an IOMMU and whose IOVA length
regularly exceeds the IOVA rcache upper limit (meaning that they are not
cached), performance can be reduced.

This is much more pronounced since commit 4e89dce72521 ("iommu/iova: Retry
from last rb tree node if iova search fails"), as discussed at [0].

IOVAs which cannot be cached are also heavily involved in the IOVA aging
issue, as discussed at [1].

This series attempts to allow the device driver to hint what the upper
limit of its DMA mapping IOVA lengths would be, so that the caching range
may be increased.

Some figures for a storage scenario:
v5.12-rc3 baseline:		600K IOPS
With series:			1300K IOPS
With 4e89dce72521 reverted:	1250K IOPS

All of the above are for IOMMU strict mode. Non-strict mode gives ~1750K
IOPS in all scenarios.

I will say that the APIs and their semantics are a bit ropey - any better
ideas welcome...

[0] https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leizhen@xxxxxxxxxx/
[1] https://lore.kernel.org/linux-iommu/1607538189-237944-1-git-send-email-john.garry@xxxxxxxxxx/

John Garry (6):
  iommu: Move IOVA power-of-2 roundup into allocator
  iova: Add a per-domain count of reserved nodes
  iova: Allow rcache range upper limit to be configurable
  iommu: Add iommu_dma_set_opt_size()
  dma-mapping/iommu: Add dma_set_max_opt_size()
  scsi: hisi_sas: Set max optimal DMA size for v3 hw

 drivers/iommu/dma-iommu.c              | 23 ++++---
 drivers/iommu/iova.c                   | 88 ++++++++++++++++++++------
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |  2 +
 include/linux/dma-map-ops.h            |  1 +
 include/linux/dma-mapping.h            |  5 ++
 include/linux/iova.h                   | 12 +++-
 kernel/dma/mapping.c                   | 11 ++++
 7 files changed, 115 insertions(+), 27 deletions(-)

-- 
2.26.2
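For illustration only, and not code from this series: a rough user-space
model of the rcache upper limit referred to above. It assumes the v5.12
mainline value IOVA_RANGE_CACHE_MAX_SIZE == 6, i.e. only IOVAs of up to
32 pages (128K with 4K pages) fall into a cached size class. The names
order_base_2_model(), iova_is_cached() and DEFAULT_RCACHE_MAX_LOG are
hypothetical, and the second parameter only mimics the idea of a
configurable limit; it is not the interface added here.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed mainline v5.12 value: log2 of the first uncached size, in pages. */
#define DEFAULT_RCACHE_MAX_LOG 6

/* Smallest order such that (1UL << order) >= pages; pages must be non-zero. */
static unsigned int order_base_2_model(unsigned long pages)
{
	const unsigned int max_order = sizeof(unsigned long) * 8 - 1;
	unsigned int order = 0;

	assert(pages > 0);
	/* Stop before the shift count reaches the width of unsigned long. */
	while (order < max_order && (1UL << order) < pages)
		order++;
	return order;
}

/*
 * True if an IOVA spanning 'pages' pages lands in a cached size class when
 * the limit is 'rcache_max_log' (hypothetical knob, for illustration only).
 */
static bool iova_is_cached(unsigned long pages, unsigned int rcache_max_log)
{
	return order_base_2_model(pages) < rcache_max_log;
}
```

Under these assumptions a 32-page mapping is cached with the default
limit while a 33-page mapping is not; raising the limit to 8 in the model
would cover mappings of up to 128 pages.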