On 2021/1/23 1:38, Robin Murphy wrote:
>>> I kind of believe it's due to the indirect calls. This is also reported
>>> on ARM.
>>>
>>> https://lore.kernel.org/linux-iommu/1610376862-927-1-git-send-email-isaacm@xxxxxxxxxxxxxx/
>>>
>>> Maybe we can try changing indirect calls to static ones to verify this
>>> problem.
>>
>> I liked the idea of map_sg() enough to try my hand at building a PoC for
>> Intel, based on Isaac's patch series. It's just a cut-and-paste of the
>> generic iommu.c code with the indirect calls to ops->map() replaced.
>>
>> The indirect calls do not seem to be the problem. Calling intel_iommu_map
>> directly appears to be as costly as calling it indirectly.
>>
>> However, perhaps there are other ways map_sg() can be beneficial. In
>> v5.10, __domain_mapping and iommu_flush_write_buffer() appear to be
>> invoked just once for each large map operation, for example.
>
> Oh, if the driver needs to do maintenance beyond just installing PTEs,
> that should probably be devolved to iotlb_sync_map anyway. There's a
> patch series here generalising that to be more useful, which is
> hopefully just waiting to be merged now:
>
> https://lore.kernel.org/linux-iommu/20210107122909.16317-1-yong.wu@xxxxxxxxxxxx/
As far as I can see, iotlb_sync_map() could help here. I will post a
call-for-test patch set later.
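
To illustrate the idea being discussed, here is a rough user-space toy
model, not kernel code and not the Intel driver. Every name in it
(toy_domain, toy_ops, toy_map_range and so on) is hypothetical, and the
callback signatures are simplified assumptions that do not match the
real struct iommu_ops. It only shows why moving per-call maintenance out
of map() and into a single sync hook reduces the number of expensive
operations for a large mapping.

```c
/* User-space toy model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL

struct toy_domain {
	unsigned long ptes_written;	/* PTEs installed so far */
	unsigned long flushes;		/* expensive maintenance operations issued */
};

struct toy_ops {
	/* Install one page mapping; returns 0 on success. */
	int (*map)(struct toy_domain *d, unsigned long iova, unsigned long paddr);
	/* Optional: maintenance run once after a batch of map() calls. */
	void (*iotlb_sync_map)(struct toy_domain *d);
};

/* Variant A: map() performs its own maintenance on every call. */
static int toy_map_flush_each(struct toy_domain *d, unsigned long iova,
			      unsigned long paddr)
{
	(void)iova;
	(void)paddr;
	d->ptes_written++;
	d->flushes++;
	return 0;
}

/* Variant B: map() only installs the PTE; maintenance is deferred. */
static int toy_map_no_flush(struct toy_domain *d, unsigned long iova,
			    unsigned long paddr)
{
	(void)iova;
	(void)paddr;
	d->ptes_written++;
	return 0;
}

/* Deferred maintenance for variant B, issued once per batch. */
static void toy_sync_map(struct toy_domain *d)
{
	d->flushes++;
}

/* Generic caller: map page by page, then call the sync hook once if set. */
static int toy_map_range(const struct toy_ops *ops, struct toy_domain *d,
			 unsigned long iova, unsigned long paddr, size_t npages)
{
	assert(ops && ops->map && d);

	for (size_t i = 0; i < npages; i++) {
		int ret = ops->map(d, iova + i * TOY_PAGE_SIZE,
				   paddr + i * TOY_PAGE_SIZE);
		if (ret)
			return ret;
	}
	if (ops->iotlb_sync_map)
		ops->iotlb_sync_map(d);
	return 0;
}

static const struct toy_ops toy_ops_flush_each = {
	.map = toy_map_flush_each,
	.iotlb_sync_map = NULL,
};

static const struct toy_ops toy_ops_deferred = {
	.map = toy_map_no_flush,
	.iotlb_sync_map = toy_sync_map,
};
```

Calling toy_map_range() for the same number of pages with each ops table
writes the same number of PTEs, but the flush counter grows with the
page count in the first variant and by one per call in the second. The
loop still goes through ops->map() in both cases, so the sketch says
nothing about the cost of the indirect call itself.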
> Robin.
Best regards,
baolu