On Mon, 2023-02-20 at 16:22 +0100, Niklas Schnelle wrote:
> Hi All,
>
> This patch series converts s390's PCI support from its platform specific DMA
> API implementation in arch/s390/pci/pci_dma.c to the common DMA IOMMU layer.
> The conversion itself is done in patches 3-4 with patch 2 providing the final
> necessary IOMMU driver improvement to handle s390's special IOTLB flush
> out-of-resource indication in virtualized environments. Patches 1-2 may be
> applied independently. The conversion itself only touches the s390 IOMMU driver
> and s390 arch code moving over remaining functions from the s390 DMA API
> implementation. No changes to common code are necessary.
>
> After patch 4 the basic conversion is done and on our partitioning machine
> hypervisor LPAR performance matches or exceeds the existing code. When running
> under z/VM or KVM however, performance plummets to about half of the existing
> code due to a much higher rate of IOTLB flushes for unmapped pages. Due to the
> hypervisors use of IOTLB flushes to synchronize their shadow tables these are
> very expensive and minimizing them is key for regaining the performance loss.
>
> To this end patches 5-6 propose a new, single queue, IOTLB flushing scheme as
> an alternative to the existing per-CPU flush queues. Introducing an alternative
> scheme was also suggested by Robin Murphy[1]. In the previous RFC of this
> conversion Robin suggested reusing more of the existing queuing logic which
> I incorporated since v2. The single queue mode is introduced in patch
> 5 together with a new dma_iommu_options struct and tune_dma_iommu callback in
> IOMMU ops which allows IOMMU drivers to switch to a single flush queue.
>
> Then patch 6 enables variable queue sizes using power of 2 queue sizes and
> shift/mask to keep performance as close to the existing code as possible. The
> variable queue size and a variable timeout are added to the dma_iommu_options
> struct and utilized by s390 in the z/VM and KVM guest cases. As it is
> implemented in common code the single queue IOTLB flushing scheme can of course
> be used by other platforms with expensive IOTLB flushes. Particularly
> virtio-iommu may be a candidate.
>
> In a previous version I verified that the new scheme does work on my x86_64
> Ryzen workstation by locally modifying iommu_subsys_init() to default to the
> single queue mode and verifying its use via "/sys/.../iommu_group/type". I did
> not find problems with an AMD GPU, Intel NIC (with SR-IOV and KVM
> pass-through), NVMes or any on board peripherals.
>
> As with previous series this is available via my git.kernel.org tree[3] in the
> dma_iommu_v7 branch with signed s390_dma_iommu_v7 tag. This version applies
> on top of iommu-next to incorporate the ops->set_platform_dma() and GFP
> changes.

FYI this patch set now applies cleanly (and works) on v6.3-rc1. If need be
I can resend with Matt's R-b added but other than that I currently don't
have open TODOs for this so review away.

Thanks,
Niklas
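
For readers who have not looked at patches 5-6 yet, the power-of-2 queue size with shift/mask indexing mentioned in the cover letter can be illustrated roughly as below. This is a minimal userspace sketch only, not code from the series: struct flush_queue, fq_init(), fq_queue() and the flushes counter are hypothetical names invented for illustration, and the "flush" is just a counter increment standing in for one expensive IOTLB flush.

```c
#include <stdlib.h>

/* Hypothetical single flush queue; NOT the structure used in the series. */
struct flush_queue {
	unsigned long *entries;	/* queued (unmapped) page frame numbers */
	unsigned int head;	/* free-running index of the oldest entry */
	unsigned int tail;	/* free-running index of the next free slot */
	unsigned int mask;	/* size - 1; valid because size == 1 << shift */
	unsigned int flushes;	/* stand-in counter for IOTLB flushes issued */
};

/* Allocate a queue with 2^shift entries. Returns 0 on success, -1 on error. */
static int fq_init(struct flush_queue *fq, unsigned int shift)
{
	unsigned int size = 1u << shift;

	fq->entries = calloc(size, sizeof(*fq->entries));
	if (!fq->entries)
		return -1;
	fq->head = 0;
	fq->tail = 0;
	fq->mask = size - 1;
	fq->flushes = 0;
	return 0;
}

/* Number of entries currently waiting for a flush. */
static unsigned int fq_count(const struct flush_queue *fq)
{
	return fq->tail - fq->head;
}

/*
 * Queue one unmapped page. If the queue is full, issue a single flush
 * (here: bump the counter) and release all queued entries at once.
 * The slot is found with a mask instead of a modulo, which only works
 * because the queue size is a power of 2.
 */
static void fq_queue(struct flush_queue *fq, unsigned long pfn)
{
	if (fq_count(fq) == fq->mask + 1) {
		fq->flushes++;
		fq->head = fq->tail;
	}
	fq->entries[fq->tail & fq->mask] = pfn;
	fq->tail++;
}

static void fq_destroy(struct flush_queue *fq)
{
	free(fq->entries);
	fq->entries = NULL;
}
```

With shift = 3 (8 entries), queueing 20 pages in this sketch costs two flushes rather than 20, which is the batching effect a larger queue is meant to provide where flushes are expensive. The variable timeout mentioned in the cover letter is not modelled here.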