Re: [PATCH v7 0/6] iommu/dma: s390 DMA API conversion and optimized IOTLB flushing

Niklas Schnelle <schnelle@xxxxxxxxxxxxx> · Tue, 07 Mar 2023 14:16:37 +0100

On Mon, 2023-02-20 at 16:22 +0100, Niklas Schnelle wrote:
> Hi All,
> 
> This patch series converts s390's PCI support from its platform specific DMA
> API implementation in arch/s390/pci/pci_dma.c to the common DMA IOMMU layer.
> The conversion itself is done in patches 3-4 with patch 2 providing the final
> necessary IOMMU driver improvement to handle s390's special IOTLB flush
> out-of-resource indication in virtualized environments. Patches 1-2 may be
> applied independently. The conversion itself only touches the s390 IOMMU driver
> and s390 arch code, moving over the remaining functions from the s390 DMA API
> implementation. No changes to common code are necessary.
> 
> After patch 4 the basic conversion is done and on our partitioning machine
> hypervisor, LPAR, performance matches or exceeds the existing code. When running
> under z/VM or KVM, however, performance plummets to about half of the existing
> code due to a much higher rate of IOTLB flushes for unmapped pages. Due to the
> hypervisors' use of IOTLB flushes to synchronize their shadow tables, these are
> very expensive, and minimizing them is key to regaining the lost performance.
> 
> To this end patches 5-6 propose a new, single queue, IOTLB flushing scheme as
> an alternative to the existing per-CPU flush queues. Introducing an alternative
> scheme was also suggested by Robin Murphy[1]. In the previous RFC of this
> conversion, Robin suggested reusing more of the existing queuing logic, which
> I have incorporated since v2. The single queue mode is introduced in patch
> 5 together with a new dma_iommu_options struct and tune_dma_iommu callback in
> IOMMU ops which allows IOMMU drivers to switch to a single flush queue.
> 
> Then patch 6 enables variable queue sizes using power of 2 queue sizes and
> shift/mask to keep performance as close to the existing code as possible. The
> variable queue size and a variable timeout are added to the dma_iommu_options
> struct and utilized by s390 in the z/VM and KVM guest cases. As it is
> implemented in common code, the single queue IOTLB flushing scheme can of course
> be used by other platforms with expensive IOTLB flushes. In particular,
> virtio-iommu may be a candidate.
> 
> In a previous version I verified that the new scheme does work on my x86_64
> Ryzen workstation by locally modifying iommu_subsys_init() to default to the
> single queue mode and verifying its use via "/sys/.../iommu_group/type". I did
> not find problems with an AMD GPU, Intel NIC (with SR-IOV and KVM
> pass-through), NVMes or any on board peripherals.
> 
> As with previous series, this is available via my git.kernel.org tree[3] in the
> dma_iommu_v7 branch with the signed s390_dma_iommu_v7 tag. This version applies
> on top of iommu-next to incorporate the ops->set_platform_dma() and GFP
> changes.
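
For anyone not familiar with the shift/mask trick mentioned for patch 6 above,
here is a minimal userspace sketch of a ring queue whose size is a power of 2,
so that wrapping an index is a single AND instead of a modulo. This is only an
illustration and not the code from the series: struct flush_queue, the fq_*()
helpers and FQ_MAX_SHIFT are hypothetical names, and locking, timeouts and the
actual IOTLB flush are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical upper bound on the queue size: 1 << 8 = 256 entries. */
#define FQ_MAX_SHIFT 8

/* Illustrative queue only; not the kernel's flush queue structure. */
struct flush_queue {
	unsigned int shift;	/* queue holds 1 << shift entries */
	unsigned int mask;	/* (1 << shift) - 1, used to wrap indices */
	unsigned int head;	/* free-running consumer counter */
	unsigned int tail;	/* free-running producer counter */
	unsigned long entries[1u << FQ_MAX_SHIFT];
};

static void fq_init(struct flush_queue *fq, unsigned int shift)
{
	assert(shift <= FQ_MAX_SHIFT);
	fq->shift = shift;
	fq->mask = (1u << shift) - 1;
	fq->head = 0;
	fq->tail = 0;
}

static bool fq_empty(const struct flush_queue *fq)
{
	return fq->head == fq->tail;
}

static bool fq_full(const struct flush_queue *fq)
{
	/* Unsigned subtraction stays correct when the counters wrap. */
	return fq->tail - fq->head == (1u << fq->shift);
}

/* Queue an entry; on false a real user would flush and drain first. */
static bool fq_push(struct flush_queue *fq, unsigned long iova_pfn)
{
	if (fq_full(fq))
		return false;
	fq->entries[fq->tail & fq->mask] = iova_pfn;
	fq->tail++;
	return true;
}

/* Remove the oldest entry, if any. */
static bool fq_pop(struct flush_queue *fq, unsigned long *iova_pfn)
{
	if (fq_empty(fq))
		return false;
	*iova_pfn = fq->entries[fq->head & fq->mask];
	fq->head++;
	return true;
}
```

Because the size is always 1 << shift, a variable queue size costs one extra
load of the mask compared to a compile-time constant, which is the point of
restricting the sizes to powers of 2.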

FYI, this patch set now applies cleanly (and works) on v6.3-rc1. If need
be, I can resend with Matt's R-b added, but other than that I currently
don't have any open TODOs for this, so review away.

Thanks,
Niklas