On Fri, May 20, 2022 at 06:26:03PM +0200, Niklas Schnelle wrote:
> > So, from what I can tell, the S390 HW is not really the same as a
> > normal iommu in that you can do map over IOVA that hasn't been flushed
> > yet and the map will restore coherency to the new page table
> > entries. I see the zpci_refresh_trans() call in map which is why I
> > assume this?
>
> The zpci_refresh_trans() in map is only there because previously we
> didn't implement iotlb_sync_map(). Also, we only need to flush on map
> for the paged guest case so the hypervisor can update its shadow table.
> It happens unconditionally in the existing s390_iommu.c because that
> was not well optimized and uses the same s390_iommu_update_trans() for
> map and unmap. We had the skipping of the TLB flush handled properly in
> the arch/s390/pci_dma.c mapping code where !zdev->tlb_refresh indicates
> that we don't need flushes on map.

Even the arch/s390/pci_dma.c code has a zpci_refresh_trans() on map; it
is just conditional on zdev->tlb_refresh.

I had also assumed that the paging case uses this path?

> > (note that normal HW has a HW IOTLB cache that MUST be flushed or new
> > maps will not be loaded by the HW, so mapping to areas that previously
> > had uninvalidated IOVA is a functional problem, which motivates the
> > design of this scheme)
>
> We do need to flush the TLBs on unmap. The reason is that under LPAR
> (non paging hypervisor) the hardware can establish a new mapping on its
> own if an I/O PTE is changed from invalid to a valid translation and it
> wasn't previously in the TLB. I think that's how most hardware IOMMUs
> work and how I understand your explanation too.

Since you said LPAR was OK performance-wise, I was thinking only about
the paging case. You can have different iommu_domain implementations
for these two cases.

But if the paging case doesn't need the hypercall anyhow, that approach
doesn't work. You'd have to change the core code to increase the timer
duration instead, I think.

Jason
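
P.S. For reference, the conditional flush on map discussed above has
roughly this shape. This is a minimal sketch of the pattern in
dma_update_trans() in arch/s390/pci_dma.c, not the verbatim kernel
code; the helper name is hypothetical and the flag test is from memory:

	/* Issue the RPCIT (IOTLB flush) only when the update requires it. */
	static int dma_refresh_trans_sketch(struct zpci_dev *zdev,
					    dma_addr_t dma_addr, size_t size,
					    int flags)
	{
		if ((flags & ZPCI_PTE_VALID_MASK) == ZPCI_PTE_VALID) {
			/*
			 * invalid -> valid: the HW can pick up the new entry
			 * on its own, so only a paged hypervisor (which must
			 * update its shadow tables) needs a flush here.
			 */
			if (!zdev->tlb_refresh)
				return 0;
		}
		/*
		 * valid -> invalid: flush so stale translations are not
		 * used (the lazy-unmap path defers this until the IOVA
		 * range can be reused).
		 */
		return zpci_refresh_trans((u64) zdev->fh << 32, dma_addr,
					  PAGE_ALIGN(size));
	}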
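
The split between the two cases could be done either as two separate
iommu_domain implementations or as one implementation whose
iotlb_sync_map() only issues the hypercall for devices that need it.
A rough sketch of the latter; the devices list, its field names, and
the (omitted) locking are illustrative, not the driver's actual data
structures:

	/*
	 * Skip the map-side flush unless some attached device sits under a
	 * paged hypervisor (zdev->tlb_refresh set) and so needs its shadow
	 * tables updated. Locking of the devices list omitted for brevity.
	 */
	static void s390_iommu_iotlb_sync_map(struct iommu_domain *domain,
					      unsigned long iova, size_t size)
	{
		struct s390_domain *s390_domain = to_s390_domain(domain);
		struct zpci_dev *zdev;

		list_for_each_entry(zdev, &s390_domain->devices, iommu_list) {
			if (!zdev->tlb_refresh)
				continue;
			zpci_refresh_trans((u64) zdev->fh << 32, iova, size);
		}
	}

(The "timer duration" above is presumably the flush-queue timeout in
the core code, IOVA_FQ_TIMEOUT in drivers/iommu/iova.c, 10 ms at the
time of writing.)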