On Wed, 2022-09-28 at 10:32 -0300, Jason Gunthorpe wrote:
> On Wed, Sep 28, 2022 at 10:58:22AM +0200, Niklas Schnelle wrote:
> > On Tue, 2022-09-27 at 13:56 -0300, Jason Gunthorpe wrote:
> > > On Tue, Sep 27, 2022 at 06:33:48PM +0200, Niklas Schnelle wrote:
> > > 
> > > > Not sure what the non-MSI reservation is for? It does seem like x86_64
> > > > also uses this for quite large ranges.
> > > 
> > > There are lots of things that are unsuitable for DMA on x86 platforms,
> > > unfortunately.. But yeah, I'm not sure either.
> > > 
> > > > This is because I'm getting a map request for an IOVA in the reserved
> > > > region.
> > > 
> > > How come? iova_reserve_iommu_regions() reads the reserved regions and
> > > loads them as reserved into the iovad, which should cause
> > > iommu_dma_alloc_iova() and alloc_iova_fast() to not return values in
> > > those ranges.
> > > 
> > > It all looks like it is supposed to work.
> > > 
> > > Did something go wrong in the initialization order perhaps?
> > > 
> > > Jason
> > 
> > It was of course a classic off-by-one: the table size is a number of
> > entries but geometry.aperture_end seems to be the largest allowed IOVA.
> > So we need:
> 
> Right, I dislike this naming; usually 'end' means "start + length" and
> 'last' means "start + length - 1".
> 
> > Otherwise the first IOVA allocated is ZPCI_TABLE_SIZE_RT itself.
> > Similarly we need the second reserved region if (zdev->end_dma <
> > ZPCI_TABLE_SIZE_RT - 1). In your patch I think you had the
> > MAX_DMA_TABLE_ADDR name right but would have also calculated the number
> > of entries.
> 
> Makes sense..
> 
> > On the other hand, with the dma-iommu.c conversion it no longer makes
> > sense to lower zdev->end_dma artificially, so at least on current
> > machine LPARs we would end up with just a lower reserved region
> > 0x0000000000000000 to 0x00000000ffffffff and can use IOVAs up to
> > aperture_end.
> 
> So zdev->end_dma == MAX_DMA_TABLE_ADDR?
> 
> (and is zdev->end_dma an 'end' or a 'last' ?)

Basically yes, though at least on LPARs the firmware interface that
gives us the initial zdev->end_dma returns an even higher value, but we
clamp it down to the aperture. It is "start + length - 1".

> 
> Can you include this patch once you are happy with it, it nicely
> tidies this series?
> 
> Thanks,
> Jason

Yes, will do. In the meantime I'm now close to sending an RFC version
of the conversion to dma-iommu. So my plan is to send out 3 series of
patches:

1. v3 of this series of IOMMU fixes including your suggestion to use
   reserved ranges, the previously mentioned off-by-one fix, and
   another IOMMU issue I found (pgsize_bitmap is wrong).
2. A series of improvements to the s390 IOMMU code including
   implementing map_pages() and lock-free page table updates.
3. A series converting s390 to use dma-iommu plus changes against
   dma-iommu.c common code to implement an alternative flushing scheme
   that brings z/VM and KVM guest PCI performance back to the level of
   our existing DMA API implementation.
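
For reference, here is roughly the shape of the aperture and
reserved-region handling I have in mind for the off-by-one discussed
above. This is an untested sketch against drivers/iommu/s390-iommu.c,
not the final patch; the helper and field names (to_zpci_dev(),
s390_domain, zdev->start_dma/end_dma) reflect my work-in-progress
tree, so treat the details as assumptions:

/*
 * Sketch only: ZPCI_TABLE_SIZE_RT is a number of entries (i.e. a
 * length), while geometry.aperture_end and zdev->end_dma are the last
 * usable IOVAs, hence the "- 1" / "+ 1" adjustments below.
 */

/* In domain_alloc: the aperture covers IOVAs 0 .. table size - 1 */
s390_domain->domain.geometry.force_aperture = true;
s390_domain->domain.geometry.aperture_start = 0;
s390_domain->domain.geometry.aperture_end = ZPCI_TABLE_SIZE_RT - 1;

/* And the get_resv_regions callback: */
static void s390_iommu_get_resv_regions(struct device *dev,
					struct list_head *list)
{
	struct zpci_dev *zdev = to_zpci_dev(dev);
	struct iommu_resv_region *region;

	/* IOVAs below start_dma are not usable by the device */
	if (zdev->start_dma) {
		region = iommu_alloc_resv_region(0, zdev->start_dma, 0,
						 IOMMU_RESV_RESERVED);
		if (!region)
			return;
		list_add_tail(&region->list, list);
	}

	/*
	 * IOVAs above end_dma up to the last table entry are likewise
	 * unusable; only needed when end_dma lies below the aperture
	 * end, i.e. (zdev->end_dma < ZPCI_TABLE_SIZE_RT - 1).
	 */
	if (zdev->end_dma < ZPCI_TABLE_SIZE_RT - 1) {
		region = iommu_alloc_resv_region(zdev->end_dma + 1,
				ZPCI_TABLE_SIZE_RT - zdev->end_dma - 1,
				0, IOMMU_RESV_RESERVED);
		if (!region)
			return;
		list_add_tail(&region->list, list);
	}
}

With that, iova_reserve_iommu_regions() in dma-iommu should punch both
holes into the iovad, and the first allocated IOVA stays within the
translation table instead of landing on ZPCI_TABLE_SIZE_RT itself.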