On Wed, 2023-09-27 at 16:31 +0200, Niklas Schnelle wrote:
> On Wed, 2023-09-27 at 15:20 +0200, Niklas Schnelle wrote:
> > On Wed, 2023-09-27 at 13:24 +0200, Niklas Schnelle wrote:
> > > On Wed, 2023-09-27 at 11:55 +0200, Joerg Roedel wrote:
> > > > Hi Niklas,
> > > >
> > > > On Wed, Sep 27, 2023 at 10:55:23AM +0200, Niklas Schnelle wrote:
> > > > > The problem is that something seems to be broken in the
> > > > > iommu/core branch. Regardless of whether I have my DMA API
> > > > > conversion on top or with the base iommu/core branch I can not
> > > > > use ConnectX-4 VFs.
> > > >
> > > > Have you already tried to bisect the issue in the iommu/core
> > > > branch? The result might shed some light on the issue.
> > > >
> > > > Regards,
> > > >
> > > > Joerg
> > >
> > > Hi Joerg,
> > >
> > > Working on it, somehow I must have messed up earlier. It now looks
> > > like it might in fact be caused by my DMA API conversion rebase and
> > > the "s390/pci: Use dma-iommu layer" commit. Maybe there is some
> > > interaction with Jason's patches that I haven't thought about. So
> > > sorry for any wrong blame.
> > >
> > > Thanks,
> > > Niklas
> >
> > Hi,
> >
> > I tracked the problem down from mlx5_core's alloc_cmd_page() via
> > dma_alloc_coherent(), ops->alloc, iommu_dma_alloc_remap(), and
> > __iommu_dma_alloc_noncontiguous() to a failed iommu_dma_alloc_iova().
> > The allocation here is for 4K so nothing crazy.
> >
> > On second look I also noticed:
> >
> > nvme 2007:00:00.0: Using 42-bit DMA addresses
> >
> > for the NVMe that is working. The problem here seems to be that we
> > set iommu_dma_forcedac = true in s390_iommu_probe_finalize() because
> > we currently have a reserved region over the first 4 GiB anyway and
> > so will always use IOVAs larger than that. That however is too late,
> > since iommu_dma_set_pci_32bit_workaround() is already checked in
> > __iommu_probe_device(), which is called just before
> > ops->probe_finalize(). So I moved setting iommu_dma_forcedac = true
> > to zpci_init_iommu() and that gets rid of the notice for the NVMe,
> > but I still get a failure of iommu_dma_alloc_iova() in
> > __iommu_dma_alloc_noncontiguous(). So I'll keep digging.
> >
> > Thanks,
> > Niklas
>
> Ok I think I got it, and this doesn't seem strictly s390x specific; I
> would expect it to happen with iommu.forcedac=1 everywhere.
>
> The reason iommu_dma_alloc_iova() fails seems to be that mlx5_core
> does dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)) in
> mlx5_pci_init()->set_dma_caps(), which happens after it has already
> called mlx5_mdev_init()->mlx5_cmd_init()->alloc_cmd_page(). So for
> the dma_alloc_coherent() in there, dev->coherent_dma_mask is still
> the default DMA_BIT_MASK(32), for which we can't find an IOVA because
> we simply don't have IOVAs below 4 GiB. Not entirely sure what caused
> this not to be enforced before.
>
> Thanks,
> Niklas

Ok, another update. On trying it out again, this problem also occurs
when applying this v12 on top of v6.6-rc3. Also, unlike my prior
thinking, it probably doesn't occur with iommu.forcedac=1, since that
still allows IOVAs below 4 GiB and we might be the only ones who don't
support those. From my point of view this sounds like an mlx5_core
issue: they really should call dma_set_mask_and_coherent() before
their first call to dma_alloc_coherent(), not after.
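To illustrate the ordering, here is a minimal, hypothetical sketch --
not the actual mlx5_core code; the function names are made up for
illustration, and only dma_set_mask_and_coherent() and
dma_alloc_coherent() are the real DMA API calls:

#include <linux/dma-mapping.h>
#include <linux/pci.h>

/*
 * Broken order: allocate while dev->coherent_dma_mask is still the
 * 32-bit default, then widen the mask. On s390x, where no IOVAs below
 * 4 GiB exist, the dma_alloc_coherent() here fails.
 */
static int broken_order(struct pci_dev *pdev, dma_addr_t *dma_handle)
{
	void *page = dma_alloc_coherent(&pdev->dev, PAGE_SIZE,
					dma_handle, GFP_KERNEL);

	if (!page)
		return -ENOMEM; /* this is what we hit */
	return dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
}

/*
 * Fixed order: widen the mask first, then allocate. Now the IOMMU DMA
 * layer may pick an IOVA anywhere in the 64-bit range.
 */
static int fixed_order(struct pci_dev *pdev, dma_addr_t *dma_handle)
{
	int err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));

	if (err)
		return err;
	return dma_alloc_coherent(&pdev->dev, PAGE_SIZE, dma_handle,
				  GFP_KERNEL) ? 0 : -ENOMEM;
}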
So I guess I'll send a v13 of this series rebased on iommu/core and
with an additional mlx5 patch, and then let's hope we can get that
merged in a way that doesn't leave us with broken ConnectX VFs for
too long.

Thanks,
Niklas