On 27/09/2023 4:40 pm, Jason Gunthorpe wrote:
On Wed, Sep 27, 2023 at 05:24:20PM +0200, Niklas Schnelle wrote:
Ok, another update. On trying it out again, this problem actually also
occurs when applying this v12 on top of v6.6-rc3. Also, I guess that,
unlike my prior thinking, it probably doesn't occur with
iommu.forcedac=1 since that still allows IOVAs below 4 GiB and we might
be the only ones who don't support those. From my point of view this
sounds like an mlx5_core issue: they really should call
dma_set_mask_and_coherent() before their first call to
dma_alloc_coherent(), not after. So I guess I'll send a v13 of this
series rebased on iommu/core and with an additional mlx5 patch and then
let's hope we can get that merged in a way that doesn't leave us with
broken ConnectX VFs for too long.
Yes, OK. It definitely sounds wrong that mlx5 is doing DMA allocations
before calling dma_set_mask_and_coherent(). Please link to this thread
and we can get Leon or Saeed to ack it for Joerg.
(though wondering why s390 is the only case that ever hit this?)
Probably because most systems happen to be able to satisfy the
allocation within the default 32-bit mask - the whole bottom 4GB of IOVA
space being reserved is pretty atypical.
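
To make the point about the default 32-bit mask concrete, here is a minimal
userspace sketch, not kernel code. Everything prefixed with toy_/TOY_ is
hypothetical and exists only for illustration; the real IOVA allocator is far
more involved. It assumes a bump allocator that must stay at or below the
device's coherent DMA mask, and shows that an allocation made under the
default 32-bit mask works when low IOVA space is available but fails when
usable IOVA space starts at 4 GiB.

```c
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(), renamed to stress this is a toy. */
#define TOY_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical device: only the coherent DMA mask is modelled. */
struct toy_dev {
	uint64_t coherent_dma_mask;	/* highest address the device may be given */
};

/* Hypothetical IOVA domain with a trivial bump allocator. */
struct toy_iova_domain {
	uint64_t end;	/* highest usable IOVA, inclusive */
	uint64_t next;	/* next free IOVA; starts at the lowest usable one */
};

/*
 * Allocate 'size' bytes of IOVA space that the device can address.
 * Returns 0 and stores the address in *iova on success, -1 on failure.
 */
static int toy_alloc_iova(struct toy_iova_domain *d, const struct toy_dev *dev,
			  uint64_t size, uint64_t *iova)
{
	/* The allocation must fit below both the domain end and the mask. */
	uint64_t limit = dev->coherent_dma_mask < d->end ?
			 dev->coherent_dma_mask : d->end;

	if (size == 0 || d->next > limit || size - 1 > limit - d->next)
		return -1;

	*iova = d->next;
	d->next += size;
	return 0;
}
```

With a domain whose usable range starts low (say next = 0x1000), a device
still at TOY_DMA_BIT_MASK(32) gets its allocation. With a domain whose usable
range starts at 1ULL << 32, the same call returns -1 until the mask is raised
to TOY_DMA_BIT_MASK(64), which is the ordering problem described above.
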
TBH it makes me wonder the opposite - how this ever worked on s390
before? And I think the answer to that is "by pure chance", since upon
inspection the existing s390_pci_dma_ops implementation appears to pay
absolutely no attention to the device's DMA masks whatsoever :(
Robin.