On Tue, May 24, 2022 at 2:06 PM William McVicker <willmcvicker@xxxxxxxxxx> wrote: > > Hi All, > > I've been debugging a PCIe dma mapping issue and I believe I have tracked the > bug down to how the designware PCIe host driver is mapping the MSI msg. In > commit 07940c369a6b ("PCI: dwc: Fix MSI page leakage in suspend/resume") [1], > the PCIe driver was re-worked to drop allocating a page for the MSI msg in > favor of using an address from the driver data. Then in commit 660c486590aa > ("PCI: dwc: Set 32-bit DMA mask for MSI target address allocation") [2], > a 32-bit DMA mask was enforced for this MSI msg address in order to support > both 32-bit and 64-bit MSI address capable hardware. Both of these changes > together expose a bug on hardware that supports an MSI address greather than > 32-bits. For example, the Pixel 6 supports a 36-bit MSI address and therefore > calls: > > dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(36)); > > Before [2], this was fine because getting an address for the driver data that > was less than or equal to 36-bits was common enough to not hit this issue, but > after [2] I started hitting the below DMA buffer overflow when the driver data > address was greater than 32-bits: > > exynos-pcie-rc 14520000.pcie: DMA addr 0x000000088536d908+2 overflow (mask ffffffff, bus limit 0). > : WARNING: CPU: 3 PID: 8 at kernel/dma/direct.h:99 dma_map_page_attrs+0x254/0x278 > ... > Hardware name: Oriole DVT (DT) > Workqueue: events_unbound deferred_probe_work_func > pstate : 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : dma_map_page_attrs+0x254/0x278 > lr : dma_map_page_attrs+0x250/0x278 > sp : ffffffc0080938b0 > ... > Call trace: > : dma_map_page_attrs+0x254/0x278 > : dma_map_single_attrs+0xdc/0x10c > : dw_pcie_host_init+0x4a0/0x78c > : exynos_pcie_rc_add_port+0x7c/0x104 [pcie_exynos_gs] > : exynos_pcie_rc_probe+0x4c8/0x6ec [pcie_exynos_gs] > : platform_probe+0x80/0x200 > : really_probe+0x1cc/0x458 > : __driver_probe_device+0x204/0x260 > : driver_probe_device+0x44/0x4b0 > : __device_attach_driver+0x200/0x308 > : __device_attach+0x20c/0x330 > > > The underlying issue is that using the driver data (which can be a 64-bit > address) for the MSI msg mapping causes a DMA_MAPPING_ERROR when the dma mask > is less than 64-bits. I'm not familiar enough with the dma mapping code to > suggest a full-proof solution to solve this; however, I don't think reverting > [1] is a great solution since it addresses a valid issue and reverting [2] > doesn't actually solve the bug since the driver data address isn't restricted > by the dma mask. > > I hope that helps explain the issue. Please let me know your thoughts on how we > should address this. I think the alloc for the msi_msg just needs a GFP_DMA32 flag. Unfortunately that is done in each driver and would be kind of odd. The thing is I'm pretty sure the actual address doesn't matter. The MSI never actually writes to memory but is terminated by the MSI controller. It just can't be an address you would want to DMA to (such as driver data allocations). And it needs to account for any bus translations, which the DMA API conveniently handles. So maybe it needs to be its own alloc as before but avoiding the leak and also setting GFP_DMA32. Unless others have ideas. Rob