On Thu, Aug 17, 2023 at 10:31:31AM +0200, Marek Szyprowski wrote: > Hi Jason, > > On 09.08.2023 16:43, Jason Gunthorpe wrote: > > I missed two paths where __iommu_probe_device() can be called while > > already holding the device_lock() for the device that is to be probed. > > > > This causes a deadlock because __iommu_probe_device() will attempt to > > re-acquire the lock. > > > > Organize things so that these two paths can re-use the existing already > > held device_lock by marking the call chains based on if the lock is held > > or not. > > > > Also the order of iommu_init_device() was not correct for > > generic_single_device_group() > > I've just noticed that there is still one more issue left to fix after > applying this patchset. On Qualcomm's RB5 board I get the following > warning (captured on vanilla next-20230817): > Call trace: > iommu_probe_device_locked+0xd4/0xe4 > of_iommu_configure+0x10c/0x200 > of_dma_configure_id+0x104/0x3b8 So it open codes a call to of_dma_configure_id for some reason This is a bizzaro thing to do, it is taking the OF handle from a dt: node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0); And the sort of co-opts the platform device it is associated with: struct platform_device *pdev = of_find_device_by_node(node); gmu->dev = &pdev->dev; of_dma_configure(gmu->dev, node, true); And hackily sets up the iommu? It needs to do this because the iommu doesn't get setup until something probes on the device. But this code is just stealing the platform device, it doesn't actually probe a driver. So this is all kinds of wrong. Attach a driver if you intend to claim the device and use it :( Anyhow, you can "fix" it with this: diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c index 5deb79924897af..8dbd75c200b91c 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c @@ -1556,7 +1556,9 @@ int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node) gmu->dev = &pdev->dev; + device_lock(gmu->dev); of_dma_configure(gmu->dev, node, true); + device_unlock(gmu->dev); /* Fow now, don't do anything fancy until we get our feet under us */ gmu->idle_level = GMU_IDLE_STATE_ACTIVE; But there is much more broken here than just that. Maybe leaving the warning is a good thing so people can fix this properly. (eg remove the call to of_dma_configure and use the driver core correctly) Though looking around I see other places open coding of_dma_configure, so I suppose this will turn into a persisting issue. Some of these are not going to throw a warning because they are adjusting the same device the driver is probing for. However those are busted up since they try to attach the iommu driver to a device after the driver core has passed iommu_device_use_default_domain(). It will explode if the driver is removed. Bascially.. Yikes! drivers/bcma/main.c: of_dma_configure(&core->dev, node, false); No idea.. drivers/dma/qcom/hidma_mgmt.c: of_dma_configure(&new_pdev->dev, child, true); Registers a platformdevice then does of_dma_configure(), seems wrong. of_dma_configure() should be done when a driver is probed. drivers/gpu/drm/etnaviv/etnaviv_drv.c: of_dma_configure(&pdev->dev, first_node, true); platform_dma_configure() chooses the wrong of_node so this overrides the logic? drivers/gpu/drm/msm/adreno/a6xx_gmu.c: of_dma_configure(gmu->dev, node, true); drivers/gpu/drm/msm/adreno/a6xx_gmu.c: of_dma_configure(gmu->dev, node, true); co-opts a platform device without attaching a driver to it drivers/gpu/drm/sun4i/sun4i_backend.c: ret = of_dma_configure(drm->dev, dev->of_node, true); drivers/gpu/drm/sun4i/sun8i_mixer.c: ret = of_dma_configure(drm->dev, dev->of_node, true); No idea.. Same issue with using the wrong of_node? drivers/gpu/drm/vc4/vc4_drv.c: ret = of_dma_configure(dev, node, true); Similar to adreno drivers/gpu/host1x/bus.c: of_dma_configure(&device->dev, host1x->dev->of_node, true); drivers/gpu/host1x/context.c: err = of_dma_configure_id(&ctx->dev, node, true, &i); Creating a device and then immediately doing of_dma_configure (before registering even). drivers/media/platform/qcom/venus/firmware.c: ret = of_dma_configure(&pdev->dev, np, true); Looks like wrong of_node on the platform_device drivers/media/platform/samsung/s5p-mfc/s5p_mfc.c: of_dma_configure(child, dev->of_node, true); Says it all: /* * The memdevs are not proper OF platform devices, so in order for them * to be treated as valid DMA masters we need a bit of a hack to force * them to inherit the MFC node's DMA configuration. */ of_dma_configure(child, dev->of_node, true); drivers/net/wireless/ath/ath10k/snoc.c: ret = of_dma_configure(&pdev->dev, node, true); Looks like wrong of_node on the platform_device drivers/net/wireless/ath/ath11k/ahb.c: ret = of_dma_configure(&pdev->dev, node, true); Registering a platform device sound/soc/bcm/bcm63xx-pcm-whistler.c: of_dma_configure(pcm->card->dev, pcm->card->dev->of_node, 1); No idea Jason