Re: [PATCH v2 0/4] Fix device_lock deadlock on two probe() paths

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 17, 2023 at 10:31:31AM +0200, Marek Szyprowski wrote:
> Hi Jason,
> 
> On 09.08.2023 16:43, Jason Gunthorpe wrote:
> > I missed two paths where __iommu_probe_device() can be called while
> > already holding the device_lock() for the device that is to be probed.
> >
> > This causes a deadlock because __iommu_probe_device() will attempt to
> > re-acquire the lock.
> >
> > Organize things so that these two paths can re-use the existing already
> > held device_lock by marking the call chains based on if the lock is held
> > or not.
> >
> > Also the order of iommu_init_device() was not correct for
> > generic_single_device_group()
> 
> I've just noticed that there is still one more issue left to fix after 
> applying this patchset. On Qualcomm's RB5 board I get the following 
> warning (captured on vanilla next-20230817):

> Call trace:
>   iommu_probe_device_locked+0xd4/0xe4
>   of_iommu_configure+0x10c/0x200
>   of_dma_configure_id+0x104/0x3b8

So it open codes a call to of_dma_configure_id for some reason

This is a bizzaro thing to do, it is taking the OF handle from a dt:

	node = of_parse_phandle(pdev->dev.of_node, "qcom,gmu", 0);

And the sort of co-opts the platform device it is associated with:

	struct platform_device *pdev = of_find_device_by_node(node);
	gmu->dev = &pdev->dev;
	of_dma_configure(gmu->dev, node, true);

And hackily sets up the iommu? It needs to do this because the iommu
doesn't get setup until something probes on the device. But this code
is just stealing the platform device, it doesn't actually probe a
driver.

So this is all kinds of wrong. Attach a driver if you intend to claim
the device and use it :(

Anyhow, you can "fix" it with this:

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
index 5deb79924897af..8dbd75c200b91c 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
@@ -1556,7 +1556,9 @@ int a6xx_gmu_init(struct a6xx_gpu *a6xx_gpu, struct device_node *node)
 
        gmu->dev = &pdev->dev;
 
+       device_lock(gmu->dev);
        of_dma_configure(gmu->dev, node, true);
+       device_unlock(gmu->dev);
 
        /* Fow now, don't do anything fancy until we get our feet under us */
        gmu->idle_level = GMU_IDLE_STATE_ACTIVE;

But there is much more broken here than just that. Maybe leaving the
warning is a good thing so people can fix this properly. (eg remove
the call to of_dma_configure and use the driver core correctly)

Though looking around I see other places open coding of_dma_configure,
so I suppose this will turn into a persisting issue.

Some of these are not going to throw a warning because they are
adjusting the same device the driver is probing for. However those are
busted up since they try to attach the iommu driver to a device after
the driver core has passed iommu_device_use_default_domain(). It will
explode if the driver is removed.

Bascially.. Yikes!

drivers/bcma/main.c:    of_dma_configure(&core->dev, node, false);

  No idea..

drivers/dma/qcom/hidma_mgmt.c:          of_dma_configure(&new_pdev->dev, child, true);

  Registers a platformdevice then does of_dma_configure(), seems
  wrong. of_dma_configure() should be done when a driver is probed.

drivers/gpu/drm/etnaviv/etnaviv_drv.c:          of_dma_configure(&pdev->dev, first_node, true);

  platform_dma_configure() chooses the wrong of_node so this overrides
  the logic?

drivers/gpu/drm/msm/adreno/a6xx_gmu.c:  of_dma_configure(gmu->dev, node, true);
drivers/gpu/drm/msm/adreno/a6xx_gmu.c:  of_dma_configure(gmu->dev, node, true);

  co-opts a platform device without attaching a driver to it

drivers/gpu/drm/sun4i/sun4i_backend.c:          ret = of_dma_configure(drm->dev, dev->of_node, true);
drivers/gpu/drm/sun4i/sun8i_mixer.c:            ret = of_dma_configure(drm->dev, dev->of_node, true);

  No idea.. Same issue with using the wrong of_node?

drivers/gpu/drm/vc4/vc4_drv.c:          ret = of_dma_configure(dev, node, true);
  
  Similar to adreno

drivers/gpu/host1x/bus.c:       of_dma_configure(&device->dev, host1x->dev->of_node, true);
drivers/gpu/host1x/context.c:           err = of_dma_configure_id(&ctx->dev, node, true, &i);
  
  Creating a device and then immediately doing of_dma_configure
  (before registering even).
  
drivers/media/platform/qcom/venus/firmware.c:   ret = of_dma_configure(&pdev->dev, np, true);

  Looks like wrong of_node on the platform_device

drivers/media/platform/samsung/s5p-mfc/s5p_mfc.c:       of_dma_configure(child, dev->of_node, true);

  Says it all:

	/*
	 * The memdevs are not proper OF platform devices, so in order for them
	 * to be treated as valid DMA masters we need a bit of a hack to force
	 * them to inherit the MFC node's DMA configuration.
	 */
	of_dma_configure(child, dev->of_node, true);

drivers/net/wireless/ath/ath10k/snoc.c: ret = of_dma_configure(&pdev->dev, node, true);

  Looks like wrong of_node on the platform_device
					
drivers/net/wireless/ath/ath11k/ahb.c:  ret = of_dma_configure(&pdev->dev, node, true);

  Registering a platform device

sound/soc/bcm/bcm63xx-pcm-whistler.c:   of_dma_configure(pcm->card->dev, pcm->card->dev->of_node, 1);

  No idea

Jason



[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]
  Powered by Linux