Re: [PATCH v5 25/32] iommu/mtk: Migrate to aggregate driver

Yong Wu <yong.wu@xxxxxxxxxxxx> · Sat, 15 Jan 2022 15:39:52 +0800

On Fri, 2022-01-14 at 15:30 -0600, Stephen Boyd wrote:
> Quoting Yong Wu (2022-01-14 01:06:31)
> > On Wed, 2022-01-12 at 20:25 -0800, Stephen Boyd wrote:
> > > > 
> > > > [    2.654526] ------------[ cut here ]------------
> > > > [    2.655558] refcount_t: addition on 0; use-after-free.
> > > > 
> > > > After this patch, the aggregate_driver flow looks ok. But our
> > > > driver
> > > > still aborts like this:
> > > > 
> > > > [    2.721316] Unable to handle kernel NULL pointer dereference
> > > > at
> > > > virtual address 0000000000000000
> > > > ...
> > > > [    2.731658] pc :
> > > > mtk_smi_larb_config_port_gen2_general+0xa4/0x138
> > > > [    2.732434] lr : mtk_smi_larb_resume+0x54/0x98
> > > > ...
> > > > [    2.742457] Call trace:
> > > > [    2.742768]  mtk_smi_larb_config_port_gen2_general+0xa4/0x13
> > > > 8
> > > > [    2.743496]  pm_generic_runtime_resume+0x2c/0x48
> > > > [    2.744090]  __genpd_runtime_resume+0x30/0xa8
> > > > [    2.744648]  genpd_runtime_resume+0x94/0x2c8
> > > > [    2.745191]  __rpm_callback+0x44/0x150
> > > > [    2.745669]  rpm_callback+0x6c/0x78
> > > > [    2.746114]  rpm_resume+0x314/0x558
> > > > [    2.746559]  __pm_runtime_resume+0x3c/0x88
> > > > [    2.747080]  pm_runtime_get_suppliers+0x7c/0x110
> > > > [    2.747668]  __driver_probe_device+0x4c/0xe8
> > > > [    2.748212]  driver_probe_device+0x44/0x130
> > > > [    2.748745]  __device_attach_driver+0x98/0xd0
> > > > [    2.749300]  bus_for_each_drv+0x68/0xd0
> > > > [    2.749787]  __device_attach+0xec/0x148
> > > > [    2.750277]  device_attach+0x14/0x20
> > > > [    2.750733]  bus_rescan_devices_helper+0x50/0x90
> > > > [    2.751319]  bus_for_each_dev+0x7c/0xd8
> > > > [    2.751806]  bus_rescan_devices+0x20/0x30
> > > > [    2.752315]  __component_add+0x7c/0xa0
> > > > [    2.752795]  component_add+0x14/0x20
> > > > [    2.753253]  mtk_smi_larb_probe+0xe0/0x120
> > > > 
> > > > This is because the device runtime_resume is called before the
> > > > bind
> > > > operation(In our case this detailed function is
> > > > mtk_smi_larb_bind).
> > > > The issue doesn't happen without this patchset. I'm not sure
> > > > the
> > > > right
> > > > sequence. If we should fix in mediatek driver, the patch could
> > > > be:
> > > 
> > > Oh, the runtime PM is moved around with these patches. The
> > > aggregate
> > > device is runtime PM enabled before the probe is called,
> > 
> > In our case, the component device may probe before the aggregate
> > device. thus the component device runtime PM has already been
> > enabled
> > when aggregate device probe.
> 
> This is always the case. The component device always probes before
> the
> aggregate device because the component device registers with the
> component framework. But the component device can decide to enable
> runtime PM during driver probe or during component bind.
> 
> > 
> > > and there are
> > > supplier links made to each component, so each component is
> > > runtime
> > > resumed before the aggregate probe function is called.
> > 
> > Yes. This is the current flow.
> 
> Got it.
> 
> > 
> > > It means that all
> > > the component drivers need to have their resources ready to power
> > > on
> > > before their component_bind() callback is made.
> > 
> > Sorry, I don't understand here well. In this case, The component
> > drivers prepare the resource for power on in the component_bind
> > since
> > the resource comes from the aggregate driver. Thus, we expect the
> > component_bind run before the runtime resume callback.
> 
> What resource comes from the aggregate device that the component
> device manages in runtime PM?

Here the aggregate device is the iommu device. It knows who enabled the
iommu from the dtsi, then transfers this information to the
component(smi_larb) devices. smi_larb will configure the registers to
enable iommu for these masters in the runtime resume.

> > 
> > Another solution is moving the component's pm_runtime_enable into
> > the
> > component_bind(It's mtk_smi_larb_bind here), then the runtime
> > callback
> > is called after component_bind in which the resource for power on
> > is
> > ready.
> 
> This sounds more correct to me. I'm not an expert on runtime PM
> though
> as I always have to read the code to remember how it works. if the
> device isn't ready for runtime PM until the component bind function
> is
> called then runtime PM shouldn't be enabled on the component device.

Anyway, We should update the SMI driver for this case. I prepare a
patch into this patchset or I send it independently? which way is
better?

The patch could be:

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index b883dcc0bbfa..88854c2270a1 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -175,6 +175,8 @@ mtk_smi_larb_bind(struct device *dev, struct device
*master, void *data)
                        larb->larbid = i;
                        larb->mmu = &larb_mmu[i].mmu;
                        larb->bank = larb_mmu[i].bank;
+
+                       pm_runtime_enable(dev);
                        return 0;
                }
        }
@@ -450,7 +452,6 @@ static int mtk_smi_larb_probe(struct
platform_device *pdev)
        if (ret < 0)
                return ret;

-       pm_runtime_enable(dev);
        platform_set_drvdata(pdev, larb);
        ret = component_add(dev, &mtk_smi_larb_component_ops);
        if (ret)

Thanks.