Hi Bjorn, On? 2017/8/25 4:21, Bjorn Helgaas wrote: > [+cc Tejun, Dmitry, Michael, Stephen, linux-clk for devm/clk questions] > > On Wed, Aug 23, 2017 at 03:02:38PM +0800, Shawn Lin wrote: >> With CONFIG_DEBUG_SHIRQ enabled, the irq tear down routine >> would still access the irq handler registed as a shard irq. >> Per the comment within the function of __free_irq, it says >> "It's a shared IRQ -- the driver ought to be prepared for >> an IRQ event to happen even now it's being freed". However >> when failing to probe the driver, it may disable the clock >> for accessing the register and the following check for shared >> irq state would call the irq handler which accesses the register >> w/o the clk enabled. That will hang the system forever. >> >> With adding some dump_stack we could see how that happened. >> >> calling rockchip_pcie_driver_init+0x0/0x28 @ 1 >> rockchip-pcie f8000000.pcie: no vpcie3v3 regulator found >> rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found >> rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found >> rockchip-pcie f8000000.pcie: PCIe link training gen1 timeout! >> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc3-next-20170807-ARCH+ #189 >> Hardware name: Firefly-RK3399 Board (DT) >> Call trace: >> [<ffff000008089bf0>] dump_backtrace+0x0/0x250 >> [<ffff000008089eb0>] show_stack+0x20/0x28 >> [<ffff000008c3313c>] dump_stack+0x90/0xb0 >> [<ffff000008632ad4>] rockchip_pcie_read.isra.11+0x54/0x58 >> [<ffff0000086334fc>] rockchip_pcie_client_irq_handler+0x30/0x1a0 >> [<ffff00000813ce98>] __free_irq+0x1c8/0x2dc >> [<ffff00000813d044>] free_irq+0x44/0x74 >> [<ffff0000081415fc>] devm_irq_release+0x24/0x2c >> [<ffff00000877429c>] release_nodes+0x1d8/0x30c >> [<ffff000008774838>] devres_release_all+0x3c/0x5c >> [<ffff00000876f19c>] driver_probe_device+0x244/0x494 >> [<ffff00000876f50c>] __driver_attach+0x120/0x124 >> [<ffff00000876cb80>] bus_for_each_dev+0x6c/0xac >> [<ffff00000876e984>] driver_attach+0x2c/0x34 >> [<ffff00000876e3a4>] bus_add_driver+0x244/0x2b0 >> [<ffff000008770264>] driver_register+0x70/0x110 >> [<ffff0000087718b4>] platform_driver_register+0x60/0x6c >> [<ffff0000091eb108>] rockchip_pcie_driver_init+0x20/0x28 >> [<ffff000008083a2c>] do_one_initcall+0xc8/0x130 >> [<ffff0000091a0ea8>] kernel_init_freeable+0x1a0/0x238 >> [<ffff000008c461cc>] kernel_init+0x18/0x108 >> [<ffff0000080836c0>] ret_from_fork+0x10/0x50 >> >> In order to fix this, we remove all the clock-disabling from >> the error handle path and driver's remove function. And replying >> on the devm_add_action_or_reset to fire the clock-disabling at >> the appropriate time. Also split out rockchip_pcie_setup_irq >> and move requesting irq after enabling clks to avoid this kind > > Thanks for splitting out the refactoring stuff. That really makes > this patch much simpler. > > IIUC, this really has nothing to do with CONFIG_DEBUG_SHIRQ. It may > be true that you've only *seen* the problem with CONFIG_DEBUG_SHIRQ > enabled, but all that config option does is take a situation that > could happen at any time (another device sharing the IRQ generating an > interrupt), and force it to happen. So it's just a way to expose an > existing driver problem. Right. > > The real problem is apparently that rockchip_pcie_subsys_irq_handler() > relies on some clock being enabled, but we're leaving it registered at > a time when the clock has already been disabled. > > You fixed that by using devm_add_action_or_reset() to tell devm to > disable the clocks *after* releasing the IRQ. > > That sort of makes sense, but devm_add_action_or_reset() is a little > obscure, and this feels like a hole in the devm framework. Seems like > it would be nice if there were some sort of devm wrapper for > clk_prepare_enable() so this would happen automatically. Yes, I would appreciate it if we have devm wrapper for clk_prepare_enable so that we don't resort to devm_add_action_or_reset. > > This pattern: > > clk = devm_clk_get(...); > if (IS_ERR(clk)) { > dev_warn("no clock for ..."); > return PTR_ERR(clk); > } > > ret = clk_prepare_enable(clk); > if (ret) { > dev_warn("failed to enable ..."); > return err; > } > > is quite common ("git grep -A10 devm_clk_get | grep clk_prepare_enable > | wc -l" finds over 400 occurrences). Should there be something to > simplify this a little? > > I also wonder about other PCI host drivers that use both > clk_prepare_enable() and devm_request_irq(). Maybe Rockchip is > "special" in that it seems the driver must turn on a clock before it > can even talk to the host controller, whereas maybe other drivers can IIRC, some of the other ARM SoCs have the same problem. > always talk to the host controller, but need to turn on clocks > downstream from the controller. I didn't audit them, but I'm > concerned that some of them might have this same problem. So that is my concern as well. But I have to say we may face a worse situation as I see it by randomly search the DT, arch/arm64/boot/dts/renesas/r8a7795.dtsi includes a power-domains for pcie-rcar and pcie-rcar registers shared irq either. So the power- domain would be powered off once failing to probe or calling ->remove() immediately even *before* doing devm cleanup. In another word, I don't have too much confident that renesas's CPU could visit PCIe IP w/o power domain in 'on' state? I posted a relevant patch for fixing this for driver core but havn't got any input from there (https://lkml.org/lkml/2017/8/15/146). That don't affect pcie-rockchip *now* as we don't have power-domain for that but it's highly relevant to the problem we are disscussing. Finally, as a life-saving straw if we don't reach an agreement for anyone of adding devm clk_prepare_enable wrraper and adjusting the sequence of powering off power-domain, we have to get rid of using devm_request_irq and use request_irq/free_irq instead for all the potential problematic drivers... > >> Signed-off-by: Shawn Lin <shawn.lin at rock-chips.com> >> >> --- >> >> Changes in v5: >> - rebase on former reconstrtion patches suggested by Bjorn >> >> Changes in v4: >> - split out rockchip_pcie_enable_clocks and reuse >> rockchip_pcie_enable_clocks and rockchip_pcie_disable_clocks >> for elsewhere suggested by Jeffy >> >> Changes in v3: >> - check the return value of devm_add_action_or_reset and spilt out >> rockchip_pcie_setup_irq in order to move requesting irq after >> enabling clks. >> >> Changes in v2: >> - use devm_add_action_or_reset to fix this ordering suggested by >> Heiko and Jeffy. Thanks! >> >> drivers/pci/host/pcie-rockchip.c | 22 +++++++++++++--------- >> 1 file changed, 13 insertions(+), 9 deletions(-) >> >> diff --git a/drivers/pci/host/pcie-rockchip.c b/drivers/pci/host/pcie-rockchip.c >> index 971d22b..891b60a 100644 >> --- a/drivers/pci/host/pcie-rockchip.c >> +++ b/drivers/pci/host/pcie-rockchip.c >> @@ -1099,10 +1099,6 @@ static int rockchip_pcie_parse_dt(struct rockchip_pcie *rockchip) >> return PTR_ERR(rockchip->clk_pcie_pm); >> } >> >> - err = rockchip_pcie_setup_irq(rockchip); >> - if (err) >> - return err; >> - >> rockchip->vpcie12v = devm_regulator_get_optional(dev, "vpcie12v"); >> if (IS_ERR(rockchip->vpcie12v)) { >> if (PTR_ERR(rockchip->vpcie12v) == -EPROBE_DEFER) >> @@ -1525,10 +1521,22 @@ static int rockchip_pcie_probe(struct platform_device *pdev) >> if (err) >> return err; >> >> + err = devm_add_action_or_reset(dev, >> + rockchip_pcie_disable_clocks, >> + rockchip); >> + if (err) { >> + dev_err(dev, "unable to add action or reset\n"); >> + return err; >> + } >> + >> + err = rockchip_pcie_setup_irq(rockchip); >> + if (err) >> + return err; >> + >> err = rockchip_pcie_set_vpcie(rockchip); >> if (err) { >> dev_err(dev, "failed to set vpcie regulator\n"); >> - goto err_set_vpcie; >> + return err; >> } >> >> err = rockchip_pcie_init_port(rockchip); >> @@ -1625,8 +1633,6 @@ static int rockchip_pcie_probe(struct platform_device *pdev) >> regulator_disable(rockchip->vpcie1v8); >> if (!IS_ERR(rockchip->vpcie0v9)) >> regulator_disable(rockchip->vpcie0v9); >> -err_set_vpcie: >> - rockchip_pcie_disable_clocks(rockchip); >> return err; >> } >> >> @@ -1648,8 +1654,6 @@ static int rockchip_pcie_remove(struct platform_device *pdev) >> phy_exit(rockchip->phys[i]); >> } >> >> - rockchip_pcie_disable_clocks(rockchip); >> - >> if (!IS_ERR(rockchip->vpcie12v)) >> regulator_disable(rockchip->vpcie12v); >> if (!IS_ERR(rockchip->vpcie3v3)) >> -- >> 1.9.1 >> >> > > >