[PATCH v5 04/10] PCI: rockchip: fix system hang up if activating CONFIG_DEBUG_SHIRQ

shawn.lin@xxxxxxxxxxxxxx (Shawn Lin) · Fri, 25 Aug 2017 09:38:56 +0800

Hi Bjorn,

On? 2017/8/25 4:21, Bjorn Helgaas wrote:
> [+cc Tejun, Dmitry, Michael, Stephen, linux-clk for devm/clk questions]
> 
> On Wed, Aug 23, 2017 at 03:02:38PM +0800, Shawn Lin wrote:
>> With CONFIG_DEBUG_SHIRQ enabled, the irq tear down routine
>> would still access the irq handler registed as a shard irq.
>> Per the comment within the function of __free_irq, it says
>> "It's a shared IRQ -- the driver ought to be prepared for
>> an IRQ event to happen even now it's being freed". However
>> when failing to probe the driver, it may disable the clock
>> for accessing the register and the following check for shared
>> irq state would call the irq handler which accesses the register
>> w/o the clk enabled. That will hang the system forever.
>>
>> With adding some dump_stack we could see how that happened.
>>
>> calling  rockchip_pcie_driver_init+0x0/0x28 @ 1
>> rockchip-pcie f8000000.pcie: no vpcie3v3 regulator found
>> rockchip-pcie f8000000.pcie: no vpcie1v8 regulator found
>> rockchip-pcie f8000000.pcie: no vpcie0v9 regulator found
>> rockchip-pcie f8000000.pcie: PCIe link training gen1 timeout!
>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc3-next-20170807-ARCH+ #189
>> Hardware name: Firefly-RK3399 Board (DT)
>> Call trace:
>> [<ffff000008089bf0>] dump_backtrace+0x0/0x250
>> [<ffff000008089eb0>] show_stack+0x20/0x28
>> [<ffff000008c3313c>] dump_stack+0x90/0xb0
>> [<ffff000008632ad4>] rockchip_pcie_read.isra.11+0x54/0x58
>> [<ffff0000086334fc>] rockchip_pcie_client_irq_handler+0x30/0x1a0
>> [<ffff00000813ce98>] __free_irq+0x1c8/0x2dc
>> [<ffff00000813d044>] free_irq+0x44/0x74
>> [<ffff0000081415fc>] devm_irq_release+0x24/0x2c
>> [<ffff00000877429c>] release_nodes+0x1d8/0x30c
>> [<ffff000008774838>] devres_release_all+0x3c/0x5c
>> [<ffff00000876f19c>] driver_probe_device+0x244/0x494
>> [<ffff00000876f50c>] __driver_attach+0x120/0x124
>> [<ffff00000876cb80>] bus_for_each_dev+0x6c/0xac
>> [<ffff00000876e984>] driver_attach+0x2c/0x34
>> [<ffff00000876e3a4>] bus_add_driver+0x244/0x2b0
>> [<ffff000008770264>] driver_register+0x70/0x110
>> [<ffff0000087718b4>] platform_driver_register+0x60/0x6c
>> [<ffff0000091eb108>] rockchip_pcie_driver_init+0x20/0x28
>> [<ffff000008083a2c>] do_one_initcall+0xc8/0x130
>> [<ffff0000091a0ea8>] kernel_init_freeable+0x1a0/0x238
>> [<ffff000008c461cc>] kernel_init+0x18/0x108
>> [<ffff0000080836c0>] ret_from_fork+0x10/0x50
>>
>> In order to fix this, we remove all the clock-disabling from
>> the error handle path and driver's remove function. And replying
>> on the devm_add_action_or_reset to fire the clock-disabling at
>> the appropriate time. Also split out rockchip_pcie_setup_irq
>> and move requesting irq after enabling clks to avoid this kind
> 
> Thanks for splitting out the refactoring stuff.  That really makes
> this patch much simpler.
> 
> IIUC, this really has nothing to do with CONFIG_DEBUG_SHIRQ.  It may
> be true that you've only *seen* the problem with CONFIG_DEBUG_SHIRQ
> enabled, but all that config option does is take a situation that
> could happen at any time (another device sharing the IRQ generating an
> interrupt), and force it to happen.  So it's just a way to expose an
> existing driver problem.

Right.

> 
> The real problem is apparently that rockchip_pcie_subsys_irq_handler()
> relies on some clock being enabled, but we're leaving it registered at
> a time when the clock has already been disabled.
> 
> You fixed that by using devm_add_action_or_reset() to tell devm to
> disable the clocks *after* releasing the IRQ.
> 
> That sort of makes sense, but devm_add_action_or_reset() is a little
> obscure, and this feels like a hole in the devm framework.  Seems like
> it would be nice if there were some sort of devm wrapper for
> clk_prepare_enable() so this would happen automatically.

Yes, I would appreciate it if we have devm wrapper for
clk_prepare_enable so that we don't resort to devm_add_action_or_reset.

> 
> This pattern:
> 
>    clk = devm_clk_get(...);
>    if (IS_ERR(clk)) {
>      dev_warn("no clock for ...");
>      return PTR_ERR(clk);
>    }
> 
>    ret = clk_prepare_enable(clk);
>    if (ret) {
>      dev_warn("failed to enable ...");
>      return err;
>    }
> 
> is quite common ("git grep -A10 devm_clk_get | grep clk_prepare_enable
>   | wc -l" finds over 400 occurrences).  Should there be something to
> simplify this a little?
> 
> I also wonder about other PCI host drivers that use both
> clk_prepare_enable() and devm_request_irq().  Maybe Rockchip is
> "special" in that it seems the driver must turn on a clock before it
> can even talk to the host controller, whereas maybe other drivers can

IIRC, some of the other ARM SoCs have the same problem.

> always talk to the host controller, but need to turn on clocks
> downstream from the controller.  I didn't audit them, but I'm
> concerned that some of them might have this same problem.

So that is my concern as well. But I have to say we may face a worse
situation as I see it by randomly search the DT,

arch/arm64/boot/dts/renesas/r8a7795.dtsi  includes a power-domains
for pcie-rcar and pcie-rcar registers shared irq either. So the power-
domain would be powered off once failing to probe or calling ->remove()
immediately even *before* doing devm cleanup. In another word, I don't
have too much confident that renesas's CPU could visit PCIe IP w/o power
domain in 'on' state?

I posted a relevant patch for fixing this for driver core but havn't got
any input from there (https://lkml.org/lkml/2017/8/15/146). That don't
affect pcie-rockchip *now* as we don't have power-domain for that but
it's highly relevant to the problem we are disscussing.

Finally, as a life-saving straw if we don't reach an agreement for
anyone of adding devm clk_prepare_enable wrraper and adjusting the
sequence of powering off power-domain, we have to get rid of using
devm_request_irq and use request_irq/free_irq instead for all
the potential problematic drivers...

> 
>> Signed-off-by: Shawn Lin <shawn.lin at rock-chips.com>
>>
>> ---
>>
>> Changes in v5:
>> - rebase on former reconstrtion patches suggested by Bjorn
>>
>> Changes in v4:
>> - split out rockchip_pcie_enable_clocks and reuse
>>    rockchip_pcie_enable_clocks and rockchip_pcie_disable_clocks
>>    for elsewhere suggested by Jeffy
>>
>> Changes in v3:
>> - check the return value of devm_add_action_or_reset and spilt out
>>    rockchip_pcie_setup_irq in order to move requesting irq after
>>    enabling clks.
>>
>> Changes in v2:
>> - use devm_add_action_or_reset to fix this ordering suggested by
>>    Heiko and Jeffy. Thanks!
>>
>>   drivers/pci/host/pcie-rockchip.c | 22 +++++++++++++---------
>>   1 file changed, 13 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/pci/host/pcie-rockchip.c b/drivers/pci/host/pcie-rockchip.c
>> index 971d22b..891b60a 100644
>> --- a/drivers/pci/host/pcie-rockchip.c
>> +++ b/drivers/pci/host/pcie-rockchip.c
>> @@ -1099,10 +1099,6 @@ static int rockchip_pcie_parse_dt(struct rockchip_pcie *rockchip)
>>   		return PTR_ERR(rockchip->clk_pcie_pm);
>>   	}
>>   
>> -	err = rockchip_pcie_setup_irq(rockchip);
>> -	if (err)
>> -		return err;
>> -
>>   	rockchip->vpcie12v = devm_regulator_get_optional(dev, "vpcie12v");
>>   	if (IS_ERR(rockchip->vpcie12v)) {
>>   		if (PTR_ERR(rockchip->vpcie12v) == -EPROBE_DEFER)
>> @@ -1525,10 +1521,22 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
>>   	if (err)
>>   		return err;
>>   
>> +	err = devm_add_action_or_reset(dev,
>> +				       rockchip_pcie_disable_clocks,
>> +				       rockchip);
>> +	if (err) {
>> +		dev_err(dev, "unable to add action or reset\n");
>> +		return err;
>> +	}
>> +
>> +	err = rockchip_pcie_setup_irq(rockchip);
>> +	if (err)
>> +		return err;
>> +
>>   	err = rockchip_pcie_set_vpcie(rockchip);
>>   	if (err) {
>>   		dev_err(dev, "failed to set vpcie regulator\n");
>> -		goto err_set_vpcie;
>> +		return err;
>>   	}
>>   
>>   	err = rockchip_pcie_init_port(rockchip);
>> @@ -1625,8 +1633,6 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
>>   		regulator_disable(rockchip->vpcie1v8);
>>   	if (!IS_ERR(rockchip->vpcie0v9))
>>   		regulator_disable(rockchip->vpcie0v9);
>> -err_set_vpcie:
>> -	rockchip_pcie_disable_clocks(rockchip);
>>   	return err;
>>   }
>>   
>> @@ -1648,8 +1654,6 @@ static int rockchip_pcie_remove(struct platform_device *pdev)
>>   		phy_exit(rockchip->phys[i]);
>>   	}
>>   
>> -	rockchip_pcie_disable_clocks(rockchip);
>> -
>>   	if (!IS_ERR(rockchip->vpcie12v))
>>   		regulator_disable(rockchip->vpcie12v);
>>   	if (!IS_ERR(rockchip->vpcie3v3))
>> -- 
>> 1.9.1
>>
>>
> 
> 
>