Re: [PATCH] mmc: dw_mmc-exynos: fix potential external abort in resume_noirq()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12 June 2018 at 12:25, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
> Hi Ulf,
>
> On 2018-06-12 12:07, Ulf Hansson wrote:
>> On 12 June 2018 at 11:52, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>>> On 2018-06-12 11:20, Ulf Hansson wrote:
>>>> On 12 June 2018 at 10:28, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>>>>> On 2018-06-11 14:24, Ulf Hansson wrote:
>>>>>> On 11 June 2018 at 11:50, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>>>>>>> On 2018-06-11 11:35, Ulf Hansson wrote:
>>>>>>>> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>>>>>>>>> dw_mci_exynos_resume_noirq() performs DWMMC register access without
>>>>>>>>> ensuring that respective clocks are enabled. This might cause external
>>>>>>>>> abort on some systems (observed on Exynos5433 based boards). Fix this
>>>>>>>>> by adding needed prepare_enable/disable_unprepare calls.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
>>>>>>>>> ---
>>>>>>>>>      drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++
>>>>>>>>>      1 file changed, 6 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>>>> index 3164681108ae..6125b68726b0 100644
>>>>>>>>> --- a/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>>>> +++ b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>>>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>>>>             struct dw_mci_exynos_priv_data *priv = host->priv;
>>>>>>>>>             u32 clksel;
>>>>>>>>>
>>>>>>>>> +       clk_prepare_enable(host->biu_clk);
>>>>>>>>> +       clk_prepare_enable(host->ciu_clk);
>>>>>>>>> +
>>>>>>>>>             if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||
>>>>>>>>>                     priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)
>>>>>>>>>                     clksel = mci_readl(host, CLKSEL64);
>>>>>>>>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>>>>                             mci_writel(host, CLKSEL, clksel);
>>>>>>>>>             }
>>>>>>>>>
>>>>>>>>> +       clk_disable_unprepare(host->biu_clk);
>>>>>>>>> +       clk_disable_unprepare(host->ciu_clk);
>>>>>>>>> +
>>>>>>>>>             return 0;
>>>>>>>>>      }
>>>>>>>>>      #else
>>>>>>>> I looked a little closer and I am wondering if it wouldn't be possible
>>>>>>>> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of
>>>>>>>> SET_SYSTEM_SLEEP_PM_OPS()?
>>>>>>>>
>>>>>>>> Somelike this:
>>>>>>>> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
>>>>>>>> dw_mci_exynos_resume_noirq)
>>>>>>>>
>>>>>>>> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().
>>>>>>>>
>>>>>>>> I think it would simplify the code a bit, as you can rely on the
>>>>>>>> runtime PM callbacks to deal with clk_prepare_enable() and
>>>>>>>> clk_disable_unprepare(), unless I am mistaken.
>>>>>>> This will not fix the problem, because mci_writel() calls in
>>>>>>> dw_mci_exynos_resume_noirq are done unconditionally, regardless of the
>>>>>>> controller's runtime pm state. Since commit 1d9174fbc55e after calling
>>>>>>> pm_runtime_force_resume() there is no guarantee that device is in
>>>>>>> runtime active state if it was runtime suspended state.
>>>>>> Yes, because the runtime PM usage count is greater than 1.
>>>>>> (pm_runtime_get_noresume() is called during probe).
>>>>>>
>>>>>> If you want to make this explicit (not relying on ->probe()), one can
>>>>>> add a ->suspend_noirq() callback and call pm_runtime_get_noresume() in
>>>>>> it.
>>>>> Sorry, but I don't get how this would work. Exactly the same pattern as
>>>>> you have proposed was already used in s3c-64xx SPI driver and it didn't
>>>>> work properly (tested on the same SoC as this DW-MMC change). I had to
>>>>> move register access to runtime resume callback to fix external abort
>>>>> issue:
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e935dba111621bd6a0c5d48e6511a4d9885103b4
>>>> Yep, that is a correct solution.
>>>>
>>>>> Here in DW-MMC driver such approach (moving all the code to runtime
>>>>> resume callback) is not possible because of the potential interrupt storm
>>>>> caused by the hw bug (that's the reason of using noirq resume callback).
>>>> I understand. What you need is to run the runtime resume/suspend
>>>> callbacks in the resume/suspend noirq phase. Moreover, you need to
>>>> make sure that the runtime resume callback, really becomes invoked
>>>> during the resume noirq phase, because of the HW bug.
>>>>
>>>> I think the below should work. Can you give it a try?
>>>>
>>>> It relies on the call pm_runtime_get_noresume(), done during
>>>> ->probe(). Note that, the driver always keeps the RPM usage count
>>>> increased, thus preventing runtime suspend during normal execution.
>>>>
>>>> Anyway, if this doesn't work, your suggested approach works fine as well.
>>> Okay, finally I got it. I wasn't aware that dw_mmc-exynos keeps device
>>> runtime active all the time between the driver probe() and remove().
>>> Right, this will fix this specific case, but it isn't a generic solution,
>>> so I will also add a comment on that, so one would not need to debug it
>>> again if he decides to change runtime pm usage scheme in dw_mmc-exynos
>>> in the future.
>> Seems reasonable!
>>
>> If you want the more generic solution, I would add a exynos specific
>> suspend_noirq() callback, let it call pm_runtime_get_noresume() and
>> them pm_runtime_force_suspend().
>>
>> In the corresponding resume_noirq() callback, extend my suggested
>> changes, with a call to pm_runtime_put_noidle() after all actions has
>> been done in it.
>
> Okay, this finally looks like a proper and future-proof solution. Just
> one more question: why pm_runtime_put_noidle()? Hypothetically, when the
> dw_mmc-exynos driver gets proper runtime PM management and system suspend
> will happen when the device is in runtime suspended state, this will leave
> it in runtime active state with refcount = 0 after the system resume cycle.
> Imho simple pm_runtime_put() will be better in this case.

It doesn't really matter.

Runtime PM is disabled for the device (done by driver core) and the PM
workqueue is suspended, which means pm_runtime_put() will not
immediately runtime suspend the device.

The important part is to restore the usage count, such that the driver
core can potentially runtime suspend the device when device_complete()
is called for it.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Linux Media]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux