Re: [PATCH] mmc: dw_mmc-exynos: fix potential external abort in resume_noirq()

Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> · Tue, 12 Jun 2018 12:25:12 +0200

Hi Ulf,

On 2018-06-12 12:07, Ulf Hansson wrote:
> On 12 June 2018 at 11:52, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>> On 2018-06-12 11:20, Ulf Hansson wrote:
>>> On 12 June 2018 at 10:28, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>>>> On 2018-06-11 14:24, Ulf Hansson wrote:
>>>>> On 11 June 2018 at 11:50, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>>>>>> On 2018-06-11 11:35, Ulf Hansson wrote:
>>>>>>> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>>>>>>>> dw_mci_exynos_resume_noirq() performs DWMMC register access without
>>>>>>>> ensuring that respective clocks are enabled. This might cause external
>>>>>>>> abort on some systems (observed on Exynos5433 based boards). Fix this
>>>>>>>> by adding needed prepare_enable/disable_unprepare calls.
>>>>>>>>
>>>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
>>>>>>>> ---
>>>>>>>>      drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++
>>>>>>>>      1 file changed, 6 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>>> index 3164681108ae..6125b68726b0 100644
>>>>>>>> --- a/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>>> +++ b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>>>             struct dw_mci_exynos_priv_data *priv = host->priv;
>>>>>>>>             u32 clksel;
>>>>>>>>
>>>>>>>> +       clk_prepare_enable(host->biu_clk);
>>>>>>>> +       clk_prepare_enable(host->ciu_clk);
>>>>>>>> +
>>>>>>>>             if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||
>>>>>>>>                     priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)
>>>>>>>>                     clksel = mci_readl(host, CLKSEL64);
>>>>>>>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>>>                             mci_writel(host, CLKSEL, clksel);
>>>>>>>>             }
>>>>>>>>
>>>>>>>> +       clk_disable_unprepare(host->biu_clk);
>>>>>>>> +       clk_disable_unprepare(host->ciu_clk);
>>>>>>>> +
>>>>>>>>             return 0;
>>>>>>>>      }
>>>>>>>>      #else
>>>>>>> I looked a little closer and I am wondering if it wouldn't be possible
>>>>>>> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of
>>>>>>> SET_SYSTEM_SLEEP_PM_OPS()?
>>>>>>>
>>>>>>> Somelike this:
>>>>>>> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
>>>>>>> dw_mci_exynos_resume_noirq)
>>>>>>>
>>>>>>> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().
>>>>>>>
>>>>>>> I think it would simplify the code a bit, as you can rely on the
>>>>>>> runtime PM callbacks to deal with clk_prepare_enable() and
>>>>>>> clk_disable_unprepare(), unless I am mistaken.
>>>>>> This will not fix the problem, because mci_writel() calls in
>>>>>> dw_mci_exynos_resume_noirq are done unconditionally, regardless of the
>>>>>> controller's runtime pm state. Since commit 1d9174fbc55e after calling
>>>>>> pm_runtime_force_resume() there is no guarantee that device is in
>>>>>> runtime active state if it was runtime suspended state.
>>>>> Yes, because the runtime PM usage count is greater than 1.
>>>>> (pm_runtime_get_noresume() is called during probe).
>>>>>
>>>>> If you want to make this explicit (not relying on ->probe()), one can
>>>>> add a ->suspend_noirq() callback and call pm_runtime_get_noresume() in
>>>>> it.
>>>> Sorry, but I don't get how this would work. Exactly the same pattern as
>>>> you have proposed was already used in s3c-64xx SPI driver and it didn't
>>>> work properly (tested on the same SoC as this DW-MMC change). I had to
>>>> move register access to runtime resume callback to fix external abort
>>>> issue:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e935dba111621bd6a0c5d48e6511a4d9885103b4
>>> Yep, that is a correct solution.
>>>
>>>> Here in DW-MMC driver such approach (moving all the code to runtime
>>>> resume callback) is not possible because of the potential interrupt storm
>>>> caused by the hw bug (that's the reason of using noirq resume callback).
>>> I understand. What you need is to run the runtime resume/suspend
>>> callbacks in the resume/suspend noirq phase. Moreover, you need to
>>> make sure that the runtime resume callback, really becomes invoked
>>> during the resume noirq phase, because of the HW bug.
>>>
>>> I think the below should work. Can you give it a try?
>>>
>>> It relies on the call pm_runtime_get_noresume(), done during
>>> ->probe(). Note that, the driver always keeps the RPM usage count
>>> increased, thus preventing runtime suspend during normal execution.
>>>
>>> Anyway, if this doesn't work, your suggested approach works fine as well.
>> Okay, finally I got it. I wasn't aware that dw_mmc-exynos keeps device
>> runtime active all the time between the driver probe() and remove().
>> Right, this will fix this specific case, but it isn't a generic solution,
>> so I will also add a comment on that, so one would not need to debug it
>> again if he decides to change runtime pm usage scheme in dw_mmc-exynos
>> in the future.
> Seems reasonable!
>
> If you want the more generic solution, I would add a exynos specific
> suspend_noirq() callback, let it call pm_runtime_get_noresume() and
> them pm_runtime_force_suspend().
>
> In the corresponding resume_noirq() callback, extend my suggested
> changes, with a call to pm_runtime_put_noidle() after all actions has
> been done in it.

Okay, this finally looks like a proper and future-proof solution. Just
one more question: why pm_runtime_put_noidle()? Hypothetically, when the
dw_mmc-exynos driver gets proper runtime PM management and system suspend
will happen when the device is in runtime suspended state, this will leave
it in runtime active state with refcount = 0 after the system resume cycle.
Imho simple pm_runtime_put() will be better in this case.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html