Re: [PATCH] mmc: dw_mmc-exynos: fix potential external abort in resume_noirq()

Ulf Hansson <ulf.hansson@xxxxxxxxxx> · Tue, 12 Jun 2018 12:07:03 +0200

On 12 June 2018 at 11:52, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
> Hi Ulf,
>
> On 2018-06-12 11:20, Ulf Hansson wrote:
>> On 12 June 2018 at 10:28, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>>> On 2018-06-11 14:24, Ulf Hansson wrote:
>>>> On 11 June 2018 at 11:50, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>>>>> On 2018-06-11 11:35, Ulf Hansson wrote:
>>>>>> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>>>>>>> dw_mci_exynos_resume_noirq() performs DWMMC register access without
>>>>>>> ensuring that respective clocks are enabled. This might cause external
>>>>>>> abort on some systems (observed on Exynos5433 based boards). Fix this
>>>>>>> by adding needed prepare_enable/disable_unprepare calls.
>>>>>>>
>>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
>>>>>>> ---
>>>>>>>     drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++
>>>>>>>     1 file changed, 6 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>> index 3164681108ae..6125b68726b0 100644
>>>>>>> --- a/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>> +++ b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>>            struct dw_mci_exynos_priv_data *priv = host->priv;
>>>>>>>            u32 clksel;
>>>>>>>
>>>>>>> +       clk_prepare_enable(host->biu_clk);
>>>>>>> +       clk_prepare_enable(host->ciu_clk);
>>>>>>> +
>>>>>>>            if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||
>>>>>>>                    priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)
>>>>>>>                    clksel = mci_readl(host, CLKSEL64);
>>>>>>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>>                            mci_writel(host, CLKSEL, clksel);
>>>>>>>            }
>>>>>>>
>>>>>>> +       clk_disable_unprepare(host->biu_clk);
>>>>>>> +       clk_disable_unprepare(host->ciu_clk);
>>>>>>> +
>>>>>>>            return 0;
>>>>>>>     }
>>>>>>>     #else
>>>>>> I looked a little closer and I am wondering if it wouldn't be possible
>>>>>> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of
>>>>>> SET_SYSTEM_SLEEP_PM_OPS()?
>>>>>>
>>>>>> Somelike this:
>>>>>> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
>>>>>> dw_mci_exynos_resume_noirq)
>>>>>>
>>>>>> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().
>>>>>>
>>>>>> I think it would simplify the code a bit, as you can rely on the
>>>>>> runtime PM callbacks to deal with clk_prepare_enable() and
>>>>>> clk_disable_unprepare(), unless I am mistaken.
>>>>> This will not fix the problem, because mci_writel() calls in
>>>>> dw_mci_exynos_resume_noirq are done unconditionally, regardless of the
>>>>> controller's runtime pm state. Since commit 1d9174fbc55e after calling
>>>>> pm_runtime_force_resume() there is no guarantee that device is in
>>>>> runtime active state if it was runtime suspended state.
>>>> Yes, because the runtime PM usage count is greater than 1.
>>>> (pm_runtime_get_noresume() is called during probe).
>>>>
>>>> If you want to make this explicit (not relying on ->probe()), one can
>>>> add a ->suspend_noirq() callback and call pm_runtime_get_noresume() in
>>>> it.
>>> Sorry, but I don't get how this would work. Exactly the same pattern as
>>> you have proposed was already used in s3c-64xx SPI driver and it didn't
>>> work properly (tested on the same SoC as this DW-MMC change). I had to
>>> move register access to runtime resume callback to fix external abort
>>> issue:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e935dba111621bd6a0c5d48e6511a4d9885103b4
>> Yep, that is a correct solution.
>>
>>> Here in DW-MMC driver such approach (moving all the code to runtime
>>> resume callback) is not possible because of the potential interrupt storm
>>> caused by the hw bug (that's the reason of using noirq resume callback).
>> I understand. What you need is to run the runtime resume/suspend
>> callbacks in the resume/suspend noirq phase. Moreover, you need to
>> make sure that the runtime resume callback, really becomes invoked
>> during the resume noirq phase, because of the HW bug.
>>
>> I think the below should work. Can you give it a try?
>>
>> It relies on the call pm_runtime_get_noresume(), done during
>> ->probe(). Note that, the driver always keeps the RPM usage count
>> increased, thus preventing runtime suspend during normal execution.
>>
>> Anyway, if this doesn't work, your suggested approach works fine as well.
>
> Okay, finally I got it. I wasn't aware that dw_mmc-exynos keeps device
> runtime active all the time between the driver probe() and remove().
> Right, this will fix this specific case, but it isn't a generic solution,
> so I will also add a comment on that, so one would not need to debug it
> again if he decides to change runtime pm usage scheme in dw_mmc-exynos
> in the future.

Seems reasonable!

If you want the more generic solution, I would add a exynos specific
suspend_noirq() callback, let it call pm_runtime_get_noresume() and
them pm_runtime_force_suspend().

In the corresponding resume_noirq() callback, extend my suggested
changes, with a call to pm_runtime_put_noidle() after all actions has
been done in it.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html