Re: [PATCH] mmc: dw_mmc-exynos: fix potential external abort in resume_noirq()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Ulf,

On 2018-06-12 11:20, Ulf Hansson wrote:
> On 12 June 2018 at 10:28, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>> On 2018-06-11 14:24, Ulf Hansson wrote:
>>> On 11 June 2018 at 11:50, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>>>> On 2018-06-11 11:35, Ulf Hansson wrote:
>>>>> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>>>>>> dw_mci_exynos_resume_noirq() performs DWMMC register access without
>>>>>> ensuring that respective clocks are enabled. This might cause external
>>>>>> abort on some systems (observed on Exynos5433 based boards). Fix this
>>>>>> by adding needed prepare_enable/disable_unprepare calls.
>>>>>>
>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
>>>>>> ---
>>>>>>     drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++
>>>>>>     1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>> index 3164681108ae..6125b68726b0 100644
>>>>>> --- a/drivers/mmc/host/dw_mmc-exynos.c
>>>>>> +++ b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>            struct dw_mci_exynos_priv_data *priv = host->priv;
>>>>>>            u32 clksel;
>>>>>>
>>>>>> +       clk_prepare_enable(host->biu_clk);
>>>>>> +       clk_prepare_enable(host->ciu_clk);
>>>>>> +
>>>>>>            if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||
>>>>>>                    priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)
>>>>>>                    clksel = mci_readl(host, CLKSEL64);
>>>>>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>                            mci_writel(host, CLKSEL, clksel);
>>>>>>            }
>>>>>>
>>>>>> +       clk_disable_unprepare(host->biu_clk);
>>>>>> +       clk_disable_unprepare(host->ciu_clk);
>>>>>> +
>>>>>>            return 0;
>>>>>>     }
>>>>>>     #else
>>>>> I looked a little closer and I am wondering if it wouldn't be possible
>>>>> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of
>>>>> SET_SYSTEM_SLEEP_PM_OPS()?
>>>>>
>>>>> Somelike this:
>>>>> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
>>>>> dw_mci_exynos_resume_noirq)
>>>>>
>>>>> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().
>>>>>
>>>>> I think it would simplify the code a bit, as you can rely on the
>>>>> runtime PM callbacks to deal with clk_prepare_enable() and
>>>>> clk_disable_unprepare(), unless I am mistaken.
>>>> This will not fix the problem, because mci_writel() calls in
>>>> dw_mci_exynos_resume_noirq are done unconditionally, regardless of the
>>>> controller's runtime pm state. Since commit 1d9174fbc55e after calling
>>>> pm_runtime_force_resume() there is no guarantee that device is in
>>>> runtime active state if it was runtime suspended state.
>>> Yes, because the runtime PM usage count is greater than 1.
>>> (pm_runtime_get_noresume() is called during probe).
>>>
>>> If you want to make this explicit (not relying on ->probe()), one can
>>> add a ->suspend_noirq() callback and call pm_runtime_get_noresume() in
>>> it.
>> Sorry, but I don't get how this would work. Exactly the same pattern as
>> you have proposed was already used in s3c-64xx SPI driver and it didn't
>> work properly (tested on the same SoC as this DW-MMC change). I had to
>> move register access to runtime resume callback to fix external abort
>> issue:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e935dba111621bd6a0c5d48e6511a4d9885103b4
> Yep, that is a correct solution.
>
>> Here in DW-MMC driver such approach (moving all the code to runtime
>> resume callback) is not possible because of the potential interrupt storm
>> caused by the hw bug (that's the reason of using noirq resume callback).
> I understand. What you need is to run the runtime resume/suspend
> callbacks in the resume/suspend noirq phase. Moreover, you need to
> make sure that the runtime resume callback, really becomes invoked
> during the resume noirq phase, because of the HW bug.
>
> I think the below should work. Can you give it a try?
>
> It relies on the call pm_runtime_get_noresume(), done during
> ->probe(). Note that, the driver always keeps the RPM usage count
> increased, thus preventing runtime suspend during normal execution.
>
> Anyway, if this doesn't work, your suggested approach works fine as well.

Okay, finally I got it. I wasn't aware that dw_mmc-exynos keeps device
runtime active all the time between the driver probe() and remove().
Right, this will fix this specific case, but it isn't a generic solution,
so I will also add a comment on that, so one would not need to debug it
again if he decides to change runtime pm usage scheme in dw_mmc-exynos
in the future.

>
> ---
>   drivers/mmc/host/dw_mmc-exynos.c | 15 +++++++--------
>   1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c
> index a84aa3f1ae85..66132f7fceed 100644
> --- a/drivers/mmc/host/dw_mmc-exynos.c
> +++ b/drivers/mmc/host/dw_mmc-exynos.c
> @@ -175,7 +175,9 @@ static int dw_mci_exynos_runtime_resume(struct device *dev)
>
>          return ret;
>   }
> +#endif /* CONFIG_PM */
>
> +#ifdef CONFIG_PM_SLEEP
>   /**
>    * dw_mci_exynos_resume_noirq - Exynos-specific resume code
>    *
> @@ -193,6 +195,8 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>          struct dw_mci_exynos_priv_data *priv = host->priv;
>          u32 clksel;
>
> +       pm_runtime_force_resume(dev);
> +
>          if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||
>                  priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)
>                  clksel = mci_readl(host, CLKSEL64);
> @@ -209,9 +213,7 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>
>          return 0;
>   }
> -#else
> -#define dw_mci_exynos_resume_noirq     NULL
> -#endif /* CONFIG_PM */
> +#endif /* CONFIG_PM_SLEEP */
>
>   static void dw_mci_exynos_config_hs400(struct dw_mci *host, u32 timing)
>   {
> @@ -553,14 +555,11 @@ static int dw_mci_exynos_remove(struct
> platform_device *pdev)
>   }
>   static const struct dev_pm_ops dw_mci_exynos_pmops = {
> -       SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
> -                               pm_runtime_force_resume)
> +       SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
> +                               dw_mci_exynos_resume_noirq)
>          SET_RUNTIME_PM_OPS(dw_mci_runtime_suspend,
>                             dw_mci_exynos_runtime_resume,
>                             NULL)
> -       .resume_noirq = dw_mci_exynos_resume_noirq,
> -       .thaw_noirq = dw_mci_exynos_resume_noirq,
> -       .restore_noirq = dw_mci_exynos_resume_noirq,
>   };
>
>   static struct platform_driver dw_mci_exynos_pltfm_driver = {

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Linux Media]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux