Hi Ulf, On 2018-06-12 12:07, Ulf Hansson wrote: > On 12 June 2018 at 11:52, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote: >> On 2018-06-12 11:20, Ulf Hansson wrote: >>> On 12 June 2018 at 10:28, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote: >>>> On 2018-06-11 14:24, Ulf Hansson wrote: >>>>> On 11 June 2018 at 11:50, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote: >>>>>> On 2018-06-11 11:35, Ulf Hansson wrote: >>>>>>> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote: >>>>>>>> dw_mci_exynos_resume_noirq() performs DWMMC register access without >>>>>>>> ensuring that respective clocks are enabled. This might cause external >>>>>>>> abort on some systems (observed on Exynos5433 based boards). Fix this >>>>>>>> by adding needed prepare_enable/disable_unprepare calls. >>>>>>>> >>>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> >>>>>>>> --- >>>>>>>> drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++ >>>>>>>> 1 file changed, 6 insertions(+) >>>>>>>> >>>>>>>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c >>>>>>>> index 3164681108ae..6125b68726b0 100644 >>>>>>>> --- a/drivers/mmc/host/dw_mmc-exynos.c >>>>>>>> +++ b/drivers/mmc/host/dw_mmc-exynos.c >>>>>>>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev) >>>>>>>> struct dw_mci_exynos_priv_data *priv = host->priv; >>>>>>>> u32 clksel; >>>>>>>> >>>>>>>> + clk_prepare_enable(host->biu_clk); >>>>>>>> + clk_prepare_enable(host->ciu_clk); >>>>>>>> + >>>>>>>> if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 || >>>>>>>> priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU) >>>>>>>> clksel = mci_readl(host, CLKSEL64); >>>>>>>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev) >>>>>>>> mci_writel(host, CLKSEL, clksel); >>>>>>>> } >>>>>>>> >>>>>>>> + clk_disable_unprepare(host->biu_clk); >>>>>>>> + clk_disable_unprepare(host->ciu_clk); >>>>>>>> + >>>>>>>> return 0; >>>>>>>> } >>>>>>>> #else >>>>>>> I looked a little closer and I am wondering if it wouldn't be possible >>>>>>> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of >>>>>>> SET_SYSTEM_SLEEP_PM_OPS()? >>>>>>> >>>>>>> Somelike this: >>>>>>> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, >>>>>>> dw_mci_exynos_resume_noirq) >>>>>>> >>>>>>> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume(). >>>>>>> >>>>>>> I think it would simplify the code a bit, as you can rely on the >>>>>>> runtime PM callbacks to deal with clk_prepare_enable() and >>>>>>> clk_disable_unprepare(), unless I am mistaken. >>>>>> This will not fix the problem, because mci_writel() calls in >>>>>> dw_mci_exynos_resume_noirq are done unconditionally, regardless of the >>>>>> controller's runtime pm state. Since commit 1d9174fbc55e after calling >>>>>> pm_runtime_force_resume() there is no guarantee that device is in >>>>>> runtime active state if it was runtime suspended state. >>>>> Yes, because the runtime PM usage count is greater than 1. >>>>> (pm_runtime_get_noresume() is called during probe). >>>>> >>>>> If you want to make this explicit (not relying on ->probe()), one can >>>>> add a ->suspend_noirq() callback and call pm_runtime_get_noresume() in >>>>> it. >>>> Sorry, but I don't get how this would work. Exactly the same pattern as >>>> you have proposed was already used in s3c-64xx SPI driver and it didn't >>>> work properly (tested on the same SoC as this DW-MMC change). I had to >>>> move register access to runtime resume callback to fix external abort >>>> issue: >>>> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e935dba111621bd6a0c5d48e6511a4d9885103b4 >>> Yep, that is a correct solution. >>> >>>> Here in DW-MMC driver such approach (moving all the code to runtime >>>> resume callback) is not possible because of the potential interrupt storm >>>> caused by the hw bug (that's the reason of using noirq resume callback). >>> I understand. What you need is to run the runtime resume/suspend >>> callbacks in the resume/suspend noirq phase. Moreover, you need to >>> make sure that the runtime resume callback, really becomes invoked >>> during the resume noirq phase, because of the HW bug. >>> >>> I think the below should work. Can you give it a try? >>> >>> It relies on the call pm_runtime_get_noresume(), done during >>> ->probe(). Note that, the driver always keeps the RPM usage count >>> increased, thus preventing runtime suspend during normal execution. >>> >>> Anyway, if this doesn't work, your suggested approach works fine as well. >> Okay, finally I got it. I wasn't aware that dw_mmc-exynos keeps device >> runtime active all the time between the driver probe() and remove(). >> Right, this will fix this specific case, but it isn't a generic solution, >> so I will also add a comment on that, so one would not need to debug it >> again if he decides to change runtime pm usage scheme in dw_mmc-exynos >> in the future. > Seems reasonable! > > If you want the more generic solution, I would add a exynos specific > suspend_noirq() callback, let it call pm_runtime_get_noresume() and > them pm_runtime_force_suspend(). > > In the corresponding resume_noirq() callback, extend my suggested > changes, with a call to pm_runtime_put_noidle() after all actions has > been done in it. Okay, this finally looks like a proper and future-proof solution. Just one more question: why pm_runtime_put_noidle()? Hypothetically, when the dw_mmc-exynos driver gets proper runtime PM management and system suspend will happen when the device is in runtime suspended state, this will leave it in runtime active state with refcount = 0 after the system resume cycle. Imho simple pm_runtime_put() will be better in this case. Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html