> -----Original Message-----
> From: Lucas Stach [mailto:l.stach@xxxxxxxxxxxxxx]
> Sent: January 5, 2021 23:07
> To: Bough Chen <haibo.chen@xxxxxxx>; Fabio Estevam
> <festevam@xxxxxxxxx>; Angus Ainslie <angus@xxxxxxxx>; Leonard Crestez
> <leonard.crestez@xxxxxxx>; Peng Fan <peng.fan@xxxxxxx>; Abel Vesa
> <abel.vesa@xxxxxxx>; Stephen Boyd <sboyd@xxxxxxxxxx>; Michael Turquette
> <mturquette@xxxxxxxxxxxx>
> Cc: Ulf Hansson <ulf.hansson@xxxxxxxxxx>; Guido Günther <agx@xxxxxxxxxxx>;
> linux-mmc <linux-mmc@xxxxxxxxxxxxxxx>; Adrian Hunter
> <adrian.hunter@xxxxxxxxx>; dl-linux-imx <linux-imx@xxxxxxx>; Sascha Hauer
> <kernel@xxxxxxxxxxxxxx>; moderated list:ARM/FREESCALE IMX / MXC ARM
> ARCHITECTURE <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>
> Subject: Re: sdhci timeout on imx8mq
>
> Hi all,
>
> On Wednesday, 08.07.2020 at 01:32 +0000, BOUGH CHEN wrote:
> > > -----Original Message-----
> > > From: Fabio Estevam [mailto:festevam@xxxxxxxxx]
> > > Sent: July 7, 2020 20:45
> > > To: Angus Ainslie <angus@xxxxxxxx>
> > > Cc: BOUGH CHEN <haibo.chen@xxxxxxx>; Ulf Hansson
> > > <ulf.hansson@xxxxxxxxxx>; Guido Günther <agx@xxxxxxxxxxx>;
> > > linux-mmc <linux-mmc@xxxxxxxxxxxxxxx>; Adrian Hunter
> > > <adrian.hunter@xxxxxxxxx>; dl-linux-imx <linux-imx@xxxxxxx>; Sascha
> > > Hauer <kernel@xxxxxxxxxxxxxx>; moderated list:ARM/FREESCALE IMX /
> > > MXC ARM ARCHITECTURE <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>
> > > Subject: Re: sdhci timeout on imx8mq
> > >
> > > Hi Angus,
> > >
> > > On Tue, Jun 30, 2020 at 4:39 PM Angus Ainslie <angus@xxxxxxxx>
> > > wrote:
> > > >
> > > > Has there been any progress with this. I'm getting this on about
> > > > 50% of
> > >
> > > Not from my side, sorry.
> > >
> > > Bough,
> > >
> > > Do you know why this problem affects the imx8mq-evk versions that
> > > are populated with the Micron eMMC and not the ones with Sandisk
> > > eMMC?
> >
> > Hi Angus,
> >
> > Can you show me the full fail log? I do not meet this issue on my
> > side, besides, which kind of uboot do you use?
>
> I was finally able to bisect this issue, which wasn't that much fun due to the
> issue not being reproducible 100%. :/ Turns out that the issue is even more
> interesting than I thought and likely doesn't have anything to do with SDHCI or
> used bootloader versions. Here's my current debugging state:
>
> I've bisected the issue down to b04383b6a558 (clk: imx8mq: Define gates for
> pll1/2 fixed dividers). The change itself looks fine to me, still CC'ed Leonard for
> good measure.
>
> In my testing the following partial revert fixes the issue:
>
> --- a/drivers/clk/imx/clk-imx8mq.c
> +++ b/drivers/clk/imx/clk-imx8mq.c
> @@ -365,7 +365,7 @@ static int imx8mq_clocks_probe(struct platform_device *pdev)
>         hws[IMX8MQ_SYS1_PLL_133M_CG] = imx_clk_hw_gate("sys1_pll_133m_cg", "sys1_pll_out", base + 0x30, 15);
>         hws[IMX8MQ_SYS1_PLL_160M_CG] = imx_clk_hw_gate("sys1_pll_160m_cg", "sys1_pll_out", base + 0x30, 17);
>         hws[IMX8MQ_SYS1_PLL_200M_CG] = imx_clk_hw_gate("sys1_pll_200m_cg", "sys1_pll_out", base + 0x30, 19);
> -       hws[IMX8MQ_SYS1_PLL_266M_CG] = imx_clk_hw_gate("sys1_pll_266m_cg", "sys1_pll_out", base + 0x30, 21);
>         hws[IMX8MQ_SYS1_PLL_400M_CG] = imx_clk_hw_gate("sys1_pll_400m_cg", "sys1_pll_out", base + 0x30, 23);
>         hws[IMX8MQ_SYS1_PLL_800M_CG] = imx_clk_hw_gate("sys1_pll_800m_cg", "sys1_pll_out", base + 0x30, 25);
>
> @@ -375,7 +375,7 @@ static int imx8mq_clocks_probe(struct platform_device *pdev)
>         hws[IMX8MQ_SYS1_PLL_133M] = imx_clk_hw_fixed_factor("sys1_pll_133m", "sys1_pll_133m_cg", 1, 6);
>         hws[IMX8MQ_SYS1_PLL_160M] = imx_clk_hw_fixed_factor("sys1_pll_160m", "sys1_pll_160m_cg", 1, 5);
>         hws[IMX8MQ_SYS1_PLL_200M] = imx_clk_hw_fixed_factor("sys1_pll_200m", "sys1_pll_200m_cg", 1, 4);
> -       hws[IMX8MQ_SYS1_PLL_266M] = imx_clk_hw_fixed_factor("sys1_pll_266m", "sys1_pll_266m_cg", 1, 3);
> +       hws[IMX8MQ_SYS1_PLL_266M] =
> +               imx_clk_hw_fixed_factor("sys1_pll_266m", "sys1_pll_out", 1, 3);
>         hws[IMX8MQ_SYS1_PLL_400M] = imx_clk_hw_fixed_factor("sys1_pll_400m", "sys1_pll_400m_cg", 1, 2);
>         hws[IMX8MQ_SYS1_PLL_800M] = imx_clk_hw_fixed_factor("sys1_pll_800m", "sys1_pll_800m_cg", 1, 1);
>
> The sys1_pll_266m is the parent of nand_usdhc_bus. I've validated that the
> SDHCI driver properly enables this bus clock across the problematic card access.
> So what I think is happening here is that both nand_usdhc_bus and
> sys1_pll_266m are initially enabled. Sometime during boot sys1_pll_266m gets
> disabled due to runtime PM on the enet_axi clock, which is a direct child of
> sys1_pll_266m. At this point nand_usdhc_bus is still enabled, but no consumer
> has claimed the clock yet, so the parent clock gets disabled while this branch of
> the clock tree is still active.

Hi Lucas,

According to the clock tree, as long as nand_usdhc_bus is still enabled,
sys1_pll_266m cannot be disabled.

 sys1_pll_266m_cg                    1  1  0  800000000  0  0  50000  Y
    sys1_pll_266m                    1  1  0  266666666  0  0  50000  Y
       nand_usdhc_bus                0  0  0  266666666  0  0  50000  N
          nand_usdhc_rawnand_clk     0  0  0  266666666  0  0  50000  N
       enet_axi                      1  1  0  266666666  0  0  50000  Y
          enet1_root_clk             2  2  0  266666666  0  0  50000  Y

This issue seems related to the following erratum:

e11232: USDHC: uSDHC setting requirement for IPG_CLK and AHB_BUS clocks
Description: uSDHC AHB_BUS and IPG_CLK clocks must be synchronized. Due to
current physical design implementation, AHB_BUS and IPG_CLK must come from
same clock source to maintain clock sync.
Workaround: Set AHB_BUS and IPG_CLK to clock source from PLL1.

After sys1_pll_266m is gated off and on again, it seems the USDHC AHB bus
and the USDHC IPG_CLK need to be synchronized again. (Here the USDHC AHB bus
is sourced from nand_usdhc_bus.) This sync is handled by hardware and may
take some time; during this sync period, USDHC operations may fail.

I just double-checked our local v5.10 branch, and it already reverts commit
b04383b6a558 (clk: imx8mq: Define gates for pll1/2 fixed dividers).

So to fix this issue, one option is to revert this patch; another is to keep
'nand_usdhc_bus' always on, with a change like this:

diff --git a/drivers/clk/imx/clk-imx8mq.c b/drivers/clk/imx/clk-imx8mq.c
index 779ea69e639c..939806b36916 100644
--- a/drivers/clk/imx/clk-imx8mq.c
+++ b/drivers/clk/imx/clk-imx8mq.c
@@ -433,7 +433,7 @@ static int imx8mq_clocks_probe(struct platform_device *pdev)
        /* BUS */
        hws[IMX8MQ_CLK_MAIN_AXI] = imx8m_clk_hw_composite_bus_critical("main_axi", imx8mq_main_axi_sels, base + 0x8800);
        hws[IMX8MQ_CLK_ENET_AXI] = imx8m_clk_hw_composite_bus("enet_axi", imx8mq_enet_axi_sels, base + 0x8880);
-       hws[IMX8MQ_CLK_NAND_USDHC_BUS] = imx8m_clk_hw_composite_bus("nand_usdhc_bus", imx8mq_nand_usdhc_sels, base + 0x8900);
+       hws[IMX8MQ_CLK_NAND_USDHC_BUS] = imx8m_clk_hw_composite_bus_critical("nand_usdhc_bus", imx8mq_nand_usdhc_sels, base + 0x8900);
        hws[IMX8MQ_CLK_VPU_BUS] = imx8m_clk_hw_composite_bus("vpu_bus", imx8mq_vpu_bus_sels, base + 0x8980);
        hws[IMX8MQ_CLK_DISP_AXI] = imx8m_clk_hw_composite_bus("disp_axi", imx8mq_disp_axi_sels, base + 0x8a00);
        hws[IMX8MQ_CLK_DISP_APB] = imx8m_clk_hw_composite_bus("disp_apb", imx8mq_disp_apb_sels, base + 0x8a80);

What do you think? Any other suggestions?

>
> The reference manual states about this situation: "For any clock, its source
> must be left on when it is kept on. Behavior is undefined if this rule is violated."
> And it seems this is exactly what's happening here: some kind of glitch is
> introduced in the nand_usdhc_bus clock, which prevents the SDHCI controller
> from working, even though the clock branch is properly enabled later on. On my
> system the SDHCI timeout and following runtime suspend/resume cycle on the
> nand_usdhc_bus clock seem to get it back into a working state.
>
> So I think we need some solution at the clock driver/framework level to prevent
> shutting down parent clocks that have active branches, even if those branches
> aren't claimed by a consumer (yet).
>
> Regards,
> Lucas
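
For illustration only, the scenario discussed in this thread can be sketched as a small user-space toy model in C. This is not kernel code: struct toy_clk, toy_clk_enable(), toy_clk_disable(), toy_clk_register() and branch_left_without_parent() are hypothetical names made up for this sketch. It assumes that marking a clock critical amounts to taking a permanent enable reference at registration, and that the bootloader left all three gates running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a gated clock; NOT the kernel's struct clk_core. */
struct toy_clk {
    const char *name;
    struct toy_clk *parent;
    int enable_count;   /* software references from consumers and children */
    bool hw_on;         /* real gate state, e.g. as left by the bootloader */
    bool critical;      /* assumed to mean: never gate this clock */
};

/* Enabling a clock first takes a reference on its parent chain. */
static void toy_clk_enable(struct toy_clk *c)
{
    if (c == NULL)
        return;
    if (c->enable_count++ == 0) {
        toy_clk_enable(c->parent);
        c->hw_on = true;
    }
}

/* Dropping the last reference gates the clock, then releases the parent. */
static void toy_clk_disable(struct toy_clk *c)
{
    if (c == NULL)
        return;
    assert(c->enable_count > 0);
    if (--c->enable_count == 0) {
        c->hw_on = false;
        toy_clk_disable(c->parent);
    }
}

/* Assumption: a critical clock holds a permanent reference once registered. */
static void toy_clk_register(struct toy_clk *c)
{
    if (c->critical)
        toy_clk_enable(c);
}

/*
 * Returns true if nand_usdhc_bus is still running in hardware while its
 * parent has been gated, i.e. the situation described in the thread.
 */
bool branch_left_without_parent(bool nand_bus_critical)
{
    /* All gates start out on in hardware, but with no software references. */
    struct toy_clk pll  = { "sys1_pll_266m",  NULL, 0, true, false };
    struct toy_clk nand = { "nand_usdhc_bus", &pll, 0, true, nand_bus_critical };
    struct toy_clk enet = { "enet_axi",       &pll, 0, true, false };

    toy_clk_register(&pll);
    toy_clk_register(&nand);
    toy_clk_register(&enet);

    toy_clk_enable(&enet);   /* Ethernet side claims its clock */
    toy_clk_disable(&enet);  /* runtime PM releases it again */

    return nand.hw_on && !pll.hw_on;
}
```

In this model, branch_left_without_parent(false) reports the unclaimed nand_usdhc_bus branch running with a gated parent, while branch_left_without_parent(true) keeps sys1_pll_266m referenced for the whole sequence. The model says nothing about the hardware resynchronization time mentioned in the erratum discussion.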