Hi all,

On Wednesday, 08.07.2020 at 01:32 +0000, BOUGH CHEN wrote:
> > -----Original Message-----
> > From: Fabio Estevam [mailto:festevam@xxxxxxxxx]
> > Sent: July 7, 2020 20:45
> > To: Angus Ainslie <angus@xxxxxxxx>
> > Cc: BOUGH CHEN <haibo.chen@xxxxxxx>; Ulf Hansson
> > <ulf.hansson@xxxxxxxxxx>; Guido Günther <agx@xxxxxxxxxxx>;
> > linux-mmc <linux-mmc@xxxxxxxxxxxxxxx>; Adrian Hunter
> > <adrian.hunter@xxxxxxxxx>; dl-linux-imx <linux-imx@xxxxxxx>;
> > Sascha Hauer <kernel@xxxxxxxxxxxxxx>;
> > moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE
> > <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>
> > Subject: Re: sdhci timeout on imx8mq
> >
> > Hi Angus,
> >
> > On Tue, Jun 30, 2020 at 4:39 PM Angus Ainslie <angus@xxxxxxxx>
> > wrote:
> >
> > > Has there been any progress with this. I'm getting this on about
> > > 50% of
> >
> > Not from my side, sorry.
> >
> > Bough,
> >
> > Do you know why this problem affects the imx8mq-evk versions that
> > are populated with the Micron eMMC and not the ones with Sandisk
> > eMMC?
>
> Hi Angus,
>
> Can you show me the full fail log? I do not meet this issue on my
> side, besides, which kind of uboot do you use?

I was finally able to bisect this issue, which wasn't much fun because
the issue does not reproduce 100% of the time. :/

It turns out that the issue is even more interesting than I thought and
likely has nothing to do with SDHCI or the bootloader version in use.
Here's my current debugging state:

I've bisected the issue down to b04383b6a558 (clk: imx8mq: Define gates
for pll1/2 fixed dividers). The change itself looks fine to me, but I've
still CC'ed Leonard for good measure.

In my testing the following partial revert fixes the issue:

--- a/drivers/clk/imx/clk-imx8mq.c
+++ b/drivers/clk/imx/clk-imx8mq.c
@@ -365,7 +365,7 @@ static int imx8mq_clocks_probe(struct platform_device *pdev)
 	hws[IMX8MQ_SYS1_PLL_133M_CG] = imx_clk_hw_gate("sys1_pll_133m_cg", "sys1_pll_out", base + 0x30, 15);
 	hws[IMX8MQ_SYS1_PLL_160M_CG] = imx_clk_hw_gate("sys1_pll_160m_cg", "sys1_pll_out", base + 0x30, 17);
 	hws[IMX8MQ_SYS1_PLL_200M_CG] = imx_clk_hw_gate("sys1_pll_200m_cg", "sys1_pll_out", base + 0x30, 19);
-	hws[IMX8MQ_SYS1_PLL_266M_CG] = imx_clk_hw_gate("sys1_pll_266m_cg", "sys1_pll_out", base + 0x30, 21);
 	hws[IMX8MQ_SYS1_PLL_400M_CG] = imx_clk_hw_gate("sys1_pll_400m_cg", "sys1_pll_out", base + 0x30, 23);
 	hws[IMX8MQ_SYS1_PLL_800M_CG] = imx_clk_hw_gate("sys1_pll_800m_cg", "sys1_pll_out", base + 0x30, 25);
@@ -375,7 +375,7 @@ static int imx8mq_clocks_probe(struct platform_device *pdev)
 	hws[IMX8MQ_SYS1_PLL_133M] = imx_clk_hw_fixed_factor("sys1_pll_133m", "sys1_pll_133m_cg", 1, 6);
 	hws[IMX8MQ_SYS1_PLL_160M] = imx_clk_hw_fixed_factor("sys1_pll_160m", "sys1_pll_160m_cg", 1, 5);
 	hws[IMX8MQ_SYS1_PLL_200M] = imx_clk_hw_fixed_factor("sys1_pll_200m", "sys1_pll_200m_cg", 1, 4);
-	hws[IMX8MQ_SYS1_PLL_266M] = imx_clk_hw_fixed_factor("sys1_pll_266m", "sys1_pll_266m_cg", 1, 3);
+	hws[IMX8MQ_SYS1_PLL_266M] = imx_clk_hw_fixed_factor("sys1_pll_266m", "sys1_pll_out", 1, 3);
 	hws[IMX8MQ_SYS1_PLL_400M] = imx_clk_hw_fixed_factor("sys1_pll_400m", "sys1_pll_400m_cg", 1, 2);
 	hws[IMX8MQ_SYS1_PLL_800M] = imx_clk_hw_fixed_factor("sys1_pll_800m", "sys1_pll_800m_cg", 1, 1);

The sys1_pll_266m is the parent of nand_usdhc_bus. I've validated that
the SDHCI driver properly enables this bus clock across the problematic
card access.

So what I think is happening here is that both nand_usdhc_bus and
sys1_pll_266m are initially enabled. Sometime during boot sys1_pll_266m
gets disabled due to runtime PM on the enet_axi clock, which is a direct
child of sys1_pll_266m.
At this point nand_usdhc_bus is still enabled, but no consumer has
claimed the clock yet, so the parent clock gets disabled while this
branch of the clock tree is still active.

The reference manual states about this situation:

"For any clock, its source must be left on when it is kept on. Behavior
is undefined if this rule is violated."

And it seems this is exactly what's happening here: some kind of glitch
is introduced in the nand_usdhc_bus clock, which prevents the SDHCI
controller from working, even though the clock branch is properly
enabled later on. On my system the SDHCI timeout and following runtime
suspend/resume cycle on the nand_usdhc_bus clock seem to get it back
into a working state.

So I think we need some solution at the clock driver/framework level to
prevent shutting down parent clocks that have active branches, even if
those branches aren't claimed by a consumer (yet).

Regards,
Lucas
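
For illustration, the suspected sequence can be modelled in a few lines
of user-space C. This is only a toy sketch, not kernel code: struct
sim_clk and all sim_* functions are hypothetical names made up for this
example, the sys1_pll_266m_cg gate and the sys1_pll_266m divider are
folded into a single node, and the initial hardware state simply
assumes that the bootloader left both sys1_pll_266m and nand_usdhc_bus
running.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model of one clock; NOT the kernel's struct clk_core. */
struct sim_clk {
	const char *name;
	struct sim_clk *parent;
	int hw_on;        /* gate state in hardware (possibly left on by the bootloader) */
	int enable_count; /* software references from consumers and child clocks */
};

/* Take a reference; on the first one, enable the parent chain and ungate. */
static void sim_clk_enable(struct sim_clk *clk)
{
	if (!clk)
		return;
	if (clk->enable_count++ == 0) {
		sim_clk_enable(clk->parent);
		clk->hw_on = 1;
	}
}

/*
 * Drop a reference; when the last one goes away, gate the clock and
 * release the parent. Only the software count is consulted here, not
 * the hardware state of children that nobody has claimed yet.
 */
static void sim_clk_disable(struct sim_clk *clk)
{
	if (!clk)
		return;
	assert(clk->enable_count > 0);
	if (--clk->enable_count == 0) {
		clk->hw_on = 0;
		sim_clk_disable(clk->parent);
	}
}

/* Returns 1 if clk runs in hardware while its source is gated. */
static int sim_clk_source_rule_violated(const struct sim_clk *clk)
{
	return clk->hw_on && clk->parent && !clk->parent->hw_on;
}

/* Replays the suspected boot sequence; returns 1 if the rule is violated. */
static int sim_boot_sequence(void)
{
	/* Assumed boot state: PLL output and usdhc bus on, nothing claimed. */
	struct sim_clk pll_266m  = { "sys1_pll_266m",  NULL,      1, 0 };
	struct sim_clk usdhc_bus = { "nand_usdhc_bus", &pll_266m, 1, 0 };
	struct sim_clk enet_axi  = { "enet_axi",       &pll_266m, 0, 0 };

	sim_clk_enable(&enet_axi);  /* ethernet side takes its clock */
	sim_clk_disable(&enet_axi); /* runtime PM drops it again */

	printf("%s hw_on=%d, %s hw_on=%d\n",
	       usdhc_bus.name, usdhc_bus.hw_on,
	       pll_266m.name, pll_266m.hw_on);

	return sim_clk_source_rule_violated(&usdhc_bus);
}
```

Calling sim_boot_sequence() from a small main() reports a violation in
this model: the last software reference on sys1_pll_266m is gone, so it
is gated, while the unclaimed nand_usdhc_bus is still running in
hardware. A fix in this toy model would be to make the disable path
also look at hw_on children; how that maps onto the real clock
framework is left open here.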