Re: sdhci timeout on imx8mq

Heiko Thiery <heiko.thiery@xxxxxxxxx> · Tue, 9 Mar 2021 08:35:12 +0100

Hi all,

> Hi Jacky,
> 
> Am Donnerstag, dem 07.01.2021 um 01:30 +0000 schrieb Jacky Bai:
> > > -----Original Message-----
> > > From: Fabio Estevam [mailto:festevam@xxxxxxxxx]
> > > Sent: Thursday, January 7, 2021 2:57 AM
> > > To: Lucas Stach <l.stach@xxxxxxxxxxxxxx>
> > > Cc: Bough Chen <haibo.chen@xxxxxxx>; Angus Ainslie
> > > <angus@xxxxxxxx>;
> > > Leonard Crestez <leonard.crestez@xxxxxxx>; Peng Fan
> > > <peng.fan@xxxxxxx>; Abel Vesa <abel.vesa@xxxxxxx>; Stephen Boyd
> > > <sboyd@xxxxxxxxxx>; Michael Turquette <mturquette@xxxxxxxxxxxx>;
> > > Ulf
> > > Hansson <ulf.hansson@xxxxxxxxxx>; Guido Günther <agx@xxxxxxxxxxx>;
> > > linux-mmc <linux-mmc@xxxxxxxxxxxxxxx>; Adrian Hunter
> > > <adrian.hunter@xxxxxxxxx>; dl-linux-imx <linux-imx@xxxxxxx>; Sascha
> > > Hauer <kernel@xxxxxxxxxxxxxx>; moderated list:ARM/FREESCALE IMX /
> > > MXC
> > > ARM ARCHITECTURE <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>
> > > Subject: Re: sdhci timeout on imx8mq
> > > 
> > > Hi Lucas,
> > > 
> > > On Tue, Jan 5, 2021 at 12:06 PM Lucas Stach
> > > <l.stach@xxxxxxxxxxxxxx>
> > > wrote:
> > > 
> > > > The reference manual states about this situation: "For any clock,
> > > > its
> > > > source must be left on when it is kept on. Behavior is undefined
> > > > if
> > > > this rule is violated."
> > > > And it seems this is exactly what's happening here: some kind of
> > > > glitch is introduced in the nand_usdhc_bus clock, which prevents
> > > > the
> > > > SDHCI controller from working, even though the clock branch is
> > > > properly enabled later on. On my system the SDHCI timeout and
> > > > following runtime suspend/resume cycle on the nand_usdhc_bus
> > > > clock
> > > > seem to get it back into a working state.
> > > 
> > > I think your analysis is correct and I recall helping a customer
> > > with a similar
> > > issue:
> > > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcomm
> > > unity.nxp.com%2Ft5%2Fi-MX-Processors%2FExternal-clock-that-provide-
> > > root
> > > -clock-for-SAI3-and-SPDIF%2Fm-p%2F1019834&amp;data=04%7C01%7Cping
> > > .bai%40nxp.com%7C8d250a158cce469c378308d8b274d6d1%7C686ea1d3bc
> > > 2b4c6fa92cd99c5c301635%7C0%7C0%7C637455562183497049%7CUnknow
> > > n%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1ha
> > > WwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=VkxuhmhDifzOxxfIgFz9PR5gKC1
> > > SyQhGeSHYysX1Co4%3D&amp;reserved=0
> > > 
> > 
> > For the customer case, it seem not the same issue. the customer issue
> > is caused by clock source change while parent has no clock output.
> > This is inherit limitation for the CCM clock slice when using the
> > smart interface to change the clock parent.
> > 
> > For current mmc timeout issue, I think we can have a try with
> > nand_usdhc_bus clock gated at the beginning of kernel boot, directly
> > modify the nand_usdhc_bus
> > Clock's HW register gate bit in clock-imx8mq.c.
> 
> While this might be an option to fix this specific issue, I would hope
> we can come up with something more generic, as the current clock
> framework behavior allows to violate the system specification
> constraint that parent clocks must not be disabled when any of the
> children are active. This seems like a fundamental issue and might hurt
> us also with other clocks than this specific nand_usdhc_bus clock.

I am not sure if an error in the fec driver has the same or similar cause as this. But I noticed that the SOC hangs when accessing the timecounter register while the FEC is down. The CCGR10 seems to gate the ENET_REF_CLK_ROOT and the ENET_TIMER_CLK_ROOT. The clocks are disabled as soon as the interface is down.

[1] https://lore.kernel.org/lkml/20210220065654.25598-1-heiko.thiery@xxxxxxxxx/

> 
> Can you tell us if there were other issues found with the PLL1/2 gating
> patch? The fact that, according to Bough, it's reverted in your tree
> seems to suggest this.
> 
> Regards,
> Lucas

Thank you

-- 
Heiko