Re: sdhci timeout on imx8mq

Lucas Stach <l.stach@xxxxxxxxxxxxxx> · Thu, 07 Jan 2021 12:26:48 +0100

Hi Jacky,

Am Donnerstag, dem 07.01.2021 um 01:30 +0000 schrieb Jacky Bai:
> > -----Original Message-----
> > From: Fabio Estevam [mailto:festevam@xxxxxxxxx]
> > Sent: Thursday, January 7, 2021 2:57 AM
> > To: Lucas Stach <l.stach@xxxxxxxxxxxxxx>
> > Cc: Bough Chen <haibo.chen@xxxxxxx>; Angus Ainslie
> > <angus@xxxxxxxx>;
> > Leonard Crestez <leonard.crestez@xxxxxxx>; Peng Fan
> > <peng.fan@xxxxxxx>; Abel Vesa <abel.vesa@xxxxxxx>; Stephen Boyd
> > <sboyd@xxxxxxxxxx>; Michael Turquette <mturquette@xxxxxxxxxxxx>;
> > Ulf
> > Hansson <ulf.hansson@xxxxxxxxxx>; Guido Günther <agx@xxxxxxxxxxx>;
> > linux-mmc <linux-mmc@xxxxxxxxxxxxxxx>; Adrian Hunter
> > <adrian.hunter@xxxxxxxxx>; dl-linux-imx <linux-imx@xxxxxxx>; Sascha
> > Hauer <kernel@xxxxxxxxxxxxxx>; moderated list:ARM/FREESCALE IMX /
> > MXC
> > ARM ARCHITECTURE <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>
> > Subject: Re: sdhci timeout on imx8mq
> > 
> > Hi Lucas,
> > 
> > On Tue, Jan 5, 2021 at 12:06 PM Lucas Stach
> > <l.stach@xxxxxxxxxxxxxx>
> > wrote:
> > 
> > > The reference manual states about this situation: "For any clock,
> > > its
> > > source must be left on when it is kept on. Behavior is undefined
> > > if
> > > this rule is violated."
> > > And it seems this is exactly what's happening here: some kind of
> > > glitch is introduced in the nand_usdhc_bus clock, which prevents
> > > the
> > > SDHCI controller from working, even though the clock branch is
> > > properly enabled later on. On my system the SDHCI timeout and
> > > following runtime suspend/resume cycle on the nand_usdhc_bus
> > > clock
> > > seem to get it back into a working state.
> > 
> > I think your analysis is correct and I recall helping a customer
> > with a similar
> > issue:
> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcomm
> > unity.nxp.com%2Ft5%2Fi-MX-Processors%2FExternal-clock-that-provide-
> > root
> > -clock-for-SAI3-and-SPDIF%2Fm-p%2F1019834&amp;data=04%7C01%7Cping
> > .bai%40nxp.com%7C8d250a158cce469c378308d8b274d6d1%7C686ea1d3bc
> > 2b4c6fa92cd99c5c301635%7C0%7C0%7C637455562183497049%7CUnknow
> > n%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1ha
> > WwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=VkxuhmhDifzOxxfIgFz9PR5gKC1
> > SyQhGeSHYysX1Co4%3D&amp;reserved=0
> > 
> 
> For the customer case, it seem not the same issue. the customer issue
> is caused by clock source change while parent has no clock output.
> This is inherit limitation for the CCM clock slice when using the
> smart interface to change the clock parent.
> 
> For current mmc timeout issue, I think we can have a try with
> nand_usdhc_bus clock gated at the beginning of kernel boot, directly
> modify the nand_usdhc_bus
> Clock's HW register gate bit in clock-imx8mq.c.

While this might be an option to fix this specific issue, I would hope
we can come up with something more generic, as the current clock
framework behavior allows to violate the system specification
constraint that parent clocks must not be disabled when any of the
children are active. This seems like a fundamental issue and might hurt
us also with other clocks than this specific nand_usdhc_bus clock.

Can you tell us if there were other issues found with the PLL1/2 gating
patch? The fact that, according to Bough, it's reverted in your tree
seems to suggest this.

Regards,
Lucas