RE: sdhci timeout on imx8mq

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: Lucas Stach [mailto:l.stach@xxxxxxxxxxxxxx]
> Sent: 2021年1月6日 23:10
> To: Bough Chen <haibo.chen@xxxxxxx>; Fabio Estevam
> <festevam@xxxxxxxxx>; Angus Ainslie <angus@xxxxxxxx>; Peng Fan
> <peng.fan@xxxxxxx>; Abel Vesa <abel.vesa@xxxxxxx>; Stephen Boyd
> <sboyd@xxxxxxxxxx>; Michael Turquette <mturquette@xxxxxxxxxxxx>
> Cc: Ulf Hansson <ulf.hansson@xxxxxxxxxx>; Guido Günther <agx@xxxxxxxxxxx>;
> linux-mmc <linux-mmc@xxxxxxxxxxxxxxx>; Adrian Hunter
> <adrian.hunter@xxxxxxxxx>; dl-linux-imx <linux-imx@xxxxxxx>; Sascha Hauer
> <kernel@xxxxxxxxxxxxxx>; moderated list:ARM/FREESCALE IMX / MXC ARM
> ARCHITECTURE <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>
> Subject: Re: sdhci timeout on imx8mq
> 
> Hi Bough,
> 
> Am Mittwoch, dem 06.01.2021 um 09:29 +0000 schrieb Bough Chen:
> > > -----Original Message-----
> > > From: Lucas Stach [mailto:l.stach@xxxxxxxxxxxxxx]
> > > Sent: 2021年1月5日 23:07
> > > To: Bough Chen <haibo.chen@xxxxxxx>; Fabio Estevam
> > > <festevam@xxxxxxxxx>; Angus Ainslie <angus@xxxxxxxx>; Leonard
> > > Crestez <leonard.crestez@xxxxxxx>; Peng Fan <peng.fan@xxxxxxx>; Abel
> > > Vesa <abel.vesa@xxxxxxx>; Stephen Boyd <sboyd@xxxxxxxxxx>; Michael
> > > Turquette <mturquette@xxxxxxxxxxxx>
> > > Cc: Ulf Hansson <ulf.hansson@xxxxxxxxxx>; Guido Günther <
> > > agx@xxxxxxxxxxx>; linux-mmc <linux-mmc@xxxxxxxxxxxxxxx>; Adrian
> > > Hunter <adrian.hunter@xxxxxxxxx>; dl-linux-imx <linux-imx@xxxxxxx>;
> > > Sascha Hauer <kernel@xxxxxxxxxxxxxx>; moderated list:ARM/FREESCALE
> > > IMX / MXC ARM ARCHITECTURE <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>
> > > Subject: Re: sdhci timeout on imx8mq
> > >
> > > Hi all,
> > >
> > > Am Mittwoch, dem 08.07.2020 um 01:32 +0000 schrieb BOUGH CHEN:
> > > > > -----Original Message-----
> > > > > From: Fabio Estevam [mailto:festevam@xxxxxxxxx]
> > > > > Sent: 2020年7月7日 20:45
> > > > > To: Angus Ainslie <angus@xxxxxxxx>
> > > > > Cc: BOUGH CHEN <haibo.chen@xxxxxxx>; Ulf Hansson
> > > > > <ulf.hansson@xxxxxxxxxx>; Guido Günther <agx@xxxxxxxxxxx>;
> > > > > linux-
> > > > > mmc <linux-mmc@xxxxxxxxxxxxxxx>; Adrian Hunter
> > > > > <adrian.hunter@xxxxxxxxx>; dl-linux-imx <linux-imx@xxxxxxx>;
> > > > > Sascha Hauer < kernel@xxxxxxxxxxxxxx>; moderated
> > > > > list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE
> > > > > <linux-arm-kernel@xxxxxxxxxxxxxxxxxxx>
> > > > > Subject: Re: sdhci timeout on imx8mq
> > > > >
> > > > > Hi Angus,
> > > > >
> > > > > On Tue, Jun 30, 2020 at 4:39 PM Angus Ainslie <angus@xxxxxxxx>
> > > > > wrote:
> > > > >
> > > > > > Has there been any progress with this. I'm getting this on
> > > > > > about 50% of
> > > > >
> > > > > Not from my side, sorry.
> > > > >
> > > > > Bough,
> > > > >
> > > > > Do you know why this problem affects the imx8mq-evk versions
> > > > > that are populated with the Micron eMMC and not the ones with
> > > > > Sandisk eMMC?
> > > >
> > > > Hi Angus,
> > > >
> > > > Can you show me the full fail log? I do not meet this issue on my
> > > > side, besides, which kind of uboot do you use?
> > >
> > > I was finally able to bisect this issue, which wasn't that much fun
> > > due to the issue not being reproducible 100%. :/ Turns out that the
> > > issue is even more interesting than I thought and likely doesn't
> > > have anything to do with SDHCI or used bootloader versions. Here's
> > > my current debugging state:
> > >
> > > I've bisected the issue down to b04383b6a558 (clk: imx8mq: Define
> > > gates for
> > > pll1/2 fixed dividers). The change itself looks fine to me, still
> > > CC'ed Leonard for good measure.
> > >
> > > In my testing the following partial revert fixes the issue:
> > >
> > > --- a/drivers/clk/imx/clk-imx8mq.c
> > > +++ b/drivers/clk/imx/clk-imx8mq.c
> > > @@ -365,7 +365,7 @@ static int imx8mq_clocks_probe(struct
> > > platform_device *pdev)
> > >         hws[IMX8MQ_SYS1_PLL_133M_CG] =
> > > imx_clk_hw_gate("sys1_pll_133m_cg", "sys1_pll_out", base + 0x30,
> > > 15);
> > >         hws[IMX8MQ_SYS1_PLL_160M_CG] =
> > > imx_clk_hw_gate("sys1_pll_160m_cg", "sys1_pll_out", base + 0x30,
> > > 17);
> > >         hws[IMX8MQ_SYS1_PLL_200M_CG] =
> > > imx_clk_hw_gate("sys1_pll_200m_cg", "sys1_pll_out", base + 0x30, 19);
> > > -       hws[IMX8MQ_SYS1_PLL_266M_CG] =
> > > imx_clk_hw_gate("sys1_pll_266m_cg", "sys1_pll_out", base + 0x30,
> > > 21);
> > >         hws[IMX8MQ_SYS1_PLL_400M_CG] =
> > > imx_clk_hw_gate("sys1_pll_400m_cg", "sys1_pll_out", base + 0x30,
> > > 23);
> > >         hws[IMX8MQ_SYS1_PLL_800M_CG] =
> > > imx_clk_hw_gate("sys1_pll_800m_cg", "sys1_pll_out", base + 0x30,
> > > 25);
> > >
> > > @@ -375,7 +375,7 @@ static int imx8mq_clocks_probe(struct
> > > platform_device *pdev)
> > >         hws[IMX8MQ_SYS1_PLL_133M] =
> > > imx_clk_hw_fixed_factor("sys1_pll_133m", "sys1_pll_133m_cg", 1, 6);
> > >         hws[IMX8MQ_SYS1_PLL_160M] =
> > > imx_clk_hw_fixed_factor("sys1_pll_160m", "sys1_pll_160m_cg", 1, 5);
> > >         hws[IMX8MQ_SYS1_PLL_200M] =
> > > imx_clk_hw_fixed_factor("sys1_pll_200m", "sys1_pll_200m_cg", 1, 4);
> > > -       hws[IMX8MQ_SYS1_PLL_266M] =
> > > imx_clk_hw_fixed_factor("sys1_pll_266m", "sys1_pll_266m_cg", 1, 3);
> > > +       hws[IMX8MQ_SYS1_PLL_266M] =
> > > + imx_clk_hw_fixed_factor("sys1_pll_266m", "sys1_pll_out", 1, 3);
> > >         hws[IMX8MQ_SYS1_PLL_400M] =
> > > imx_clk_hw_fixed_factor("sys1_pll_400m", "sys1_pll_400m_cg", 1, 2);
> > >         hws[IMX8MQ_SYS1_PLL_800M] =
> > > imx_clk_hw_fixed_factor("sys1_pll_800m", "sys1_pll_800m_cg", 1, 1);
> > >
> > > The sys1_pll_266m is the parent of nand_usdhc_bus. I've validated
> > > that the SDHCI driver properly enables this bus clock across the
> > > problematic card access.
> > > So what I think is happening here is that both nand_usdhc_bus and
> > > sys1_pll_266m are initially enabled. Sometime during boot
> > > sys1_pll_266m gets disabled due to runtime PM on the enet_axi clock,
> > > which is a direct child of sys1_pll_266m. At this point
> > > nand_usdhc_bus is still enabled, but no consumer has claimed the
> > > clock yet, so the parent clock gets disabled while this branch of
> > > the clock tree is still active.
> >
> > Hi Lucas,
> >
> > According to the clock tree, if nand_usdhc_bus is still enabled, then
> > sys1_pll_266m has no chance to disable.
> 
> This statement is only correct after the SDHCI driver is probed an has enabled
> nand_usdhc_bus. Before the driver probes the refcounts on the clocks are not
> synchronized, so sys1_pll_266m_cg can be disabled, while nand_usdhc_bus is
> enabled (from software running before Linux), even though no consumer is
> using nand_usdhc_bus, yet.

Yes, agree. For current case, uboot gate on the sys1_pll_266m, then boot the Linux.
In Linux, after clock driver probe, due the the support of sys1_pll_266m_cg, this sys1_pll_266m is gate off by clock driver due to no default consumer.

> 
> >     sys1_pll_266m_cg                  1        1        0
> 800000000          0     0  50000         Y
> >        sys1_pll_266m                  1        1        0
> 266666666          0     0  50000         Y
> >           nand_usdhc_bus              0        0        0
> 266666666          0     0  50000         N
> >              nand_usdhc_rawnand_clk       0        0        0
> 266666666          0     0  50000         N
> >           enet_axi                    1        1        0
> 266666666          0     0  50000         Y
> >              enet1_root_clk           2        2        0
> 266666666          0     0  50000         Y
> >
> >
> > This issue seems related with the following errta:
> >
> > e11232: USDHC: uSDHC setting requirement for IPG_CLK and AHB_BUS
> > clocks
> > Description: uSDHC AHB_BUS and IPG_CLK clocks must be synchronized.
> > Due to current physical design implementation, AHB_BUS and IPG_CLK
> > must come from same clock source to maintain clock sync.
> > Workaround: Set AHB_BUS and IPG_CLK to clock source from PLL1.
> >
> > After sys1_pll_266m gate off/on, seems need to sync the USDHC AHB bus
> > and USDHC IPG_clk again. (Here usdhc AHB BUS source from
> > nand_usdhc_bus.)
> > This sync is handle by hardware, and maybe need some time, during this
> > sync period, usdhc operation may has issue.
> 
> Where in HW is this synchronization done? If it's at the uSDHC controller side, I
> would expect this issue to show up even with the commit reverted, as
> nand_usdhc_bus gets gated due to runtime PM from the controller side. The
> only difference with the commit in question is that now the clock branch can be
> gated _before_ nand_usdhc_bus. If the synchronization is done somewhere in
> the clock tree than this might be an issue.
> 

Not in uSDHC side. This synchronization should be done somewhere in clock tree(hardware side). 

> >
> > I just double check our local v5.10 branch, already revert the commit
> > b04383b6a558 (clk: imx8mq: Define gates for pll1/2 fixed dividers).
> > So to fix this issue, one method is revert this patch, another method
> > is keep the 'nand_usdhc_bus' always on. Add change like this:
> >
> > diff --git a/drivers/clk/imx/clk-imx8mq.c
> > b/drivers/clk/imx/clk-imx8mq.c index 779ea69e639c..939806b36916 100644
> > --- a/drivers/clk/imx/clk-imx8mq.c
> > +++ b/drivers/clk/imx/clk-imx8mq.c
> > @@ -433,7 +433,7 @@ static int imx8mq_clocks_probe(struct
> > platform_device *pdev)
> >         /* BUS */
> >         hws[IMX8MQ_CLK_MAIN_AXI] =
> > imx8m_clk_hw_composite_bus_critical("main_axi", imx8mq_main_axi_sels,
> > base + 0x8800);
> >         hws[IMX8MQ_CLK_ENET_AXI] =
> imx8m_clk_hw_composite_bus("enet_axi", imx8mq_enet_axi_sels, base +
> 0x8880);
> > -       hws[IMX8MQ_CLK_NAND_USDHC_BUS] =
> imx8m_clk_hw_composite_bus("nand_usdhc_bus", imx8mq_nand_usdhc_sels,
> base + 0x8900);
> > +       hws[IMX8MQ_CLK_NAND_USDHC_BUS] =
> > + imx8m_clk_hw_composite_bus_critical("nand_usdhc_bus",
> > + imx8mq_nand_usdhc_sels, base + 0x8900);
> >         hws[IMX8MQ_CLK_VPU_BUS] =
> > imx8m_clk_hw_composite_bus("vpu_bus", imx8mq_vpu_bus_sels, base +
> > 0x8980);
> >         hws[IMX8MQ_CLK_DISP_AXI] =
> > imx8m_clk_hw_composite_bus("disp_axi", imx8mq_disp_axi_sels, base +
> > 0x8a00);
> >         hws[IMX8MQ_CLK_DISP_APB] =
> > imx8m_clk_hw_composite_bus("disp_apb", imx8mq_disp_apb_sels, base +
> > 0x8a80);
> >
> >
> > What you think? Or any other suggestion?
> 
> This is suboptimal, as it will not allow to gate the uSDHC controller AHB clock in
> runtime suspend. Also my testing shows that it's the gate _before_ the
> nand_usdhc_bus slice that's causing the issue. So my minimal fix from the
> previous mail would still be better, as it allows to gate the nand_usdhc_bus
> clock, while keeping sys1_pll_266m enabled.

Whether to choose your minimal fix or revert the commit, let's involve clock team member, Abel/Jacky, any comment?
Our local tree just revert this commit, I think there are some other reason, Jacky, could you help clarify that?

Best Regards
Haibo Chen
> 
> Regards,
> Lucas
> >





[Index of Archives]     [Linux Memonry Technology]     [Linux USB Devel]     [Linux Media]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux