Re: sdhci-omap: additional PM issue since 5.16

Ulf Hansson <ulf.hansson@xxxxxxxxxx> · Wed, 12 Mar 2025 12:55:52 +0100

On Fri, 7 Mar 2025 at 05:28, Tony Lindgren <tony@xxxxxxxxxxx> wrote:
>
> * Andreas Kemnade <andreas@xxxxxxxxxxxx> [250226 16:06]:
> > On Wed, 26 Feb 2025 09:36:40 -0600
> > Robert Nelson <robertcnelson@xxxxxxxxx> wrote:
> >
> > > On Mon, Jan 27, 2025 at 3:20 PM Robert Nelson <robertcnelson@xxxxxxxxx> wrote:
> > > >
> > > > > Thanks for testing.
> > > > >
> > > > > I'm able to reproduce the issue locally (using a kernel 6.1.112).
> > > > > It fails after the first sleep 20...
> > > > >
> > > > > If I remove MMC_CAP_AGGRESSIVE_PM from the sdhci-omap driver the issue is gone.
> > > > >
> > > > > As for the sdhci-omap driver, it's one of the few that enable
> > > > > MMC_CAP_AGGRESSIVE_PM. I recently switched to a new project using a newer SoC,
> > > > > and the eMMC driver there doesn't even set MMC_CAP_AGGRESSIVE_PM.
> > > > >
> > > > > I'm wondering if MMC_CAP_AGGRESSIVE_PM is really safe for (or compatible with)
> > > > > HS200/HS400 eMMC speed modes. Indeed, MMC_CAP_AGGRESSIVE_PM was added to the
> > > > > sdhci-omap driver to support SDIO WLAN device PM [1].
> > > > >
> > > > > I've found another similar report on the Beaglebone-black (AM335x SoC) [2].
> > > > >
> > > > > It seems the MMC_CAP_AGGRESSIVE_PM feature should only be enabled for SDIO cards.
> > > >
> > > > We've been chasing this bug in BeagleLand for a while. We had Kingston
> > > > run it through their hardware debuggers. On the BBB, once the eMMC is
> > > > suspended during idle, the proper 'wakeup' cmd is NOT sent over;
> > > > instead it forces a full reset. Eventually this kills the eMMC. I've been
> > > > playing with this same revert for a day or so. With my personal setup,
> > > > it takes 3-4 weeks (at idle every day) for it to finally die, so I
> > > > won't be able to verify this 'really' fixes it till next month.
> > >
> > > Okay, it survived 4 weeks. We really need to revert:
> > > 3edf588e7fe00e90d1dc7fb9e599861b2c2cf442
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3edf588e7fe00e90d1dc7fb9e599861b2c2cf442
> > >
> > > On every stable kernel back to v6.1.x, this commit is `killing`
> > > Kingston eMMCs on BeagleBone Blacks in under 21 days.
> > >
> > > By reverting the commit, I finally have a board that's survived the
> > > 3-week timeline (and a week more) with no issues.
> > >
> > Is there any simple way to restrict it to SDIO devices only, so we
> > can move forward a bit?
>
> Best to revert the patch first until the issue has been fixed.
>
> Based on the symptoms, it sounds like there might be a missing flush of
> a posted write in the PM runtime suspend/resume path. This could cause
> something in the sequence to happen in the wrong order for some of the
> related surrounding resources like power, clocks or interrupts.

SDIO is entirely different in this regard compared to eMMC/SD. So if
there are no reports of issues with SDIO, I suggest we keep the SDIO part.

Let me help out and cook a patch for this. I'll send it out in a few minutes.

>
> Regards,
>
> Tony

Kind regards
Uffe