Re: sdhci-omap: additional PM issue since 5.16

Robert Nelson <robertcnelson@xxxxxxxxx> · Wed, 26 Feb 2025 09:36:40 -0600

On Mon, Jan 27, 2025 at 3:20 PM Robert Nelson <robertcnelson@xxxxxxxxx> wrote:
>
> > Thanks for testing.
> >
> > I'm able to reproduce the issue locally (using kernel 6.1.112).
> > It fails after the first sleep 20...
> >
> > If I remove MMC_CAP_AGGRESSIVE_PM from the sdhci-omap driver the issue is gone.
> >
> > About the sdhci-omap driver: it's one of the few enabling
> > MMC_CAP_AGGRESSIVE_PM. I recently switched to a new project using a newer SoC
> > but the eMMC driver doesn't even set MMC_CAP_AGGRESSIVE_PM.
> >
> > I'm wondering if MMC_CAP_AGGRESSIVE_PM is really safe (or compatible) for
> > HS200/HS400 eMMC speed. Indeed, MMC_CAP_AGGRESSIVE_PM has been added to
> > sdhci-omap driver to support SDIO WLAN device PM [1].
> >
> > I've found another similar report on the Beaglebone-black (AM335x SoC) [2].
> >
> > It seems the MMC_CAP_AGGRESSIVE_PM feature should only be enabled for SDIO cards.
>
> We've been chasing this bug in BeagleLand for a while. Had Kingston
> run it through their hardware debuggers. On the BBB, once the eMMC is
> suspended during idle, the proper 'wakeup' cmd is NOT sent over;
> instead it forces a full reset. Eventually this kills the eMMC. Been
> playing with this same revert for a day or so. With my personal setup,
> it takes 3-4 weeks (at idle every day) for it to finally die, so I
> won't be able to verify this 'really' fixes it till next month.

Okay, it survived 4 weeks. We really need to revert:
3edf588e7fe00e90d1dc7fb9e599861b2c2cf442

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3edf588e7fe00e90d1dc7fb9e599861b2c2cf442

On every stable kernel back to v6.1.x, this commit is `killing`
Kingston eMMCs on BeagleBone Blacks in under 21 days.

By reverting the commit, I finally have a board that's survived the
3-week timeline (and a week more) with no issues.

Normally on MK2704, EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B changes to 0x02
and the eMMC never works again.
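
For anyone who wants to watch that field on their own board, here is a
minimal sketch in Python. It assumes the eMMC shows up as mmcblk1 and that
the kernel exposes the two estimates through the sysfs `life_time` attribute
as two hex bytes ("0x01 0x01", the same form as the eMMC.txt dump below).
The path and the helper names are illustrative, not taken from the script
that produced the output in this mail. The code-to-range mapping follows my
reading of the JEDEC eMMC 5.x EXT_CSD definition.

```python
from pathlib import Path

# Assumption: the eMMC is mmcblk1, as on the BeagleBone Black in this mail.
LIFE_TIME_PATH = Path("/sys/block/mmcblk1/device/life_time")


def parse_life_time(text):
    """Parse a string like '0x01 0x01' into (type_a, type_b) integers."""
    type_a, type_b = text.split()
    return int(type_a, 16), int(type_b, 16)


def describe(value):
    """Translate a DEVICE_LIFE_TIME_EST_TYP_A/B code into a readable range."""
    if value == 0x00:
        return "not defined"
    if 0x01 <= value <= 0x0A:
        return f"{(value - 1) * 10}-{value * 10}% of life used"
    if value == 0x0B:
        return "exceeded maximum estimated life"
    return "reserved"


if __name__ == "__main__":
    # Only touch sysfs when the attribute is actually present.
    if LIFE_TIME_PATH.exists():
        est_a, est_b = parse_life_time(LIFE_TIME_PATH.read_text())
        print(f"TYP_A: 0x{est_a:02x} ({describe(est_a)})")
        print(f"TYP_B: 0x{est_b:02x} ({describe(est_b)})")
    else:
        print(f"{LIFE_TIME_PATH} not found")
```

Logging this once a day during an idle soak test would show when TYP_B
moves from 0x01 to 0x02, instead of finding out when the board stops
booting.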

[44-am335x-bbb: 6.1.83-ti-r37 (up 4 weeks, 18 hours, 35 minutes)]

*************************************************
cat /sys/kernel/debug/mmc1/ios
clock: 52000000 Hz
vdd: 21 (3.3 ~ 3.4 V)
bus mode: 2 (push-pull)
chip select: 0 (don't care)
power mode: 2 (on)
bus width: 3 (8 bits)
timing spec: 1 (mmc high-speed)
signal voltage: 0 (3.30 V)
driver type: 0 (driver type B)
*************************************************
dmesg | grep boot0
[    5.362457] mmcblk1boot0: mmc1:0001 MK2704 2.00 MiB
*************************************************
eMMC Firmware Version:
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
*************************************************
cat /tmp/eMMC.txt
eMMC name: MK2704
eMMC date: 04/2023
eMMC rev: 0x7
eMMC hwrev: 0x0
eMMC fwrev: 0x0100000000000000
eMMC oemid: 0x0100
eMMC manfid: 0x000070
eMMC life_time: 0x01 0x01
eMMC serial: 0x5992401d
*************************************************
0x01
0x01 0x01
*************************************************
cat /boot/uEnv.txt
uname_r=6.1.83-ti-r37

I'm sure someone will argue that we should test this "PM" mode. Well,
on AM335x this is broken. At ~$60 a pop, I'm tired of testing this
regression for the last year and a half, waiting 3 weeks for a board to
die.

This needs to be reverted!

Regards,

-- 
Robert Nelson
https://rcn-ee.com/