Re: sdhci-omap: additional PM issue since 5.16

David <daowens01@xxxxxxxxx> · Fri, 24 Jan 2025 12:49:36 -0600

On 1/24/25 11:15, David Owens wrote:
> Hi Romain
>
> On 1/24/25 04:36, Romain Naour wrote:
>> Hello David,
>>
>> On 23/01/2025 at 23:09, David Owens wrote:
>>> Hello,
>>>
>>> I have an AM574x system and encountered an eMMC regression when upgrading from 5.15 to 6.1.38.  The eMMC is using mmc-hs200 powered at 1.8 V.  Reads from /dev/mmcblk1boot0 return the expected data except when a delay of several seconds is inserted between reads.  With a delay between reads, the read will occasionally (~50% of the time) return garbage data.  Using hexdump, I was able to determine that the "bad" data is actually coming from /dev/mmcblk1, not /dev/mmcblk1boot0.  The same thing happens when reading from /dev/mmcblk1boot1.
>>>
>>> Much like a previous report on the linux-omap mailing list [1], I too was able to correct the regression by reverting the commit "mmc: sdhci-omap: Allow SDIO card power off and enable aggressive PM" [2].  Unlike the previous report, applying the sdhci-omap patch [3] did not resolve my issue.  Only reverting the original commit allowed for reliable reads from /dev/mmcblk1boot0.  I also don't see the same I/O errors mentioned in the previous posting.  Reads always succeed and return the correct amount of data; it's just from the wrong device.
>> Interesting, can you share a test script to reproduce your issue?
> Here is a test script I've been running on my devices.  A failure is typically
> detected after a minute or two.  I include the eMMC part type in the output as
> we've used a couple of different parts in production, all claiming to be compatible,
> and I'm starting to wonder if the failure is a combination of the aggressive
> PM _and_ specific eMMC parts.  The offset used in hexdump was just a place in
> both mmcblk1 and mmcblk1boot0 that was non-zero.  The issue happens using any
> offset.
>
> #!/bin/bash
>
> echo "Kernel:    $(uname -r)"
> echo "eMMC part: $(dmesg | grep 'mmcblk1: mmc1:0001' | awk '{print $5}')"
> BLK1=$(hexdump -C /dev/mmcblk1 -s 0x3fc000 -n 10 | head -n 1)
> BOOT=$(hexdump -C /dev/mmcblk1boot0 -s 0x3fc000 -n 10 | head -n 1)
>
> echo "/dev/mmcblk1:      ${BLK1}"
> echo "/dev/mmcblk1boot0: ${BOOT}"
>
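> # Keep polling while boot0 still differs from mmcblk1; a match means the
> # boot0 read returned mmcblk1 data, which is the failure being hunted.
> while [[ "$BLK1" != "$BOOT" ]]; do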
>     sleep 20
>     BOOT=$(hexdump -C /dev/mmcblk1boot0 -s 0x3fc000 -n 10 | head -n 1)
>     echo "/dev/mmcblk1boot0: ${BOOT}"
> done
>
> echo "/dev/mmcblk1boot0 read failure"
>
>> Why 6.1.38? The 6.1.x stable series is already at 6.1.127.
>> Can you test with the latest stable release?
> Good question.  I can certainly update to .127 but at the time we were shipping
> units we were on .38 so that's where I've been doing all my testing.  I'll let
> you know how running under .127 compares.

Testing with 6.1.127 shows the same behavior.
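
For anyone trying to reproduce this, below is a rough variant of the test script quoted above that keeps running after the first mismatch and tallies the results, which may help put a number on the ~50% failure rate.  It is only a sketch: the device paths, offset and 20 second delay are copied from the original script, while the function names (read_head, classify_read, monitor_boot0) and the "run" argument are made up for illustration.  It also assumes that the two back-to-back reference reads at startup are good, which matches what I see here but is not guaranteed.

```shell
#!/bin/bash
# Sketch only: count how often a delayed boot0 read returns mmcblk1 data.
# Paths and offset mirror the original test script; names are hypothetical.

BLK_DEV=/dev/mmcblk1
BOOT_DEV=/dev/mmcblk1boot0
OFFSET=0x3fc000

# Print the first hexdump line for 10 bytes of device $1 at $OFFSET.
read_head() {
    hexdump -C "$1" -s "$OFFSET" -n 10 | head -n 1
}

# classify_read BOOT_REF BLK_REF ACTUAL
# Prints "ok" if ACTUAL matches the boot0 reference, "wrong-device" if it
# matches the mmcblk1 reference, and "unknown" otherwise.
classify_read() {
    if [[ "$3" == "$1" ]]; then
        echo ok
    elif [[ "$3" == "$2" ]]; then
        echo wrong-device
    else
        echo unknown
    fi
}

# monitor_boot0 COUNT [DELAY_SECONDS]
# Reads boot0 COUNT times with a delay before each read and prints a tally.
monitor_boot0() {
    local count=$1 delay=${2:-20}
    local blk_ref boot_ref actual result i
    local ok=0 wrong=0 unknown=0

    # Reference reads, taken back to back (assumed to be good).
    blk_ref=$(read_head "$BLK_DEV")
    boot_ref=$(read_head "$BOOT_DEV")
    if [[ -z "$boot_ref" || "$boot_ref" == "$blk_ref" ]]; then
        echo "reference reads are empty or identical, pick another offset" >&2
        return 1
    fi

    for ((i = 1; i <= count; i++)); do
        sleep "$delay"
        actual=$(read_head "$BOOT_DEV")
        result=$(classify_read "$boot_ref" "$blk_ref" "$actual")
        case $result in
            ok)           ok=$((ok + 1)) ;;
            wrong-device) wrong=$((wrong + 1)) ;;
            *)            unknown=$((unknown + 1)) ;;
        esac
        echo "read $i: $result"
    done
    echo "ok=$ok wrong-device=$wrong unknown=$unknown"
}

# Only start polling when asked, e.g. "./boot0-check.sh run 30 20".
if [[ "${1:-}" == "run" ]]; then
    monitor_boot0 "${2:-30}" "${3:-20}"
fi
```

Saved under a name of your choosing and invoked with "run 30 20" as root, it does 30 reads of /dev/mmcblk1boot0 with 20 seconds of idle time before each.  An "unknown" result would mean the data matched neither device, which I have not seen so far.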

>> I believe this issue could be reproduced on the beaglebone-ai board (I don't
>> have it).
>>
>> [1] https://www.beagleboard.org/boards/beaglebone-ai
> Thanks for the suggestion, I'll see if I can dig one up.
>
>> Best regards,
>> Romain
>>
>>
>>> [1] https://lore.kernel.org/all/2e5f1997-564c-44e4-b357-6343e0dae7ab@xxxxxxxx/
>>>
>>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3edf588e7fe00e90d1dc7fb9e599861b2c2cf442
>>>
>>> [3] https://lore.kernel.org/linux-omap/20240315234444.816978-1-romain.naour@xxxxxxxx/T/#u
>>>
>>> Regards,
>>>
>>> Dave
>>>
Thanks,

Dave