On 1/24/25 11:15, David Owens wrote:
> Hi Romain,
>
> On 1/24/25 04:36, Romain Naour wrote:
>> Hello David,
>>
>> On 23/01/2025 at 23:09, David Owens wrote:
>>> Hello,
>>>
>>> I have an AM574x system and encountered an eMMC regression when
>>> upgrading from 5.15 to 6.1.38. The eMMC is using mmc-hs200 powered at
>>> 1.8V. Reads from /dev/mmcblk1boot0 return the expected data except
>>> when a delay of several seconds is inserted between reads. With a delay
>>> between reads, the read occasionally (~50% of the time) returns garbage
>>> data. Using hexdump, I was able to determine that the "bad" data is
>>> actually coming from /dev/mmcblk1, not /dev/mmcblk1boot0. The same
>>> thing happens when reading from /dev/mmcblk1boot1.
>>>
>>> Much like a previous report in the linux-omap mailing list [1], I too
>>> was able to correct the regression by reverting the commit
>>> "mmc: sdhci-omap: Allow SDIO card power off and enable aggressive PM" [2].
>>> Unlike the previous report, applying the sdhci-omap patch [3] did not
>>> resolve my issue. Only reverting the original commit allowed for
>>> reliable reads from /dev/mmcblk1boot0. I also don't see the same I/O
>>> errors mentioned in the previous posting. Reads always succeed and
>>> return the correct amount of data; it's just from the wrong device.
>> Interesting, can you share a test script to reproduce your issue?
> Here is a test script I've been running on my devices. A failure is typically
> detected after a minute or two. I include the eMMC part type in the output
> because we've used a couple of different parts in production, all claiming to
> be compatible, and I'm starting to wonder if the failure is a combination of
> the aggressive PM _and_ specific eMMC parts. The offset used in hexdump was
> just a place in both mmcblk1 and mmcblk1boot0 that was non-zero. The issue
> happens using any offset.
>
> #!/bin/bash
>
> echo "Kernel: $(uname -r)"
> echo "eMMC part: $(dmesg | grep 'mmcblk1: mmc1:0001' | awk '{print $5}')"
> BLK1=$(hexdump -C /dev/mmcblk1 -s 0x3fc000 -n 10 | head -n 1)
> BOOT=$(hexdump -C /dev/mmcblk1boot0 -s 0x3fc000 -n 10 | head -n 1)
>
> echo "/dev/mmcblk1: ${BLK1}"
> echo "/dev/mmcblk1boot0: ${BOOT}"
>
> while [[ "$BLK1" != "$BOOT" ]]; do
>     sleep 20
>     BOOT=$(hexdump -C /dev/mmcblk1boot0 -s 0x3fc000 -n 10 | head -n 1)
>     echo "/dev/mmcblk1boot0: ${BOOT}"
> done
>
> echo "/dev/mmcblk1boot0 read failure"
>
>> Why 6.1.38? Nowadays the 6.1.x stable is 6.1.127 already.
>> Can you test with the latest stable release?
> Good question. I can certainly update to .127, but at the time we were
> shipping units we were on .38, so that's where I've been doing all my
> testing. I'll let you know how running under .127 compares.

Testing with 6.1.127 shows the same behavior.

>> I believe this issue could be reproduced on the beaglebone-ai board (I don't
>> have it).
>>
>> [1] https://www.beagleboard.org/boards/beaglebone-ai
> Thanks for the suggestion, I'll see if I can dig one up.
>
>> Best regards,
>> Romain
>>
>>
>>> [1] https://lore.kernel.org/all/2e5f1997-564c-44e4-b357-6343e0dae7ab@xxxxxxxx/
>>>
>>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3edf588e7fe00e90d1dc7fb9e599861b2c2cf442
>>>
>>> [3] https://lore.kernel.org/linux-omap/20240315234444.816978-1-romain.naour@xxxxxxxx/T/#u
>>>
>>> Regards,
>>>
>>> Dave
>>>

Thanks,
Dave