Hi Romain

On 1/24/25 04:36, Romain Naour wrote:
> Hello David,
>
> On 23/01/2025 at 23:09, David Owens wrote:
>> Hello,
>>
>> I have an AM574x system and encountered an eMMC regression when upgrading from 5.15 to 6.1.38. The eMMC is using mmc-hs200 powered at 1.8v. Reads from /dev/mmcblk1boot0 will return expected data except when a delay of several seconds is inserted between reads. With a delay between reads, the read will occasionally (~50% of the time) return garbage data. Using hexdump, I was able to determine that the "bad" data is actually coming from /dev/mmcblk1, not /dev/mmcblk1boot0. The same thing happens when reading from /dev/mmcblk1boot1.
>>
>> Much like a previous report in the linux-omap mailing list [1], I too was able to correct the regression by reverting the commit "mmc: sdhci-omap: Allow SDIO card power off and enable aggressive PM" [2]. Unlike the previous report, applying the sdhci-omap patch [3] did not resolve my issue. Only reverting the original commit allowed for reliable reads from /dev/mmcblk1boot0. I also don't see the same I/O errors mentioned in the previous posting. Reads always succeed and return the correct amount of data, it's just from the wrong device.
> Interesting, can you share a test script to reproduce your issue?

Here is a test script I've been running on my devices. A failure is typically detected after a minute or two. I include the eMMC part type in the output as we've used a couple of different parts in production, all claiming to be compatible, and I'm starting to wonder if the failure is a combination of the aggressive PM _and_ specific eMMC parts. The offset used in hexdump was just a place in both mmcblk1 and mmcblk1boot0 that was non-zero. The issue happens using any offset.

#!/bin/bash

echo "Kernel: $(uname -r)"
echo "eMMC part: $(dmesg | grep 'mmcblk1: mmc1:0001' | awk '{print $5}')"

# Baseline reads of the same offset from the user area and from boot0.
BLK1=$(hexdump -C /dev/mmcblk1 -s 0x3fc000 -n 10 | head -n 1)
BOOT=$(hexdump -C /dev/mmcblk1boot0 -s 0x3fc000 -n 10 | head -n 1)

echo "/dev/mmcblk1: ${BLK1}"
echo "/dev/mmcblk1boot0: ${BOOT}"

# Re-read boot0 after an idle delay. The loop exits once the boot0 read
# returns the same bytes as mmcblk1, i.e. data from the wrong device.
while [[ "$BLK1" != "$BOOT" ]]; do
    sleep 20
    BOOT=$(hexdump -C /dev/mmcblk1boot0 -s 0x3fc000 -n 10 | head -n 1)
    echo "/dev/mmcblk1boot0: ${BOOT}"
done

echo "/dev/mmcblk1boot0 read failure"

>
> Why 6.1.38? nowadays the 6.1.x stable is 6.1.127 already.
> Can you test with the latest stable release?

Good question. I can certainly update to .127, but at the time we were shipping units we were on .38, so that's where I've been doing all my testing. I'll let you know how running under .127 compares.

>
> I believe this issue could be reproduced on the beaglebone-ai board (I don't
> have it).
>
> [1] https://www.beagleboard.org/boards/beaglebone-ai

Thanks for the suggestion, I'll see if I can dig one up.

>
> Best regards,
> Romain
>
>
>> [1] https://lore.kernel.org/all/2e5f1997-564c-44e4-b357-6343e0dae7ab@xxxxxxxx/
>>
>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3edf588e7fe00e90d1dc7fb9e599861b2c2cf442
>>
>> [3] https://lore.kernel.org/linux-omap/20240315234444.816978-1-romain.naour@xxxxxxxx/T/#u
>>
>> Regards,
>>
>> Dave
>>
>>