Hello David, All,

On 24/01/2025 at 19:49, David wrote:
>
> On 1/24/25 11:15, David Owens wrote:
>> Hi Romain
>>
>> On 1/24/25 04:36, Romain Naour wrote:
>>> Hello David,
>>>
>>> On 23/01/2025 at 23:09, David Owens wrote:
>>>> Hello,
>>>>
>>>> I have an AM574x system and encountered an eMMC regression when upgrading from 5.15 to 6.1.38. The eMMC is using mmc-hs200 powered at 1.8v. Reads from /dev/mmcblk1boot0 will return expected data except when a delay of several seconds is inserted between reads. With a delay between reads, the read will occasionally (~50% of the time) return garbage data. Using hexdump, I was able to determine that the "bad" data is actually coming from /dev/mmcblk1, not /dev/mmcblk1boot0. The same thing happens when reading from /dev/mmcblk1boot1.
>>>>
>>>> Much like a previous report in the linux-omap mailing list [1], I too was able to correct the regression by reverting the commit "mmc: sdhci-omap: Allow SDIO card power off and enable aggressive PM" [2]. Unlike the previous report, applying the sdhci-omap patch [3] did not resolve my issue. Only reverting the original commit allowed for reliable reads from /dev/mmcblk1boot0. I also don't see the same I/O errors mentioned in the previous posting. Reads always succeed and return the correct amount of data, it's just from the wrong device.
>>> Interesting, can you share a test script to reproduce your issue?
>> Here is a test script I've been running on my devices. A failure is typically
>> detected after a minute or two. I include the eMMC part type in the output as
>> we've used a couple different parts in production, all claiming to be compatible
>> and I'm starting to wonder if the failure is a combination of the aggressive
>> PM _and_ specific eMMC parts. The offset used in hexdump was just a place in
>> both mmcblk1 and mmcblk1boot0 that was non-zero. The issue happens using any
>> offset.
>>
>> #!/bin/bash
>>
>> echo "Kernel: $(uname -r)"
>> echo "eMMC part: $(dmesg | grep 'mmcblk1: mmc1:0001' | awk '{print $5}')"
>> BLK1=$(hexdump -C /dev/mmcblk1 -s 0x3fc000 -n 10 | head -n 1)
>> BOOT=$(hexdump -C /dev/mmcblk1boot0 -s 0x3fc000 -n 10 | head -n 1)
>>
>> echo "/dev/mmcblk1: ${BLK1}"
>> echo "/dev/mmcblk1boot0: ${BOOT}"
>>
>> while [[ "$BLK1" != "$BOOT" ]]; do
>>     sleep 20
>>     BOOT=$(hexdump -C /dev/mmcblk1boot0 -s 0x3fc000 -n 10 | head -n 1)
>>     echo "/dev/mmcblk1boot0: ${BOOT}"
>> done
>>
>> echo "/dev/mmcblk1boot0 read failure"
>>
>>> Why 6.1.38? Nowadays the 6.1.x stable is 6.1.127 already.
>>> Can you test with the latest stable release?
>> Good question. I can certainly update to .127 but at the time we were shipping
>> units we were on .38 so that's where I've been doing all my testing. I'll let
>> you know how running under .127 compares.
>
> Testing with 6.1.127 shows the same behavior.

Thanks for testing.

I'm able to reproduce the issue locally (using kernel 6.1.112). It fails after the first "sleep 20"...

If I remove MMC_CAP_AGGRESSIVE_PM from the sdhci-omap driver, the issue is gone.

About the sdhci-omap driver: it's one of the few drivers enabling MMC_CAP_AGGRESSIVE_PM. I recently switched to a new project using a newer SoC, and its eMMC driver doesn't even set MMC_CAP_AGGRESSIVE_PM.

I'm wondering if MMC_CAP_AGGRESSIVE_PM is really safe for (or compatible with) HS200/HS400 eMMC speeds. Indeed, MMC_CAP_AGGRESSIVE_PM was added to the sdhci-omap driver to support SDIO WLAN device PM [1].

I've found another similar report on the BeagleBone Black (AM335x SoC) [2].

It seems the MMC_CAP_AGGRESSIVE_PM feature should only be enabled for SDIO cards. The TRM (SoC manual) says that "Suspend-Resume Flow" is only supported for SDIO cards:

  26.5.1.2.1.6 Suspend-Resume Flow
  The suspend-and-resume feature is supported only by SDIO cards.

Thoughts?
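When comparing several eMMC parts, it may help to count how often the boot partition returns user-area data over a fixed number of delayed reads, rather than stopping at the first failure as the quoted script does. Below is a minimal bash sketch of that idea. It assumes bash and an `od` that supports the POSIX `-j`/`-N` options; `read_hex` and `check_boot_reads` are hypothetical helper names, not anything from the sdhci-omap or mmc code.

```shell
# Minimal sketch (hypothetical helpers): source this file, then call
# check_boot_reads. Assumes bash and an od supporting POSIX -j/-N.

# read_hex DEV OFFSET LEN
# Print LEN bytes of DEV, starting at byte OFFSET, as one hex string.
# OFFSET may be decimal or 0x-prefixed; bash arithmetic converts it.
read_hex() {
    od -An -tx1 -j "$(( $2 ))" -N "$3" "$1" | tr -d ' \n'
}

# check_boot_reads BLK BOOT OFFSET LEN DELAY ITER
# Compare the same window of the user area (BLK) and a boot partition (BOOT).
# After each DELAY-second sleep, re-read BOOT and count the reads that return
# BLK's bytes instead of BOOT's own.
# Returns 0 if no bad read was seen, 1 otherwise, and 2 if the window already
# matches on both devices (identical data, or the very first BOOT read was bad).
check_boot_reads() {
    local blk=$1 boot=$2 off=$3 len=$4 delay=$5 iter=$6
    local ref base got bad=0 i
    ref=$(read_hex "$blk" "$off" "$len")    # user-area bytes (the "wrong" data)
    base=$(read_hex "$boot" "$off" "$len")  # first boot read, used as baseline
    if [[ "$ref" == "$base" ]]; then
        echo "window matches on $blk and $boot; choose another offset" >&2
        return 2
    fi
    for (( i = 1; i <= iter; i++ )); do
        sleep "$delay"
        got=$(read_hex "$boot" "$off" "$len")
        if [[ "$got" == "$ref" ]]; then
            bad=$(( bad + 1 ))
            echo "read $i: $boot returned $blk data"
        fi
    done
    echo "$bad/$iter reads of $boot returned $blk data"
    (( bad == 0 ))
}
```

For example, after sourcing the file, `check_boot_reads /dev/mmcblk1 /dev/mmcblk1boot0 0x3fc000 10 20 30` would perform 30 reads, 20 s apart, at the same offset and length as the script above (run as root). Only reads that exactly match the user-area bytes are counted, mirroring the equality check in the original loop.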
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3edf588e7fe00e90d1dc7fb9e599861b2c2cf442
[2] https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1332523/beagl-bone-black-problems-reading-from-emmc-boot-partitions-on-beaglebone-with-kernel-6-1

Best regards,
Romain

>
>>> I believe this issue could be reproduced on the beaglebone-ai board (I don't
>>> have it).
>>>
>>> [1] https://www.beagleboard.org/boards/beaglebone-ai
>> Thanks for the suggestion, I'll see if I can dig one up.
>>
>>> Best regards,
>>> Romain
>>>
>>>
>>>> [1] https://lore.kernel.org/all/2e5f1997-564c-44e4-b357-6343e0dae7ab@xxxxxxxx/
>>>>
>>>> [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3edf588e7fe00e90d1dc7fb9e599861b2c2cf442
>>>>
>>>> [3] https://lore.kernel.org/linux-omap/20240315234444.816978-1-romain.naour@xxxxxxxx/T/#u
>>>>
>>>> Regards,
>>>>
>>>> Dave
>>>>
> Thanks,
>
> Dave
>