RE: [PATCH] mmc: block: add reset workaround for partition switch failures

Guan Wang <guan.wang.jy@xxxxxxxxxxx> · Mon, 3 Mar 2025 02:13:24 +0000

Hello,
>> Some eMMC devices (e.g., BGSD4R and AIM20F) may enter an unresponsive 
>> state after encountering CRC errors during RPMB writes (CMD25). This 
>> prevents the device from switching back to the main partition via 
>> CMD6, blocking further I/O operations.
>Different cards on the same platform?
>Can you share which platform, and few lines from the log supporting your analysis?

I tested on R-Car Gen3/4 platforms, which use the same host controller IP and the tmio_mmc host driver.
The tests were conducted on different board and eMMC combinations:
- Gen3 Board with Samsung eMMC (BGSD4R) → Issue observed
- Gen3 Board with Micron eMMC (AIM20F, new version) → Issue observed
- Gen3 Board with Micron eMMC (AIM20F, old version) → No issue
- Gen4 Board with Micron eMMC (G1M15L) → No issue

The issue occurs only on writes to the RPMB partition, when a CRC error is triggered during the write.
To investigate further, I hacked the host driver to inject a bad CRC during the CMD25 data phase.
The log of the reproduced failure is as follows:
$ ./mmc rpmb read-counter /dev/mmcblk0rpmb
[   75.557848] w_t: -->START_CMD6 (arg: 3b30301)
[   75.557863] w_t:    resp[0]=900
[   75.557875] w_t: -->START_CMD13 (arg: 10000)
[   75.557884] w_t:    resp[0]=900
[   75.557894] w_t: -->START_CMD23 (arg: 1)
[   75.557903] w_t:    resp[0]=900
[   75.557915] w_t: -->START_CMD25 (arg: 0)
[   75.557924] w_t:    resp[0]=900
[   75.557931] !!!!!!!!!!!!!!!!, make a dummy write CRC on DAT
[   75.563631] w_t: (data_err) -84 stat=20820604 error=5800 (which means the eMMC device returned a negative CRC status)
[   75.563672] renesas_sdhi_internal_dmac ee140000.sd: __mmc_blk_ioctl_cmd: data error -84
[   75.573112] w_t: -->START_CMD6 (arg: 3b30001)
[   75.573132] w_t: (cmd_err -110) stat=20c00401 error=12000
[   75.573154] w_t: -->START_CMD6 (arg: 3b30001)
[   75.573169] w_t: (cmd_err -110) stat=20c00401 error=12000
[   75.573183] w_t: -->START_CMD6 (arg: 3b30001)
[   75.573197] w_t: (cmd_err -110) stat=20c00401 error=12000
[   75.573211] w_t: -->START_CMD6 (arg: 3b30001)
[   75.573225] w_t: (cmd_err -110) stat=20c00401 error=12000
After this issue occurs, the eMMC device no longer responds to CMD6, and even subsequent accesses to the main partition behave abnormally.
However, if we perform an eMMC card reset at this point, the retry of CMD6 works as expected.
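
For readers following the log, the two CMD6 arguments can be decoded with a small helper. This is a minimal userspace sketch, not driver code: the names switch_arg, decode_switch_arg() and partition_name() are made up for illustration, and the field layout assumed is the usual SWITCH argument layout (access in bits [25:24], EXT_CSD index in [23:16], value in [15:8], command set in [2:0]).

```c
#include <stdint.h>

/* Fields of a CMD6 (SWITCH) argument. Hypothetical helper type. */
struct switch_arg {
	unsigned int access;  /* bits [25:24], 3 = write byte */
	unsigned int index;   /* bits [23:16], EXT_CSD byte index */
	unsigned int value;   /* bits [15:8], value written to that byte */
	unsigned int cmd_set; /* bits [2:0] */
};

/* Split a raw CMD6 argument, as printed in the log, into its fields. */
static struct switch_arg decode_switch_arg(uint32_t arg)
{
	struct switch_arg s;

	s.access = (arg >> 24) & 0x3;
	s.index = (arg >> 16) & 0xff;
	s.value = (arg >> 8) & 0xff;
	s.cmd_set = arg & 0x7;
	return s;
}

/* Name the partition selected by the low 3 bits of PARTITION_CONFIG. */
static const char *partition_name(unsigned int part_config)
{
	switch (part_config & 0x7) {
	case 0:
		return "user area";
	case 1:
		return "boot partition 1";
	case 2:
		return "boot partition 2";
	case 3:
		return "RPMB";
	default:
		return "general purpose partition";
	}
}
```

With this, 3b30301 decodes to a write-byte access to EXT_CSD index 0xb3 (179, PARTITION_CONFIG) with value 3, i.e. the switch to RPMB, and 3b30001 is the same write with value 0, i.e. the switch back to the user area. The -84 and -110 in the log are -EILSEQ and -ETIMEDOUT.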

BTW,
I now believe that sending CMD12 is a better solution in this case than performing a reset.
According to information from the eMMC vendor, even in a closed-end write operation (CMD23 + CMD25), CMD12 is required if any communication error occurs.
The JESD84 specification also mentions a similar requirement: "A stop command is not required at the end of this type of multiple block write unless terminated with an error."
I ran a quick test of this approach on the affected board, and it works.
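
The idea can be illustrated with a toy model. This is only a sketch under a loud assumption: that after a CRC error in the CMD25 data phase the card stays in its receive-data state and ignores CMD6 until it sees CMD12. That assumption is one reading of the vendor statement and of the timeouts in the log, not something verified against card firmware. All toy_* names are hypothetical and nothing here is kernel code; in the driver the stop command would be MMC_STOP_TRANSMISSION.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified card states, assumption: only these two matter here. */
enum toy_state { TOY_TRAN, TOY_RCV };

struct toy_card {
	enum toy_state state;
};

/* Closed-ended write (CMD23 + CMD25). On a CRC error the modelled
 * card is left in the receive-data state. */
static int toy_cmd25(struct toy_card *c, bool crc_error)
{
	if (c->state != TOY_TRAN)
		return -ETIMEDOUT;
	if (crc_error) {
		c->state = TOY_RCV;
		return -EILSEQ;
	}
	return 0;
}

/* CMD12 brings the modelled card back to the transfer state. */
static int toy_cmd12(struct toy_card *c)
{
	c->state = TOY_TRAN;
	return 0;
}

/* CMD6 is only answered in the transfer state. */
static int toy_cmd6(struct toy_card *c)
{
	return c->state == TOY_TRAN ? 0 : -ETIMEDOUT;
}

/* RPMB write followed by the switch back to the main partition.
 * stop_on_error selects whether CMD12 is sent after a failed write. */
static int toy_write_then_switch(struct toy_card *c, bool crc_error,
				 bool stop_on_error)
{
	int err = toy_cmd25(c, crc_error);

	if (err && stop_on_error)
		toy_cmd12(c);
	return toy_cmd6(c);
}
```

In this model, a CRC error without the stop leaves the following CMD6 timing out, which matches the log above, while sending CMD12 first lets the CMD6 succeed without a card reset.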

>> 
>> The root cause is suspected to be a firmware/hardware issue in 
>> specific eMMC models. A workaround is to perform a hardware reset via
>> mmc_hw_reset()
>> when the partition switch fails, followed by a retry.
>Same fw bug in 2 different products?
>
>Why do we need to fix it here?
>The ioctl will eventually return an error, and reset is needed anyway.
>If the eMMC is the primary storage,  the platform is rebooting without being aware what went wrong.

In the main partition, a similar reset operation is already implemented in mmc_blk_issue_rw_rq(),
so I believe applying the same approach for RPMB should be acceptable:
		case MMC_BLK_ABORT:
			if (!mmc_blk_reset(md, card->host, type))
				break;
			mmc_blk_rw_cmd_abort(mq, card, old_req, mq_rq);
			mmc_blk_rw_try_restart(mq, new_req, mqrq_cur);
			return;

Best Regards,
Guan Wang