Re: [PATCH] mmc: rpmb: do not force a retune before RPMB switch

Adrian Hunter <adrian.hunter@xxxxxxxxx> · Mon, 11 Dec 2023 12:25:19 +0200

On 11/12/23 10:00, Jorge Ramirez-Ortiz, Foundries wrote:
> On 06/12/23 11:00:47, Jorge Ramirez-Ortiz, Foundries wrote:
>> On 06/12/23 07:02:43, Avri Altman wrote:
>>>>
>>>> On 4/12/23 17:01, Jorge Ramirez-Ortiz wrote:
>>>>> Requesting a retune before switching to the RPMB partition has been
>>>>> observed to cause CRC errors on the RPMB reads (-EILSEQ).
>>>>
>>>> There are still 2 concerns:
>>>> 1) We don't really know the root cause.  Have you determined if here are
>>>> CRC errors in the main partition also?
>>
>> right, and I don't disagree with that.
>>
>> As a test I created a 4GB file from /dev/random which I then copied
>> several times (dd if= ....)
>>
>> root@uz3cg-dwg-sec:/sys/kernel/debug/mmc0# cat err_stats
>> # Command Timeout Occurred:      0
>> # Command CRC Errors Occurred:   0
>> # Data Timeout Occurred:         0
>> # Data CRC Errors Occurred:      0
>> # Auto-Cmd Error Occurred:       0
>> # ADMA Error Occurred:   0
>> # Tuning Error Occurred:         0
>> # CMDQ RED Errors:       0
>> # CMDQ GCE Errors:       0
>> # CMDQ ICCE Errors:      0
>> # Request Timedout:      0
>> # CMDQ Request Timedout:         0
>> # ICE Config Errors:     0
>> # Controller Timedout errors:    0
>> # Unexpected IRQ errors:         0
>>
>> However as soon as I access RPMB and fails (it takes just a few tries) I see:
>>
>> I/TC: RPMB: Using generated key
>> [   86.902118] sdhci-arasan ff160000.mmc: __mmc_blk_ioctl_cmd: data error -84
>> E/TC:? 0
>> E/TC:? 0 TA panicked with code 0xffff0000
>> E/LD:  Status of TA 22250a54-0bf1-48fe-8002-7b20f1c9c9b1
>> E/LD:   arch: aarch64
>> E/LD:  region  0: va 0xc0004000 pa 0x7e200000 size 0x002000 flags rw-s (ldelf)
>> E/LD:  region  1: va 0xc0006000 pa 0x7e202000 size 0x008000 flags r-xs (ldelf)
>> E/LD:  region  2: va 0xc000e000 pa 0x7e20a000 size 0x001000 flags rw-s (ldelf)
>> E/LD:  region  3: va 0xc000f000 pa 0x7e20b000 size 0x004000 flags rw-s (ldelf)
>> E/LD:  region  4: va 0xc0013000 pa 0x7e20f000 size 0x001000 flags r--s
>> E/LD:  region  5: va 0xc0014000 pa 0x7e22c000 size 0x005000 flags rw-s (stack)
>> E/LD:  region  6: va 0xc0019000 pa 0x818ea9ba8 size 0x002000 flags rw-- (param)
>> E/LD:  region  7: va 0xc001b000 pa 0x818e97ba8 size 0x001000 flags rw-- (param)
>> E/LD:  region  8: va 0xc004f000 pa 0x00001000 size 0x014000 flags r-xs [0]
>> E/LD:  region  9: va 0xc0063000 pa 0x00015000 size 0x008000 flags rw-s [0]
>> E/LD:   [0] 22250a54-0bf1-48fe-8002-7b20f1c9c9b1 @ 0xc004f000
>> E/LD:  Call stack:
>> E/LD:   0xc0051a14
>> E/LD:   0xc004f31c
>> E/LD:   0xc0052d40
>> E/LD:   0xc004f624
>>
>> root@uz3cg-dwg-sec:/var/rootdirs/home/fio# cat /sys/kernel/debug/mmc0/err_stats
>> # Command Timeout Occurred:      0
>> # Command CRC Errors Occurred:   0
>> # Data Timeout Occurred:         0
>> # Data CRC Errors Occurred:      1
>> # Auto-Cmd Error Occurred:       0
>> # ADMA Error Occurred:   0
>> # Tuning Error Occurred:         0
>> # CMDQ RED Errors:       0
>> # CMDQ GCE Errors:       0
>> # CMDQ ICCE Errors:      0
>> # Request Timedout:      0
>> # CMDQ Request Timedout:         0
>> # ICE Config Errors:     0
>> # Controller Timedout errors:    0
>> # Unexpected IRQ errors:         0
>>
>>>> 2) Forcing this on everyone
>>>>
>>>> The original idea was that because re-tuning cannot be done in RPMB, the
>>>> need to re-rune in RPMB could be avoided by always re-tuning before
>>>> switching to RPMB and then switching straight back. IIRC re-tuning should
>>>> guarantee at least 4MB more I/O without issue.
>>> Performance is hardly an issue in the context of RPMB access -
>>> For most cases it’s a single frame.
>>
>> Yes, the security use case typically stores hashes, variables
>> (bootcount, upgrade_available, versions, that sort of thing) and
>> certificates in RPMB.
>>
>> Since you mentioned, I am seeing that tuning before switching to RPMB
>> has an impact on performance. As a practical test, just reading a 6 byte
>> variable incurs in 50ms penalty in kernel space due to the need to
>> retune 5 times. Not great since the request is coming from a Trusted
>> Application via OP-TEE through the supplicant meaning this TEE thread
>> (they are statically allocated CFG_NUM_THREADS) will be reserved for
>> quite a bit of time.
>>
>> Roughly:
>> TA --> OP-TEE (core) --> TEE-supplicant --> Kernel (>50ms) --> OP-TEE --> TA
> 
> To add more detail to the timing above, when using RPMB, OP-TEE stores
> the secure filesystem on RPMB as well, so accessing one of the variables
> stored in the filesystem consists on a number (~5) of individual RPMB
> requests (each one forcing a retune, each retune taking around 10ms).
> 
> BTW, I also tried delaying the timing between those consecutive retunes
> (up to 1 second), but the issue still persisted.
> 
>>
>> Adrian, I couldn't find the original performance justification for
>> enabling this feature globally. At which point do you think it becomes
>> beneficial to retune before accessing RPMB?
> 
> How should we proceed with this patch then? can it be merged as I
> proposed? should I rewrite it differently? not sure what is next

It would be good to try to determine if the error happens when the
switch command comes immediately after tuning.  For example, add
a delay after tuning and see if that makes any difference. e.g.

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index c79f73459915..6b168659282a 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -2867,8 +2867,10 @@ int __sdhci_execute_tuning(struct sdhci_host *host, u32 opcode)
 
 		ctrl = sdhci_readw(host, SDHCI_HOST_CONTROL2);
 		if (!(ctrl & SDHCI_CTRL_EXEC_TUNING)) {
-			if (ctrl & SDHCI_CTRL_TUNED_CLK)
+			if (ctrl & SDHCI_CTRL_TUNED_CLK) {
+				msleep(10);
 				return 0; /* Success! */
+			}
 			break;
 		}