On 30/11/23 11:34:18, Ulf Hansson wrote: > On Wed, 29 Nov 2023 at 17:05, Jorge Ramirez-Ortiz <jorge@xxxxxxxxxxxx> wrote: > > > > On the eMMC SanDisk iNAND 7250 configured with HS200, requesting a > > re-tune before switching to the RPMB partition would randomly cause > > subsequent RPMB requests to fail with EILSEQ: > > * data error -84, tigggered in __mmc_blk_ioctl_cmd() > > > > This commit skips the retune when switching to RPMB. > > Tested over several days with per minute RPMB reads. > > This sounds weird to me and needs more testing/debugging in my > opinion, especially at the host driver level. Perhaps add some new > tests in mmc_test, that does a partition switch to/from any partition > and then run regular I/O again to see if the problem is easier to > reproduce? hi Uffe ok I'll have a look - I have never used this driver before, so if you have anything in the works I'll be glad to integrated and adapt. > > The point is, I wonder what is so special with RPMB here? Note that, > it has been quite common that host drivers/controllers have had issues > with their tuning support, so I would not be surprised if that is the > case here too. Right, it is just that the tuning function for of-arasan is the generic __sdhci_execute_tuning() - only wrapped around arasan DLL reset calls. Hence why I aimed for the card: __sdhci_execute_tuning and ZynqMP are not recent functions or architectures. > Certainly I would be surprised if the problem is at > the eMMC card side, but I may be wrong. How do maintainers test the tuning methods? is there anything else for me to do other than forcing a retune with different partitions? > > Kind regards > Uffe For completeness this is the error message - notice that we have a trusted application (fiovb) going through OP-TEE and back to the TEE supplicant issuing an rpmb read of a variable (pretty normal these days, we use it on many different platforms - ST, NXP, AMD/Xilinx, TI..). The issue on this Zynqmp platform is scarily simple to reproduce; you can ignore the OP-TEE trace, it is just the TEE way of reporting that the RPMB read failed. root@uz3cg-dwg-sec:/var/rootdirs/home/fio# fiovb_printenv m4hash [ 461.775084] sdhci-arasan ff160000.mmc: __mmc_blk_ioctl_cmd: data error -84 E/TC:? 0 E/TC:? 0 TA panicked with code 0xffff0000 E/LD: Status of TA 22250a54-0bf1-48fe-8002-7b20f1c9c9b1 E/LD: arch: aarch64 E/LD: region 0: va 0xc0004000 pa 0x7e200000 size 0x002000 flags rw-s (ldelf) E/LD: region 1: va 0xc0006000 pa 0x7e202000 size 0x008000 flags r-xs (ldelf) E/LD: region 2: va 0xc000e000 pa 0x7e20a000 size 0x001000 flags rw-s (ldelf) E/LD: region 3: va 0xc000f000 pa 0x7e20b000 size 0x004000 flags rw-s (ldelf) E/LD: region 4: va 0xc0013000 pa 0x7e20f000 size 0x001000 flags r--s E/LD: region 5: va 0xc0014000 pa 0x7e22c000 size 0x005000 flags rw-s (stack) E/LD: region 6: va 0xc0019000 pa 0x816b31fc8 size 0x001000 flags rw-- (param) E/LD: region 7: va 0xc001a000 pa 0x816aa1fc8 size 0x002000 flags rw-- (param) E/LD: region 8: va 0xc006b000 pa 0x00001000 size 0x014000 flags r-xs [0] E/LD: region 9: va 0xc007f000 pa 0x00015000 size 0x008000 flags rw-s [0] E/LD: [0] 22250a54-0bf1-48fe-8002-7b20f1c9c9b1 @ 0xc006b000 E/LD: Call stack: E/LD: 0xc006de58 E/LD: 0xc006b388 E/LD: 0xc006ed40 E/LD: 0xc006b624 Read persistent value for m4hash failed: Exec format error Also I instrumented sdhci-of-arasan.c to confirm that tuning wasn't failing. diff --git a/drivers/mmc/host/sdhci-of-arasan.c b/drivers/mmc/host/sdhci-of-arasan.c index 681ac4cab8ab..54cde79d2719 100644 --- a/drivers/mmc/host/sdhci-of-arasan.c +++ b/drivers/mmc/host/sdhci-of-arasan.c @@ -1123,7 +1123,10 @@ static int arasan_zynqmp_execute_tuning(struct mmc_host *mmc, u32 opcode) err = sdhci_execute_tuning(mmc, opcode); if (err) - return err; + WARN_ON(1); + + if (host->tuning_err) + WARN_ON(1); arasan_zynqmp_dll_reset(host, device_id); Incidentally - not sure if it is intentional or not - I noticed that the function arasan_zynqmp_execute_tuning(..) can not fail which seems wrong (IMO it should also check host->tuning_err and not only err which will always be 0). Do you think this needs fixing even though not related to this problem? TIA Jorge > > > > > Signed-off-by: Jorge Ramirez-Ortiz <jorge@xxxxxxxxxxxx> > > --- > > Fixes v1: kernel test robot identified typo causing build failure > > CIF_MANFID_SANDISK_SD --> CID_MANFID_SANDISK_SD > > > > drivers/mmc/core/block.c | 6 +++++- > > drivers/mmc/core/card.h | 7 +++++++ > > drivers/mmc/core/quirks.h | 7 +++++++ > > include/linux/mmc/card.h | 1 + > > 4 files changed, 20 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c > > index 152dfe593c43..9b7ba6562a3b 100644 > > --- a/drivers/mmc/core/block.c > > +++ b/drivers/mmc/core/block.c > > @@ -860,6 +860,11 @@ static int mmc_blk_part_switch_pre(struct mmc_card *card, > > return ret; > > } > > mmc_retune_pause(card->host); > > + > > + /* Do not force retune before RPMB switch */ > > + if (mmc_can_retune(card->host) && > > + mmc_card_broken_rpmb_retune(card)) > > + card->host->need_retune = 0; > > } > > > > return ret; > > @@ -3143,4 +3148,3 @@ module_exit(mmc_blk_exit); > > > > MODULE_LICENSE("GPL"); > > MODULE_DESCRIPTION("Multimedia Card (MMC) block device driver"); > > - > > diff --git a/drivers/mmc/core/card.h b/drivers/mmc/core/card.h > > index b7754a1b8d97..1e1555a15de9 100644 > > --- a/drivers/mmc/core/card.h > > +++ b/drivers/mmc/core/card.h > > @@ -85,6 +85,7 @@ struct mmc_fixup { > > #define CID_MANFID_MICRON 0x13 > > #define CID_MANFID_SAMSUNG 0x15 > > #define CID_MANFID_APACER 0x27 > > +#define CID_MANFID_SANDISK2 0x45 > > #define CID_MANFID_KINGSTON 0x70 > > #define CID_MANFID_HYNIX 0x90 > > #define CID_MANFID_KINGSTON_SD 0x9F > > @@ -284,4 +285,10 @@ static inline int mmc_card_broken_cache_flush(const struct mmc_card *c) > > { > > return c->quirks & MMC_QUIRK_BROKEN_CACHE_FLUSH; > > } > > + > > +static inline int mmc_card_broken_rpmb_retune(const struct mmc_card *c) > > +{ > > + return c->quirks & MMC_QUIRK_BROKEN_RPMB_RETUNE; > > +} > > + > > #endif > > diff --git a/drivers/mmc/core/quirks.h b/drivers/mmc/core/quirks.h > > index cca71867bc4a..56c79b6b3537 100644 > > --- a/drivers/mmc/core/quirks.h > > +++ b/drivers/mmc/core/quirks.h > > @@ -130,6 +130,13 @@ static const struct mmc_fixup __maybe_unused mmc_blk_fixups[] = { > > MMC_FIXUP(CID_NAME_ANY, CID_MANFID_SANDISK_SD, 0x5344, add_quirk_sd, > > MMC_QUIRK_BROKEN_SD_DISCARD), > > > > + /* > > + * SanDisk iNAND 7250 DG4064, this quirk shall disable the retune > > + * operation enforced by default when switching to RPMB. > > + */ > > + MMC_FIXUP("DG4064", CID_MANFID_SANDISK2, 0x100, add_quirk_mmc, > > + MMC_QUIRK_BROKEN_RPMB_RETUNE), > > + > > END_FIXUP > > }; > > > > diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h > > index 7b12eebc5586..bd6986189e8b 100644 > > --- a/include/linux/mmc/card.h > > +++ b/include/linux/mmc/card.h > > @@ -296,6 +296,7 @@ struct mmc_card { > > #define MMC_QUIRK_BROKEN_SD_DISCARD (1<<14) /* Disable broken SD discard support */ > > #define MMC_QUIRK_BROKEN_SD_CACHE (1<<15) /* Disable broken SD cache support */ > > #define MMC_QUIRK_BROKEN_CACHE_FLUSH (1<<16) /* Don't flush cache until the write has occurred */ > > +#define MMC_QUIRK_BROKEN_RPMB_RETUNE (1<<17) /* Don't force a retune before switching to RPMB */ > > > > bool written_flag; /* Indicates eMMC has been written since power on */ > > bool reenable_cmdq; /* Re-enable Command Queue */ > > -- > > 2.34.1