On Tue, 29 Jun 2021 at 19:23, Linus Walleij <linus.walleij@xxxxxxxxxx> wrote: > > On Tue, Jun 29, 2021 at 3:56 PM Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote: > > > I would > > prefer some more details before I am ready to apply this. For example, > > more exactly, what happens when the stall is triggered? Can you > > provide some more debug info, perhaps with the sequence of commands > > that triggers the error? > > I have added debug prints and it looks like this, first tons of normal > traffick switching between the different partitions with CMD6, > both 0 (main), 1 (boot0) and 2 (boot1) are investigated before > results. At the end of the trail this happens: > > [ 54.171173] mmc_host mmc2: start CMD18 arg 00005010 > [ 54.171478] mmc_host mmc2: start CMD18 arg 00005028 > [ 54.172058] mmc_host mmc2: start CMD18 arg 00005048 > [ 54.172790] mmc_host mmc2: start CMD18 arg 00005088 > [ 54.174926] mmc_host mmc2: start CMD18 arg 00005108 > [ 54.177581] mmc_host mmc2: start CMD18 arg 00015038 > [ 54.178039] mmc_host mmc2: start CMD18 arg 00066400 > [ 54.178222] mmc_host mmc2: start CMD18 arg 00004880 > [ 54.178497] mmc_host mmc2: start CMD18 arg 00000040 > [ 54.178649] mmc_host mmc2: start CMD18 arg 00015078 > [ 54.178894] mmc_host mmc2: start CMD18 arg 00066800 > [ 54.179382] mmc2: modify EXT_CSD, index 179, value: 1, set 1, timing 0 > [ 54.179412] mmc_host mmc2: start CMD6 arg 03b30101 > [ 54.180145] mmc2 modify EXT_CSD completed (0) > [ 54.180175] mmc_host mmc2: start CMD13 arg 00010000 > [ 54.180267] mmc_host mmc2: start CMD18 arg 00001ff0 > [ 54.180755] mmc2: modify EXT_CSD, index 179, value: 0, set 1, timing 0 > [ 54.180786] mmc_host mmc2: start CMD6 arg 03b30001 > [ 54.180847] mmc2 modify EXT_CSD completed (0) > [ 54.180847] mmc_host mmc2: start CMD13 arg 00010000 > [ 54.180938] mmc_host mmc2: start CMD18 arg 00015010 > [ 54.181121] mmc_host mmc2: start CMD18 arg 00015028 > [ 54.181365] mmc_host mmc2: start CMD18 arg 00015048 > [ 54.182312] mmc_host mmc2: start CMD18 arg 00015088 > [ 54.183654] mmc_host mmc2: start CMD18 arg 00015108 > [ 54.186187] mmc_host mmc2: start CMD18 arg 00066018 > [ 54.186767] mmc_host mmc2: start CMD18 arg 00004900 > [ 54.186950] mmc_host mmc2: start CMD18 arg 00000080 > [ 54.187225] mmc_host mmc2: start CMD18 arg 00066038 > [ 54.187469] mmc_host mmc2: start CMD18 arg 00004a00 > [ 54.187713] mmc_host mmc2: start CMD18 arg 00066078 > [ 54.187988] mmc_host mmc2: start CMD18 arg 00004c00 > [ 54.188293] mmc2: modify EXT_CSD, index 179, value: 1, set 1, timing 0 > [ 54.188323] mmc_host mmc2: start CMD6 arg 03b30101 > [ 54.188354] mmc2 modify EXT_CSD completed (0) > [ 54.188385] mmc_host mmc2: start CMD13 arg 00010000 > [ 54.188446] mmc_host mmc2: start CMD18 arg 00000000 > [ 54.189300] mmc2: modify EXT_CSD, index 179, value: 0, set 1, timing 0 > [ 54.189331] mmc_host mmc2: start CMD6 arg 03b30001 > > After this CMD6 just hangs there. Nothing proceeds on mmc2 (the eMMC) > ever again. > > > Moreover, it would be nice to know exactly where in the code the stall > > happens. Could it be that it's the mmc_wait_for_cmd() that never > > returns in __mmc_switch(), when sending the CMD6 to switch the > > partition? > > __mmc_switch() initiates a partition switch, then > issues CMD6. Then: > > __mmc_switch() > mmc_wait_for_cmd() > mmc_wait_for_req() > __mmc_start_req() > mmc_wait_for_req_done() > > We wait forever in mmc_wait_for_req_done() since the completion > never arrives. Thanks for sharing these details. It looks like the mmci driver is waiting for the busy completion IRQ, forever. > > > My main worry is that we should not set a card quirk for an eMMC that > > could be working fine (on some other platforms), that's why I want us > > to be sure. > > Yes we need to get to the bottom of this. > > One theory as we discussed on chat is that the busy detect > for the previous command isn't working as it should leading us > to prematurely start CMD6 while the previous CMD18 is actually still > busy. > > After testing, this appears to be true! > > I disabled hardware busy detect in mmci.c like this: > > diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c > index 984d35055156..cffe95e32b9a 100644 > --- a/drivers/mmc/host/mmci.c > +++ b/drivers/mmc/host/mmci.c > @@ -180,10 +180,10 @@ static struct variant_data variant_ux500 = { > .f_max = 100000000, > .signal_direction = true, > .pwrreg_clkgate = true, > - .busy_detect = true, > - .busy_dpsm_flag = MCI_DPSM_ST_BUSYMODE, > - .busy_detect_flag = MCI_ST_CARDBUSY, > - .busy_detect_mask = MCI_ST_BUSYENDMASK, > + //.busy_detect = true, > + //.busy_dpsm_flag = MCI_DPSM_ST_BUSYMODE, > + //.busy_detect_flag = MCI_ST_CARDBUSY, > + //.busy_detect_mask = MCI_ST_BUSYENDMASK, > .pwrreg_nopower = true, > .mmcimask1 = true, > .irq_pio_mask = MCI_IRQ_PIO_MASK, > @@ -215,10 +215,10 @@ static struct variant_data variant_ux500v2 = { > .f_max = 100000000, > .signal_direction = true, > .pwrreg_clkgate = true, > - .busy_detect = true, > - .busy_dpsm_flag = MCI_DPSM_ST_BUSYMODE, > - .busy_detect_flag = MCI_ST_CARDBUSY, > - .busy_detect_mask = MCI_ST_BUSYENDMASK, > + // .busy_detect = true, > + //.busy_dpsm_flag = MCI_DPSM_ST_BUSYMODE, > + //.busy_detect_flag = MCI_ST_CARDBUSY, > + //.busy_detect_mask = MCI_ST_BUSYENDMASK, > .pwrreg_nopower = true, > .mmcimask1 = true, > .irq_pio_mask = MCI_IRQ_PIO_MASK, > > Now it works. Alright, this certainly brings us a bit closer to a fix. Either the HW busy detection isn't working properly in mmci or the eMMC card is behaving a bit odd (continues to deassert the DAT0 line forever). What certainly is missing in the mmci driver, is a software based timeout for cases like these. The mmci driver should better complete the request with -ETIMEDOUT error for the cmd, rather than hanging forever and waiting for the busy completion IRQ. > > So now I am thinking about a quirk that disables hardware busy > detect on some older eMMC cards instead, my current > assumption is that the hardware busy detect on these eMMCs > is glitchy. It works a bit but not good enough. > > I think you also mentioned the timeout in EXT_CSD maybe not being > long enough? How do I play around with this? > MMC_QUIRK_LONG_READ_TIME? As mmci doesn't care about busy timeouts, but waits forever, this is likely not the problem. However, I would like to try to narrow down the problem even further. Would you mind try the below debug patch? -- Subject: [PATCH] mmc: mmci: HACK/DEBUG: Drop IRQ based HW busy detect This change make MMC_CAP_WAIT_WHILE_BUSY to become unset, but keeps the ->card_busy ops. In this way, the core will sometimes (for CMD6 for example), use this callback rather than polling with CMD13. Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx> --- drivers/mmc/host/mmci.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c index 984d35055156..cbc67c0e64b1 100644 --- a/drivers/mmc/host/mmci.c +++ b/drivers/mmc/host/mmci.c @@ -180,7 +180,7 @@ static struct variant_data variant_ux500 = { .f_max = 100000000, .signal_direction = true, .pwrreg_clkgate = true, - .busy_detect = true, + .busy_detect = false, .busy_dpsm_flag = MCI_DPSM_ST_BUSYMODE, .busy_detect_flag = MCI_ST_CARDBUSY, .busy_detect_mask = MCI_ST_BUSYENDMASK, @@ -215,7 +215,7 @@ static struct variant_data variant_ux500v2 = { .f_max = 100000000, .signal_direction = true, .pwrreg_clkgate = true, - .busy_detect = true, + .busy_detect = false, .busy_dpsm_flag = MCI_DPSM_ST_BUSYMODE, .busy_detect_flag = MCI_ST_CARDBUSY, .busy_detect_mask = MCI_ST_BUSYENDMASK, @@ -2143,7 +2143,7 @@ static int mmci_probe(struct amba_device *dev, /* * Enable busy detection. */ - if (variant->busy_detect) { +// if (variant->busy_detect) { mmci_ops.card_busy = mmci_card_busy; /* * Not all variants have a flag to enable busy detection @@ -2152,8 +2152,8 @@ static int mmci_probe(struct amba_device *dev, if (variant->busy_dpsm_flag) mmci_write_datactrlreg(host, host->variant->busy_dpsm_flag); - mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY; - } +// mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY; +// } /* Variants with mandatory busy timeout in HW needs R1B responses. */ if (variant->busy_timeout) -- Kind regards Uffe