On Wed, 21 Jun 2023 at 14:28, Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote: > > On Tue, 20 Jun 2023 at 21:27, Linus Walleij <linus.walleij@xxxxxxxxxx> wrote: > > > > On Tue, Jun 20, 2023 at 11:11 AM Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote: > > > > > The ux500 variant doesn't have a HW based timeout to use for busy-end IRQs. > > > To avoid hanging and waiting for the card to stop signaling busy, let's > > > schedule a delayed work, according to the corresponding cmd->busy_timeout > > > for the command. If work gets to run, let's kick the IRQ handler to > > > completed the currently running request/command. > > > > > > Reviewed-by: Linus Walleij <linus.walleij@xxxxxxxxxx> > > > Tested-by: Linus Walleij <linus.walleij@xxxxxxxxxx> > > > Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx> > > > > Some experiments: I added a patch to print the offending command and > > dev_err() all the timing bugs. The resulting log: > > Interesting! > > > > > /home/linus # dmesg |grep '80005000\|mmcblk2' > > [ 2.684814] mmci-pl18x 80005000.mmc: mmc2: PL180 manf 80 rev4 at > > 0x80005000 irq 81,0 (pio) > > [ 2.695831] mmci-pl18x 80005000.mmc: DMA channels RX dma0chan4, TX dma0chan5 > > [ 3.410400] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > > [ 3.434936] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > > [ 3.451721] mmci-pl18x 80005000.mmc: lost busy status when waiting > > for busy start IRQ (op 06) > > [ 3.489379] mmcblk2: mmc2:0001 M4G1YC 3.69 GiB > > [ 3.569000] mmcblk2: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 > > p14 p15 p16 p17 p18 p19 p20 p21 p22 p23 p24 p25 > > [ 3.583526] mmcblk2boot0: mmc2:0001 M4G1YC 2.00 MiB > > [ 3.594726] mmcblk2boot1: mmc2:0001 M4G1YC 2.00 MiB > > [ 3.602233] mmcblk2rpmb: mmc2:0001 M4G1YC 128 KiB, chardev (246:0) > > [ 4.103057] mmci-pl18x 80005000.mmc: lost busy status when waiting > > for busy start IRQ (op 06) > > [ 8.074188] mmci-pl18x 80005000.mmc: lost busy status when waiting > > for busy start IRQ (op 06) > > [ 8.084350] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > > [ 8.451446] mmci-pl18x 80005000.mmc: lost busy status when waiting > > for busy start IRQ (op 06) > > [ 8.757934] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > > [ 10.211883] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > > [ 10.587646] mmci-pl18x 80005000.mmc: lost busy status when waiting > > for busy start IRQ (op 06) > > [ 10.913604] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > > [ 10.924072] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > > [ 10.931671] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > > [ 12.023345] mmci-pl18x 80005000.mmc: lost busy status when waiting > > for busy start IRQ (op 06) > > [ 12.357757] mmci-pl18x 80005000.mmc: lost busy status when waiting > > for busy start IRQ (op 06) > > [ 14.087677] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > > [ 14.096191] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > > [ 15.124114] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > > [ 15.153411] mmci-pl18x 80005000.mmc: lost busy status when waiting > > for busy start IRQ (op 06) > > [ 15.525024] mmci-pl18x 80005000.mmc: lost busy status when waiting > > for busy start IRQ (op 06) > > [ 15.850036] mmci-pl18x 80005000.mmc: timeout waiting for busy IRQ (op 06) > > [ 18.250122] mmci-pl18x 80005000.mmc: lost busy status when waiting > > for busy start IRQ (op 06) > > [ 18.988983] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > > [ 19.302612] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > > [ 19.320953] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > > [ 19.333251] mmci-pl18x 80005000.mmc: lost busy status when waiting > > for busy start IRQ (op 06) > > [ 21.851715] mmci-pl18x 80005000.mmc: lost busy status when waiting > > for busy start IRQ (op 06) > > > > Always command 0x06, MMC_SWITCH. But this is also the *only* > > command that resturns MMC_RSP_BUSY so it doesn't say very > > much about the card in general. > > We have other commands that have R1B responses. > > For example, open-ended I/O writes that are completed with CMD12 or > the erase/discard command. Not sure which ones that are easiest for > you to test with, we can chat more about this offlist. > > > > > We have speculated that the card is maybe operating out of spec, such as > > holding DAT0 high for too short time (not across enough MCLK cycles) > > or similar when indicating busy for short timespans. > > By looking at the above log, it seems like the card does signal busy, > at least occasionally. > > 1) The print "no busy signalling in time", isn't so worrying to me. It > probably means that the card doesn't need to signal busy, as it's > internal operation is completed. To be absolutely sure, we could > extend the polling loop to more retries, just to test. > > 2) The print "lost busy status when waiting for busy start IRQ" is a > bit harder to understand. However, my guess is that the mmci > controller does raise an IRQ, to signal a *changed* busy signal state, > which is the most important thing. When we end up reading the busy > status bit to manage the IRQ, we find that the card has stopped > signalling busy. That seems perfectly fine to me. So, maybe we don't > get the start IRQ (as that state wasn't discovered) but only the end > IRQ, so to say. > > More importantly, the busy timeout work never gets to run, which > indicates that we are no longer hanging and waiting for an IRQ to be > raised. Is that correct? Ehh, I should have looked more closely at the log. Indeed there is one case where the timeout work kicks in. Maybe we should log the information about the current ->busy_state at that point too, so understand under what condition we are hanging? I think we should also log the actual used timeout in this case. [...] Kind regards Uffe