On Tue, 20 Jun 2023 at 21:27, Linus Walleij <linus.walleij@xxxxxxxxxx> wrote: > > On Tue, Jun 20, 2023 at 11:11 AM Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote: > > > The ux500 variant doesn't have a HW based timeout to use for busy-end IRQs. > > To avoid hanging and waiting for the card to stop signaling busy, let's > > schedule a delayed work, according to the corresponding cmd->busy_timeout > > for the command. If work gets to run, let's kick the IRQ handler to > > completed the currently running request/command. > > > > Reviewed-by: Linus Walleij <linus.walleij@xxxxxxxxxx> > > Tested-by: Linus Walleij <linus.walleij@xxxxxxxxxx> > > Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx> > > Some experiments: I added a patch to print the offending command and > dev_err() all the timing bugs. The resulting log: Interesting! > > /home/linus # dmesg |grep '80005000\|mmcblk2' > [ 2.684814] mmci-pl18x 80005000.mmc: mmc2: PL180 manf 80 rev4 at > 0x80005000 irq 81,0 (pio) > [ 2.695831] mmci-pl18x 80005000.mmc: DMA channels RX dma0chan4, TX dma0chan5 > [ 3.410400] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > [ 3.434936] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > [ 3.451721] mmci-pl18x 80005000.mmc: lost busy status when waiting > for busy start IRQ (op 06) > [ 3.489379] mmcblk2: mmc2:0001 M4G1YC 3.69 GiB > [ 3.569000] mmcblk2: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 > p14 p15 p16 p17 p18 p19 p20 p21 p22 p23 p24 p25 > [ 3.583526] mmcblk2boot0: mmc2:0001 M4G1YC 2.00 MiB > [ 3.594726] mmcblk2boot1: mmc2:0001 M4G1YC 2.00 MiB > [ 3.602233] mmcblk2rpmb: mmc2:0001 M4G1YC 128 KiB, chardev (246:0) > [ 4.103057] mmci-pl18x 80005000.mmc: lost busy status when waiting > for busy start IRQ (op 06) > [ 8.074188] mmci-pl18x 80005000.mmc: lost busy status when waiting > for busy start IRQ (op 06) > [ 8.084350] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > [ 8.451446] mmci-pl18x 80005000.mmc: lost busy status when waiting > for busy start IRQ (op 06) > [ 8.757934] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > [ 10.211883] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > [ 10.587646] mmci-pl18x 80005000.mmc: lost busy status when waiting > for busy start IRQ (op 06) > [ 10.913604] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > [ 10.924072] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > [ 10.931671] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > [ 12.023345] mmci-pl18x 80005000.mmc: lost busy status when waiting > for busy start IRQ (op 06) > [ 12.357757] mmci-pl18x 80005000.mmc: lost busy status when waiting > for busy start IRQ (op 06) > [ 14.087677] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > [ 14.096191] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > [ 15.124114] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > [ 15.153411] mmci-pl18x 80005000.mmc: lost busy status when waiting > for busy start IRQ (op 06) > [ 15.525024] mmci-pl18x 80005000.mmc: lost busy status when waiting > for busy start IRQ (op 06) > [ 15.850036] mmci-pl18x 80005000.mmc: timeout waiting for busy IRQ (op 06) > [ 18.250122] mmci-pl18x 80005000.mmc: lost busy status when waiting > for busy start IRQ (op 06) > [ 18.988983] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > [ 19.302612] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > [ 19.320953] mmci-pl18x 80005000.mmc: no busy signalling in time (OP 06) > [ 19.333251] mmci-pl18x 80005000.mmc: lost busy status when waiting > for busy start IRQ (op 06) > [ 21.851715] mmci-pl18x 80005000.mmc: lost busy status when waiting > for busy start IRQ (op 06) > > Always command 0x06, MMC_SWITCH. But this is also the *only* > command that resturns MMC_RSP_BUSY so it doesn't say very > much about the card in general. We have other commands that have R1B responses. For example, open-ended I/O writes that are completed with CMD12 or the erase/discard command. Not sure which ones that are easiest for you to test with, we can chat more about this offlist. > > We have speculated that the card is maybe operating out of spec, such as > holding DAT0 high for too short time (not across enough MCLK cycles) > or similar when indicating busy for short timespans. By looking at the above log, it seems like the card does signal busy, at least occasionally. 1) The print "no busy signalling in time", isn't so worrying to me. It probably means that the card doesn't need to signal busy, as it's internal operation is completed. To be absolutely sure, we could extend the polling loop to more retries, just to test. 2) The print "lost busy status when waiting for busy start IRQ" is a bit harder to understand. However, my guess is that the mmci controller does raise an IRQ, to signal a *changed* busy signal state, which is the most important thing. When we end up reading the busy status bit to manage the IRQ, we find that the card has stopped signalling busy. That seems perfectly fine to me. So, maybe we don't get the start IRQ (as that state wasn't discovered) but only the end IRQ, so to say. More importantly, the busy timeout work never gets to run, which indicates that we are no longer hanging and waiting for an IRQ to be raised. Is that correct? > > I tested on the other problematic phone, Kyle, and the log is identical > (it has the same eMMC card, M4G1YC.) Okay! > > Yours, > Linus Walleij Kind regards Uffe