On Wed, 28 Jun 2023 at 21:12, Linus Walleij <linus.walleij@xxxxxxxxxx> wrote: > > To conclude the ux500 busy timeout fixes, this improves the debug and > error prints so we can see a bit what is going on. Here is a typical > dmesg with these new debug messages enabled: > > [ 2.648864] mmci-pl18x 80005000.mmc: mmc2: PL180 manf 80 rev4 > at 0x80005000 irq 81,0 (pio) > [ 2.662750] mmci-pl18x 80005000.mmc: DMA channels RX dma0chan4, TX dma0chan5 > [ 3.480407] mmci-pl18x 80005000.mmc: no busy signalling in time CMD06 > [ 3.487457] mmci-pl18x 80005000.mmc: no busy signalling in time CMD06 > [ 3.998321] mmci-pl18x 80005000.mmc: timeout in state waiting for end IRQ > waiting for busy CMD06 > [ 3.998535] mmc2: new DDR MMC card at address 0001 > [ 4.000030] mmcblk2: mmc2:0001 M4G1YC 3.69 GiB > [ 4.008361] mmcblk2: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 > p16 p17 p18 p19 p20 p21 p22 p23 p24 p25 > [ 4.017700] mmcblk2boot0: mmc2:0001 M4G1YC 2.00 MiB > [ 4.020477] mmcblk2boot1: mmc2:0001 M4G1YC 2.00 MiB > [ 4.022125] mmcblk2rpmb: mmc2:0001 M4G1YC 128 KiB, chardev (246:0) > [ 5.791381] mmci-pl18x 80005000.mmc: no busy signalling in time CMD06 > [ 10.938568] mmci-pl18x 80005000.mmc: timeout in state waiting for end IRQ > waiting for busy CMD06 > [ 17.982849] mmci-pl18x 80005000.mmc: lost busy status when waiting for > busy start IRQ CMD06 > [ 18.683563] mmci-pl18x 80005000.mmc: no busy signalling in time CMD06 > [ 19.385437] mmci-pl18x 80005000.mmc: no busy signalling in time CMD06 > [ 20.493652] mmci-pl18x 80005000.mmc: no busy signalling in time CMD06 > > We see a lot of lost IRQs and the timeout always occur while waiting for > the end IRQ, and then the busy status is *low* meaning the busy indication > is already de-asserted. > > So busy signalling is missed in various ways for various reasons, > sometimes it appears that IRQs are simply lost. > > One hypothesis is that this happens because the events happen so fast > that they are transient, and since the MMCI state machine in effect is > handling an edge trigger (rising or falling signal on DAT0) the > internal logic will miss the event, because the state machine in the > hardware is sampling the line, and will at times detect only the first > event but miss the second, fireing only one IRQ. > > We print the second timeout error with dev_err() since it is pretty > serious, the other events are so common and simple to handle that we > can keep them at dev_dbg() level. > > Signed-off-by: Linus Walleij <linus.walleij@xxxxxxxxxx> Applied for next, thanks! Kind regards Uffe > --- > drivers/mmc/host/mmci.c | 37 ++++++++++++++++++++++++++++++------- > 1 file changed, 30 insertions(+), 7 deletions(-) > > diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c > index 4fd28eaadc07..5d05269a8567 100644 > --- a/drivers/mmc/host/mmci.c > +++ b/drivers/mmc/host/mmci.c > @@ -734,7 +734,8 @@ static bool ux500_busy_complete(struct mmci_host *host, struct mmc_command *cmd, > } > retries--; > } > - dev_dbg(mmc_dev(host->mmc), "no busy signalling in time\n"); > + dev_dbg(mmc_dev(host->mmc), > + "no busy signalling in time CMD%02x\n", cmd->opcode); > ux500_busy_clear_mask_done(host); > break; > > @@ -756,7 +757,8 @@ static bool ux500_busy_complete(struct mmci_host *host, struct mmc_command *cmd, > host->busy_state = MMCI_BUSY_WAITING_FOR_END_IRQ; > } else { > dev_dbg(mmc_dev(host->mmc), > - "lost busy status when waiting for busy start IRQ\n"); > + "lost busy status when waiting for busy start IRQ CMD%02x\n", > + cmd->opcode); > cancel_delayed_work(&host->ux500_busy_timeout_work); > ux500_busy_clear_mask_done(host); > } > @@ -770,13 +772,14 @@ static bool ux500_busy_complete(struct mmci_host *host, struct mmc_command *cmd, > ux500_busy_clear_mask_done(host); > } else { > dev_dbg(mmc_dev(host->mmc), > - "busy status still asserted when handling busy end IRQ - will keep waiting\n"); > + "busy status still asserted when handling busy end IRQ - will keep waiting CMD%02x\n", > + cmd->opcode); > } > break; > > default: > - dev_dbg(mmc_dev(host->mmc), "fell through on state %d\n", > - host->busy_state); > + dev_sbg(mmc_dev(host->mmc), "fell through on state %d, CMD%02x\n", > + host->busy_state, cmd->opcode); > break; > } > > @@ -1503,6 +1506,20 @@ mmci_cmd_irq(struct mmci_host *host, struct mmc_command *cmd, > } > } > > +static char *ux500_state_str(struct mmci_host *host) > +{ > + switch (host->busy_state) { > + case MMCI_BUSY_WAITING_FOR_START_IRQ: > + return "waiting for start IRQ"; > + case MMCI_BUSY_WAITING_FOR_END_IRQ: > + return "waiting for end IRQ"; > + case MMCI_BUSY_DONE: > + return "not waiting for IRQs"; > + default: > + return "unknown"; > + } > +} > + > /* > * This busy timeout worker is used to "kick" the command IRQ if a > * busy detect IRQ fails to appear in reasonable time. Only used on > @@ -1522,9 +1539,15 @@ static void ux500_busy_timeout_work(struct work_struct *work) > /* If we are still busy, let's tag on timeout error. */ > if (status & host->variant->busy_detect_flag) { > status |= MCI_CMDTIMEOUT; > - dev_dbg(mmc_dev(host->mmc), "timeout waiting for busy\n"); > + dev_err(mmc_dev(host->mmc), > + "timeout in state %s still busy with CMD%02x\n", > + ux500_state_str(host), > + host->cmd->opcode); > } else { > - dev_dbg(mmc_dev(host->mmc), "timeout waiting for busy IRQ\n"); > + dev_err(mmc_dev(host->mmc), > + "timeout in state %s waiting for busy CMD%02x\n", > + ux500_state_str(host), > + host->cmd->opcode); > } > > mmci_cmd_irq(host, host->cmd, status); > -- > 2.40.1 >