Re: [omap_hsmmc] run into dead loop

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 30 Oct 2019 at 01:05, Yi Zheng <goodmenzy@xxxxxxxxx> wrote:
>
> Hi,
>
>     I found that there may be a dead loop in omap-hsmmc driver:
>     drivers/mmc/host/omap_hsmmc.c
>
>     My testing is on v4.19.58-0-g7a6bfa08b938, but the latest
>     kernel(v5.4-rc5-19-g8005803a2ca0) has the same bug(I guess).
>
>     In omap_hsmmc_start_command(), the IRQ is enabled too early:
>         omap_hsmmc_start_command(struct omap_hsmmc_host *host, struct
> mmc_command *cmd,
>                 struct mmc_data *data)
>         {
>                 /* ..................... */
>                 omap_hsmmc_enable_irq(host, cmd);
>
>                 /* ..................... */
>                 /*  If IRQ occurs here, we run in deadloop! */
>
>                 host->req_in_progress = 1;
>
>                 /* ..................... */
>         }
>
>     In OMAP-HSMMC ISR, the C code is:
>         static irqreturn_t omap_hsmmc_irq(int irq, void *dev_id)
>         {
>                 struct omap_hsmmc_host *host = dev_id;
>                 int status;
>
>                 status = OMAP_HSMMC_READ(host->base, STAT);
>                 while (status & (INT_EN_MASK | CIRQ_EN)) {
>                         if (host->req_in_progress)
>                                 omap_hsmmc_do_irq(host, status);
>
>                         if (status & CIRQ_EN)
>                                 mmc_signal_sdio_irq(host->mmc);
>
>                         /* Flush posted write */
>                         status = OMAP_HSMMC_READ(host->base, STAT);
>                 }
>
>                 return IRQ_HANDLED;
>         }
>
>     It checks the flag host->req_in_progress, only it is set,
>     omap_hsmmc_do_irq() will be called.  The STAT register will be
>     written in omap_hsmmc_do_irq(), that will cause the interrupts to
>     be reset.
>
>     If MMC IRQ occurs in that danger time-window, my single core CPU
>     will run into endless loop in the ISR omap_hsmmc_irq(), due to
>     STAT register is never written to clean the IRQ.
>
>     Due to the small time-window, this bug is some what difficult to
>     be found. I add some debug info dumpping in that two functions,
>     that will cause the code slower, the bug will occur easier.  It is
>     realy a bug. The debug version of the source and the debug log is
>     some what tedious. I only add the snippet here:
>
>     The bug log:
>         [   12.765560] omap_hsmmc_runtime_resume:2354: omap_hsmmc
> 48060000.mmc: enabled
>         [   12.773006] mmc_mrq_pr_debug:279: mmc0: starting CMD6 arg
> 03200101 flags 0000049d
>         [   12.780941] omap_hsmmc_start_command:853: omap_hsmmc
> 48060000.mmc: mmc0: CMD6, argument 0x03200101
>         [   12.790396] omap_hsmmc_start_command:859: omap_hsmmc
> 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing
>         [   12.800662] omap_hsmmc_start_command:862: omap_hsmmc
> 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing
>         [   12.811054] omap_hsmmc_start_command:871: omap_hsmmc
> 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing
>         [   12.821331] omap_hsmmc_start_command:878: omap_hsmmc
> 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing
>         [   12.831596] omap_hsmmc_start_command:888: omap_hsmmc
> 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing
>         [   12.841861] omap_hsmmc_start_command:919: omap_hsmmc
> 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing
>         [   12.852124] omap_hsmmc_start_command:923: omap_hsmmc
> 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing
>         [   12.862393] omap_hsmmc_irq:1207: omap_hsmmc 48060000.mmc:
> mmc0: enter ISR, irq=34, status=1
>         [   12.871155] omap_hsmmc_do_irq:1157: omap_hsmmc
> 48060000.mmc: IRQ Status is 1
>         [   12.878553] omap_hsmmc_irq:1226: omap_hsmmc 48060000.mmc:
> mmc0: leave ISR, irq=34
>         [   12.886450] omap_hsmmc_start_command:939: omap_hsmmc
> 48060000.mmc: mmc0: CMD6, tracing
>         [   12.894805] CPU: 0 PID: 95 Comm: kworker/0:1H Tainted: G
>     W  O      4.19.58-0-g7a6bfa08b-cp5502 #25
>         [   12.904878] Hardware name: Generic AM33XX (Flattened Device Tree)
>         [   12.911292] Workqueue: kblockd blk_mq_run_work_fn
>         [   12.916316] [<c010f7fc>] (unwind_backtrace) from
> [<c010ce4c>] (show_stack+0x18/0x1c)
>         [   12.924506] [<c010ce4c>] (show_stack) from [<c0629cac>]
> (omap_hsmmc_request+0x2b0/0x4b0)
>         [   12.933014] [<c0629cac>] (omap_hsmmc_request) from
> [<c06147b8>] (mmc_start_request+0xa4/0xb8)
>         [   12.942015] [<c06147b8>] (mmc_start_request) from
> [<c0614834>] (mmc_wait_for_req+0x68/0xd0)
>         [   12.950828] [<c0614834>] (mmc_wait_for_req) from
> [<c0614918>] (mmc_wait_for_cmd+0x7c/0xa4)
>         [   12.959552] [<c0614918>] (mmc_wait_for_cmd) from
> [<c061bbe0>] (__mmc_switch+0xe0/0x2cc)
>         [   12.967998] [<c061bbe0>] (__mmc_switch) from [<c061bdf8>]
> (mmc_switch+0x2c/0x34)
>         [   12.975805] [<c061bdf8>] (mmc_switch) from [<c061be58>]
> (mmc_flush_cache+0x58/0x84)
>         [   12.983844] [<c061be58>] (mmc_flush_cache) from
> [<c0626f84>] (mmc_blk_mq_issue_rq+0x5d8/0x76c)
>         [   12.992927] [<c0626f84>] (mmc_blk_mq_issue_rq) from
> [<c0627644>] (mmc_mq_queue_rq+0x158/0x1fc)
>         [   13.002056] [<c0627644>] (mmc_mq_queue_rq) from
> [<c0470590>] (blk_mq_dispatch_rq_list+0x384/0x4c4)
>         [   13.011521] [<c0470590>] (blk_mq_dispatch_rq_list) from
> [<c047531c>] (blk_mq_sched_dispatch_requests+0xfc/0x174)
>         [   13.022248] [<c047531c>] (blk_mq_sched_dispatch_requests)
> from [<c046f020>] (__blk_mq_run_hw_queue+0xcc/0x110)
>         [   13.032793] [<c046f020>] (__blk_mq_run_hw_queue) from
> [<c01520f0>] (process_one_work+0x364/0x64c)
>         [   13.042142] omap_hsmmc_irq:1207: omap_hsmmc 48060000.mmc:
> mmc0: enter ISR, irq=34, status=108000
>         [   13.042156] omap_hsmmc_do_irq:1157: omap_hsmmc
> 48060000.mmc: IRQ Status is 108000
>         [   13.042173] omap_hsmmc_dbg_report_irq:1088: omap_hsmmc
> 48060000.mmc: MMC IRQ 0x108000 : ERRI DTO
>         [   13.042196] omap_hsmmc_irq:1226: omap_hsmmc 48060000.mmc:
> mmc0: leave ISR, irq=34
>         [   13.076452] [<c01520f0>] (process_one_work) from
> [<c0153020>] (worker_thread+0x294/0x408)
>         [   13.085089] [<c0153020>] (worker_thread) from [<c0158624>]
> (kthread+0x140/0x15c)
>         [   13.092861] [<c0158624>] (kthread) from [<c01010b4>]
> (ret_from_fork+0x14/0x20)
>         [   13.100482] Exception stack(0xdd179fb0 to 0xdd179ff8)
>         [   13.105828] 9fa0:
> 00000000 00000000 00000000 00000000
>         [   13.114411] 9fc0: 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000
>         [   13.123032] 9fe0: 00000000 00000000 00000000 00000000
> 00000013 00000000
>         [   13.130027] mmc_wait_for_req_done:433: mmc0: req failed
> (CMD6): -110, cmd->retries=3, retrying...
>         [   13.139390] omap_hsmmc_start_command:853: omap_hsmmc
> 48060000.mmc: mmc0: CMD6, argument 0x03200101
>         [   13.148839] omap_hsmmc_start_command:859: omap_hsmmc
> 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing
>         [   13.148855] omap_hsmmc_irq:1207: omap_hsmmc 48060000.mmc:
> mmc0: enter ISR, irq=34, status=108000
>         [   13.168274] omap_hsmmc 48060000.mmc: !! mmc0: ISR, irq=34,
> update status: 108000, req_in_progress=0, loop=1
>         [   13.178489] omap_hsmmc 48060000.mmc: !! mmc0: ISR, irq=34,
> update status: 108000, req_in_progress=0, loop=2
>         [   13.188704] omap_hsmmc 48060000.mmc: !! mmc0: ISR, irq=34,
> update status: 108002, req_in_progress=0, loop=3
>         [   13.198920] omap_hsmmc 48060000.mmc: !! mmc0: ISR, irq=34,
> update status: 108002, req_in_progress=0, loop=4
>         [   13.209135] omap_hsmmc 48060000.mmc: !! mmc0: ISR, irq=34,
> update status: 108002, req_in_progress=0, loop=5
>         [   13.219351] omap_hsmmc 48060000.mmc: !! mmc0: ISR, irq=34,
> update status: 108002, req_in_progress=0, loop=6
>         endless looping ..............
>         [   26.754257] omap_hsmmc 48060000.mmc: !! mmc0: ISR, irq=34,
> update status: 108002, req_in_progress=0, loop=1306
>         yes, endless.....
>
>    The source code with line number(to be a reference to the log info)
>  omap_hsmmc.c:
>         847   │ static void
>         848   │ omap_hsmmc_start_command(struct omap_hsmmc_host *host,
> struct mmc_command *cmd,
>         849   │     struct mmc_data *data)
>         850   │ {
>         851   │     int cmdreg = 0, resptype = 0, cmdtype = 0;
>         852   │
>         853   │     dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, argument 0x%08x\n",
>         854   │         mmc_hostname(host->mmc), cmd->opcode, cmd->arg);
>         855   │     host->cmd = cmd;
>         856   │
>         857   │     omap_hsmmc_enable_irq(host, cmd);
>         858   │
>         859   │     dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, argument
> 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, cmd->arg);
>         860   │
>         861   │     host->response_busy = 0;
>         862   │     dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, argument
> 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, cmd->arg);
>         863   │
>         864   │     if (cmd->flags & MMC_RSP_PRESENT) {
>         865   │         if (cmd->flags & MMC_RSP_136){
>         866   │             resptype = 1;
>         867   │             dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d,
> argument 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode,
> cmd->arg);
>         868   │         }else if (cmd->flags & MMC_RSP_BUSY) {
>         869   │             resptype = 3;
>         870   │             host->response_busy = 1;
>         871   │             dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d,
> argument 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode,
> cmd->arg);
>         872   │         } else {
>         873   │             resptype = 2;
>         874   │             dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d,
> argument 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode,
> cmd->arg);
>         875   │         }
>         876   │     }
>         877   │
>         878   │     dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, argument
> 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, cmd->arg);
>         879   │
>         880   │     /*
>         881   │      * Unlike OMAP1 controller, the cmdtype does not
> seem to be based on
>         882   │      * ac, bc, adtc, bcr. Only commands ending an open
> ended transfer need
>         883   │      * a val of 0x3, rest 0x0.
>         884   │      */
>         885   │     if (cmd == host->mrq->stop)
>         886   │         cmdtype = 0x3;
>         887   │
>         888   │     dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, argument
> 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, cmd->arg);
>         889   │
>         890   │     cmdreg = (cmd->opcode << 24) | (resptype << 16) |
> (cmdtype << 22);
>         891   │
>         892   │     if ((host->flags & AUTO_CMD23) &&
> mmc_op_multi(cmd->opcode) &&
>         893   │         host->mrq->sbc) {
>         894   │         cmdreg |= ACEN_ACMD23;
>         895   │
>         896   │         dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d,
> argument 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode,
> cmd->arg);
>         897   │
>         898   │         OMAP_HSMMC_WRITE(host->base, SDMASA,
> host->mrq->sbc->arg);
>         899   │
>         900   │         dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d,
> argument 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode,
> cmd->arg);
>         901   │     }
>         902   │     if (data) {
>         903   │         dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d,
> argument 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode,
> cmd->arg);
>         904   │
>         905   │         cmdreg |= DP_SELECT | MSBS | BCE;
>         906   │         if (data->flags & MMC_DATA_READ)
>         907   │             cmdreg |= DDIR;
>         908   │         else
>         909   │             cmdreg &= ~(DDIR);
>         910   │
>         911   │         dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d,
> argument 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode,
> cmd->arg);
>         912   │     }
>         913   │
>         914   │     if (host->use_dma)
>         915   │         cmdreg |= DMAE;
>         916   │
>         917   │     host->req_in_progress = 1;
>         918   │
>         919   │     dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, argument
> 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, cmd->arg);
>         920   │
>         921   │     OMAP_HSMMC_WRITE(host->base, ARG, cmd->arg);
>         922   │
>         923   │     dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, argument
> 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, cmd->arg);
>         924   │
>         925   │     OMAP_HSMMC_WRITE(host->base, CMD, cmdreg);
>         926   │
>         927   │     if(6 == cmd->opcode){
>         928   │ #define my_dynamic_dev_dbg(dev, fmt, ...)               \
>         929   │         do {                            \
>         930   │             DEFINE_DYNAMIC_DEBUG_METADATA(descriptor, fmt); \
>         931   │             if (DYNAMIC_DEBUG_BRANCH(descriptor)){      \
>         932   │                 __dynamic_dev_dbg(&descriptor, dev, fmt, \
>         933   │                           ##__VA_ARGS__);   \
>         934   │                 dump_stack();               \
>         935   │             }                       \
>         936   │         } while (0)
>         937   │
>         938   │         my_dynamic_dev_dbg(mmc_dev(host->mmc), "%s:
> CMD%d, tracing\n",
>         939   │                    mmc_hostname(host->mmc), cmd->opcode);
>         940   │ #undef my_dynamic_dev_dbg
>         941   │     }
>         942   │ }
>        ................
>        1200   │ static irqreturn_t omap_hsmmc_irq(int irq, void *dev_id)
>        1201   │ {
>        1202   │     struct omap_hsmmc_host *host = dev_id;
>        1203   │     int status;
>        1204   │     unsigned int  loop_count = 0;
>        1205   │
>        1206   │     status = OMAP_HSMMC_READ(host->base, STAT);
>        1207   │     dev_vdbg(mmc_dev(host->mmc), "%s: enter ISR,
> irq=%d, status=%x\n", mmc_hostname(host->mmc), irq, status);
>        1208   │
>        1209   │     while (status & (INT_EN_MASK | CIRQ_EN)) {
>        1210   │         loop_count++;
>        1211   │
>        1212   │         if (host->req_in_progress)
>        1213   │             omap_hsmmc_do_irq(host, status);
>        1214   │
>        1215   │         if (status & CIRQ_EN)
>        1216   │             mmc_signal_sdio_irq(host->mmc);
>        1217   │
>        1218   │         /* Flush posted write */
>        1219   │         status = OMAP_HSMMC_READ(host->base, STAT);
>        1220   │         if(status){
>        1221   │             dev_warn(mmc_dev(host->mmc), "!! %s: ISR,
> irq=%d, update status: %x, req_in_progress=%d, loop=%u\n",
>        1222   │                  mmc_hostname(host->mmc), irq, status,
> host->req_in_progress, loop_count);
>        1223   │         }
>        1224   │     }
>        1225   │
>        1226   │     dev_vdbg(mmc_dev(host->mmc), "%s: leave ISR,
> irq=%d\n", mmc_hostname(host->mmc), irq);
>        1227   │     return IRQ_HANDLED;
>        1228   │ }
>
>
>
>     My patch to fixup that bug is here:
> diff -upr linux-git/drivers/mmc/host/omap_hsmmc.c
> linux-fixup/drivers/mmc/host/omap_hsmmc.c
> --- linux-git/drivers/mmc/host/omap_hsmmc.c     2019-10-29
> 21:33:05.127638851 +0800
> +++ linux-fixup/drivers/mmc/host/omap_hsmmc.c   2019-10-29
> 21:37:44.957637473 +0800
> @@ -852,8 +852,6 @@ omap_hsmmc_start_command(struct omap_hsm
>                 mmc_hostname(host->mmc), cmd->opcode, cmd->arg);
>         host->cmd = cmd;
>
> -       omap_hsmmc_enable_irq(host, cmd);
> -
>         host->response_busy = 0;
>         if (cmd->flags & MMC_RSP_PRESENT) {
>                 if (cmd->flags & MMC_RSP_136)
> @@ -891,7 +889,8 @@ omap_hsmmc_start_command(struct omap_hsm
>         if (host->use_dma)
>                 cmdreg |= DMAE;
>
> -       host->req_in_progress = 1;
> +       host->req_in_progress = 1; /* the flag must be set BEFORE irq enabled */
> +       omap_hsmmc_enable_irq(host, cmd);
>
>         OMAP_HSMMC_WRITE(host->base, ARG, cmd->arg);
>         OMAP_HSMMC_WRITE(host->base, CMD, cmdreg);
>

Thanks for the detailed description to the problem.

The patch above seems like a step in the right direction (I am fine to
apply something along those lines), however, I does look a bit odd
that ->host->req_in_progress isn't protected by a lock. Doesn't it?

In any case, I suggest you to repost this as proper patch with a
changelog. Also, please include some of the folkz from TI who has been
contributing to the driver lately.

Kind regards
Uffe




[Index of Archives]     [Linux Arm (vger)]     [ARM Kernel]     [ARM MSM]     [Linux Tegra]     [Linux WPAN Networking]     [Linux Wireless Networking]     [Maemo Users]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux