On Wed, 30 Oct 2019 at 01:05, Yi Zheng <goodmenzy@xxxxxxxxx> wrote: > > Hi, > > I found that there may be a dead loop in omap-hsmmc driver: > drivers/mmc/host/omap_hsmmc.c > > My testing is on v4.19.58-0-g7a6bfa08b938, but the latest > kernel(v5.4-rc5-19-g8005803a2ca0) has the same bug(I guess). > > In omap_hsmmc_start_command(), the IRQ is enabled too early: > omap_hsmmc_start_command(struct omap_hsmmc_host *host, struct > mmc_command *cmd, > struct mmc_data *data) > { > /* ..................... */ > omap_hsmmc_enable_irq(host, cmd); > > /* ..................... */ > /* If IRQ occurs here, we run in deadloop! */ > > host->req_in_progress = 1; > > /* ..................... */ > } > > In OMAP-HSMMC ISR, the C code is: > static irqreturn_t omap_hsmmc_irq(int irq, void *dev_id) > { > struct omap_hsmmc_host *host = dev_id; > int status; > > status = OMAP_HSMMC_READ(host->base, STAT); > while (status & (INT_EN_MASK | CIRQ_EN)) { > if (host->req_in_progress) > omap_hsmmc_do_irq(host, status); > > if (status & CIRQ_EN) > mmc_signal_sdio_irq(host->mmc); > > /* Flush posted write */ > status = OMAP_HSMMC_READ(host->base, STAT); > } > > return IRQ_HANDLED; > } > > It checks the flag host->req_in_progress, only it is set, > omap_hsmmc_do_irq() will be called. The STAT register will be > written in omap_hsmmc_do_irq(), that will cause the interrupts to > be reset. > > If MMC IRQ occurs in that danger time-window, my single core CPU > will run into endless loop in the ISR omap_hsmmc_irq(), due to > STAT register is never written to clean the IRQ. > > Due to the small time-window, this bug is some what difficult to > be found. I add some debug info dumpping in that two functions, > that will cause the code slower, the bug will occur easier. It is > realy a bug. The debug version of the source and the debug log is > some what tedious. I only add the snippet here: > > The bug log: > [ 12.765560] omap_hsmmc_runtime_resume:2354: omap_hsmmc > 48060000.mmc: enabled > [ 12.773006] mmc_mrq_pr_debug:279: mmc0: starting CMD6 arg > 03200101 flags 0000049d > [ 12.780941] omap_hsmmc_start_command:853: omap_hsmmc > 48060000.mmc: mmc0: CMD6, argument 0x03200101 > [ 12.790396] omap_hsmmc_start_command:859: omap_hsmmc > 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing > [ 12.800662] omap_hsmmc_start_command:862: omap_hsmmc > 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing > [ 12.811054] omap_hsmmc_start_command:871: omap_hsmmc > 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing > [ 12.821331] omap_hsmmc_start_command:878: omap_hsmmc > 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing > [ 12.831596] omap_hsmmc_start_command:888: omap_hsmmc > 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing > [ 12.841861] omap_hsmmc_start_command:919: omap_hsmmc > 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing > [ 12.852124] omap_hsmmc_start_command:923: omap_hsmmc > 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing > [ 12.862393] omap_hsmmc_irq:1207: omap_hsmmc 48060000.mmc: > mmc0: enter ISR, irq=34, status=1 > [ 12.871155] omap_hsmmc_do_irq:1157: omap_hsmmc > 48060000.mmc: IRQ Status is 1 > [ 12.878553] omap_hsmmc_irq:1226: omap_hsmmc 48060000.mmc: > mmc0: leave ISR, irq=34 > [ 12.886450] omap_hsmmc_start_command:939: omap_hsmmc > 48060000.mmc: mmc0: CMD6, tracing > [ 12.894805] CPU: 0 PID: 95 Comm: kworker/0:1H Tainted: G > W O 4.19.58-0-g7a6bfa08b-cp5502 #25 > [ 12.904878] Hardware name: Generic AM33XX (Flattened Device Tree) > [ 12.911292] Workqueue: kblockd blk_mq_run_work_fn > [ 12.916316] [<c010f7fc>] (unwind_backtrace) from > [<c010ce4c>] (show_stack+0x18/0x1c) > [ 12.924506] [<c010ce4c>] (show_stack) from [<c0629cac>] > (omap_hsmmc_request+0x2b0/0x4b0) > [ 12.933014] [<c0629cac>] (omap_hsmmc_request) from > [<c06147b8>] (mmc_start_request+0xa4/0xb8) > [ 12.942015] [<c06147b8>] (mmc_start_request) from > [<c0614834>] (mmc_wait_for_req+0x68/0xd0) > [ 12.950828] [<c0614834>] (mmc_wait_for_req) from > [<c0614918>] (mmc_wait_for_cmd+0x7c/0xa4) > [ 12.959552] [<c0614918>] (mmc_wait_for_cmd) from > [<c061bbe0>] (__mmc_switch+0xe0/0x2cc) > [ 12.967998] [<c061bbe0>] (__mmc_switch) from [<c061bdf8>] > (mmc_switch+0x2c/0x34) > [ 12.975805] [<c061bdf8>] (mmc_switch) from [<c061be58>] > (mmc_flush_cache+0x58/0x84) > [ 12.983844] [<c061be58>] (mmc_flush_cache) from > [<c0626f84>] (mmc_blk_mq_issue_rq+0x5d8/0x76c) > [ 12.992927] [<c0626f84>] (mmc_blk_mq_issue_rq) from > [<c0627644>] (mmc_mq_queue_rq+0x158/0x1fc) > [ 13.002056] [<c0627644>] (mmc_mq_queue_rq) from > [<c0470590>] (blk_mq_dispatch_rq_list+0x384/0x4c4) > [ 13.011521] [<c0470590>] (blk_mq_dispatch_rq_list) from > [<c047531c>] (blk_mq_sched_dispatch_requests+0xfc/0x174) > [ 13.022248] [<c047531c>] (blk_mq_sched_dispatch_requests) > from [<c046f020>] (__blk_mq_run_hw_queue+0xcc/0x110) > [ 13.032793] [<c046f020>] (__blk_mq_run_hw_queue) from > [<c01520f0>] (process_one_work+0x364/0x64c) > [ 13.042142] omap_hsmmc_irq:1207: omap_hsmmc 48060000.mmc: > mmc0: enter ISR, irq=34, status=108000 > [ 13.042156] omap_hsmmc_do_irq:1157: omap_hsmmc > 48060000.mmc: IRQ Status is 108000 > [ 13.042173] omap_hsmmc_dbg_report_irq:1088: omap_hsmmc > 48060000.mmc: MMC IRQ 0x108000 : ERRI DTO > [ 13.042196] omap_hsmmc_irq:1226: omap_hsmmc 48060000.mmc: > mmc0: leave ISR, irq=34 > [ 13.076452] [<c01520f0>] (process_one_work) from > [<c0153020>] (worker_thread+0x294/0x408) > [ 13.085089] [<c0153020>] (worker_thread) from [<c0158624>] > (kthread+0x140/0x15c) > [ 13.092861] [<c0158624>] (kthread) from [<c01010b4>] > (ret_from_fork+0x14/0x20) > [ 13.100482] Exception stack(0xdd179fb0 to 0xdd179ff8) > [ 13.105828] 9fa0: > 00000000 00000000 00000000 00000000 > [ 13.114411] 9fc0: 00000000 00000000 00000000 00000000 > 00000000 00000000 00000000 00000000 > [ 13.123032] 9fe0: 00000000 00000000 00000000 00000000 > 00000013 00000000 > [ 13.130027] mmc_wait_for_req_done:433: mmc0: req failed > (CMD6): -110, cmd->retries=3, retrying... > [ 13.139390] omap_hsmmc_start_command:853: omap_hsmmc > 48060000.mmc: mmc0: CMD6, argument 0x03200101 > [ 13.148839] omap_hsmmc_start_command:859: omap_hsmmc > 48060000.mmc: mmc0: CMD6, argument 0x03200101, tracing > [ 13.148855] omap_hsmmc_irq:1207: omap_hsmmc 48060000.mmc: > mmc0: enter ISR, irq=34, status=108000 > [ 13.168274] omap_hsmmc 48060000.mmc: !! mmc0: ISR, irq=34, > update status: 108000, req_in_progress=0, loop=1 > [ 13.178489] omap_hsmmc 48060000.mmc: !! mmc0: ISR, irq=34, > update status: 108000, req_in_progress=0, loop=2 > [ 13.188704] omap_hsmmc 48060000.mmc: !! mmc0: ISR, irq=34, > update status: 108002, req_in_progress=0, loop=3 > [ 13.198920] omap_hsmmc 48060000.mmc: !! mmc0: ISR, irq=34, > update status: 108002, req_in_progress=0, loop=4 > [ 13.209135] omap_hsmmc 48060000.mmc: !! mmc0: ISR, irq=34, > update status: 108002, req_in_progress=0, loop=5 > [ 13.219351] omap_hsmmc 48060000.mmc: !! mmc0: ISR, irq=34, > update status: 108002, req_in_progress=0, loop=6 > endless looping .............. > [ 26.754257] omap_hsmmc 48060000.mmc: !! mmc0: ISR, irq=34, > update status: 108002, req_in_progress=0, loop=1306 > yes, endless..... > > The source code with line number(to be a reference to the log info) > omap_hsmmc.c: > 847 │ static void > 848 │ omap_hsmmc_start_command(struct omap_hsmmc_host *host, > struct mmc_command *cmd, > 849 │ struct mmc_data *data) > 850 │ { > 851 │ int cmdreg = 0, resptype = 0, cmdtype = 0; > 852 │ > 853 │ dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, argument 0x%08x\n", > 854 │ mmc_hostname(host->mmc), cmd->opcode, cmd->arg); > 855 │ host->cmd = cmd; > 856 │ > 857 │ omap_hsmmc_enable_irq(host, cmd); > 858 │ > 859 │ dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, argument > 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, cmd->arg); > 860 │ > 861 │ host->response_busy = 0; > 862 │ dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, argument > 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, cmd->arg); > 863 │ > 864 │ if (cmd->flags & MMC_RSP_PRESENT) { > 865 │ if (cmd->flags & MMC_RSP_136){ > 866 │ resptype = 1; > 867 │ dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, > argument 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, > cmd->arg); > 868 │ }else if (cmd->flags & MMC_RSP_BUSY) { > 869 │ resptype = 3; > 870 │ host->response_busy = 1; > 871 │ dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, > argument 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, > cmd->arg); > 872 │ } else { > 873 │ resptype = 2; > 874 │ dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, > argument 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, > cmd->arg); > 875 │ } > 876 │ } > 877 │ > 878 │ dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, argument > 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, cmd->arg); > 879 │ > 880 │ /* > 881 │ * Unlike OMAP1 controller, the cmdtype does not > seem to be based on > 882 │ * ac, bc, adtc, bcr. Only commands ending an open > ended transfer need > 883 │ * a val of 0x3, rest 0x0. > 884 │ */ > 885 │ if (cmd == host->mrq->stop) > 886 │ cmdtype = 0x3; > 887 │ > 888 │ dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, argument > 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, cmd->arg); > 889 │ > 890 │ cmdreg = (cmd->opcode << 24) | (resptype << 16) | > (cmdtype << 22); > 891 │ > 892 │ if ((host->flags & AUTO_CMD23) && > mmc_op_multi(cmd->opcode) && > 893 │ host->mrq->sbc) { > 894 │ cmdreg |= ACEN_ACMD23; > 895 │ > 896 │ dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, > argument 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, > cmd->arg); > 897 │ > 898 │ OMAP_HSMMC_WRITE(host->base, SDMASA, > host->mrq->sbc->arg); > 899 │ > 900 │ dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, > argument 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, > cmd->arg); > 901 │ } > 902 │ if (data) { > 903 │ dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, > argument 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, > cmd->arg); > 904 │ > 905 │ cmdreg |= DP_SELECT | MSBS | BCE; > 906 │ if (data->flags & MMC_DATA_READ) > 907 │ cmdreg |= DDIR; > 908 │ else > 909 │ cmdreg &= ~(DDIR); > 910 │ > 911 │ dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, > argument 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, > cmd->arg); > 912 │ } > 913 │ > 914 │ if (host->use_dma) > 915 │ cmdreg |= DMAE; > 916 │ > 917 │ host->req_in_progress = 1; > 918 │ > 919 │ dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, argument > 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, cmd->arg); > 920 │ > 921 │ OMAP_HSMMC_WRITE(host->base, ARG, cmd->arg); > 922 │ > 923 │ dev_vdbg(mmc_dev(host->mmc), "%s: CMD%d, argument > 0x%08x, tracing\n", mmc_hostname(host->mmc), cmd->opcode, cmd->arg); > 924 │ > 925 │ OMAP_HSMMC_WRITE(host->base, CMD, cmdreg); > 926 │ > 927 │ if(6 == cmd->opcode){ > 928 │ #define my_dynamic_dev_dbg(dev, fmt, ...) \ > 929 │ do { \ > 930 │ DEFINE_DYNAMIC_DEBUG_METADATA(descriptor, fmt); \ > 931 │ if (DYNAMIC_DEBUG_BRANCH(descriptor)){ \ > 932 │ __dynamic_dev_dbg(&descriptor, dev, fmt, \ > 933 │ ##__VA_ARGS__); \ > 934 │ dump_stack(); \ > 935 │ } \ > 936 │ } while (0) > 937 │ > 938 │ my_dynamic_dev_dbg(mmc_dev(host->mmc), "%s: > CMD%d, tracing\n", > 939 │ mmc_hostname(host->mmc), cmd->opcode); > 940 │ #undef my_dynamic_dev_dbg > 941 │ } > 942 │ } > ................ > 1200 │ static irqreturn_t omap_hsmmc_irq(int irq, void *dev_id) > 1201 │ { > 1202 │ struct omap_hsmmc_host *host = dev_id; > 1203 │ int status; > 1204 │ unsigned int loop_count = 0; > 1205 │ > 1206 │ status = OMAP_HSMMC_READ(host->base, STAT); > 1207 │ dev_vdbg(mmc_dev(host->mmc), "%s: enter ISR, > irq=%d, status=%x\n", mmc_hostname(host->mmc), irq, status); > 1208 │ > 1209 │ while (status & (INT_EN_MASK | CIRQ_EN)) { > 1210 │ loop_count++; > 1211 │ > 1212 │ if (host->req_in_progress) > 1213 │ omap_hsmmc_do_irq(host, status); > 1214 │ > 1215 │ if (status & CIRQ_EN) > 1216 │ mmc_signal_sdio_irq(host->mmc); > 1217 │ > 1218 │ /* Flush posted write */ > 1219 │ status = OMAP_HSMMC_READ(host->base, STAT); > 1220 │ if(status){ > 1221 │ dev_warn(mmc_dev(host->mmc), "!! %s: ISR, > irq=%d, update status: %x, req_in_progress=%d, loop=%u\n", > 1222 │ mmc_hostname(host->mmc), irq, status, > host->req_in_progress, loop_count); > 1223 │ } > 1224 │ } > 1225 │ > 1226 │ dev_vdbg(mmc_dev(host->mmc), "%s: leave ISR, > irq=%d\n", mmc_hostname(host->mmc), irq); > 1227 │ return IRQ_HANDLED; > 1228 │ } > > > > My patch to fixup that bug is here: > diff -upr linux-git/drivers/mmc/host/omap_hsmmc.c > linux-fixup/drivers/mmc/host/omap_hsmmc.c > --- linux-git/drivers/mmc/host/omap_hsmmc.c 2019-10-29 > 21:33:05.127638851 +0800 > +++ linux-fixup/drivers/mmc/host/omap_hsmmc.c 2019-10-29 > 21:37:44.957637473 +0800 > @@ -852,8 +852,6 @@ omap_hsmmc_start_command(struct omap_hsm > mmc_hostname(host->mmc), cmd->opcode, cmd->arg); > host->cmd = cmd; > > - omap_hsmmc_enable_irq(host, cmd); > - > host->response_busy = 0; > if (cmd->flags & MMC_RSP_PRESENT) { > if (cmd->flags & MMC_RSP_136) > @@ -891,7 +889,8 @@ omap_hsmmc_start_command(struct omap_hsm > if (host->use_dma) > cmdreg |= DMAE; > > - host->req_in_progress = 1; > + host->req_in_progress = 1; /* the flag must be set BEFORE irq enabled */ > + omap_hsmmc_enable_irq(host, cmd); > > OMAP_HSMMC_WRITE(host->base, ARG, cmd->arg); > OMAP_HSMMC_WRITE(host->base, CMD, cmdreg); > Thanks for the detailed description to the problem. The patch above seems like a step in the right direction (I am fine to apply something along those lines), however, I does look a bit odd that ->host->req_in_progress isn't protected by a lock. Doesn't it? In any case, I suggest you to repost this as proper patch with a changelog. Also, please include some of the folkz from TI who has been contributing to the driver lately. Kind regards Uffe