On Thu, 2019-01-10 at 21:40 +0000, Zak Hays wrote:
> Hello all,
>
> After upgrading to kernel version v4.17, I see hangs one out of every
> 200 boots or so. I then see the following hung tasks:
>
> INFO: task kblockd:30 blocked for more than 120 seconds.
> Tainted: P O 4.17.19-yocto-standard-edf324cbd3b997d05686954a2e8e5d27 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kblockd D 0 30 2 0x00000000
> Workqueue: kblockd blk_mq_run_work_fn
> [<c064382c>] (__schedule) from [<c0643a6c>] (schedule+0xa4/0xd0)
> [<c0643a6c>] (schedule) from [<c04ffac0>] (__mmc_claim_host+0x12c/0x238)
> [<c04ffac0>] (__mmc_claim_host) from [<c04ffc04>] (mmc_get_card+0x38/0x3c)
> [<c04ffc04>] (mmc_get_card) from [<c0513e44>] (mmc_mq_queue_rq+0x104/0x1fc)
> [<c0513e44>] (mmc_mq_queue_rq) from [<c02f8378>] (blk_mq_dispatch_rq_list+0x380/0x4b0)
> [<c02f8378>] (blk_mq_dispatch_rq_list) from [<c02fc2cc>] (blk_mq_do_dispatch_sched+0xf8/0x110)
> [<c02fc2cc>] (blk_mq_do_dispatch_sched) from [<c02fca38>] (blk_mq_sched_dispatch_requests+0x160/0x1d0)
> [<c02fca38>] (blk_mq_sched_dispatch_requests) from [<c02f63b4>] (__blk_mq_run_hw_queue+0x120/0x168)
> [<c02f63b4>] (__blk_mq_run_hw_queue) from [<c02f6434>] (blk_mq_run_work_fn+0x38/0x3c)
> [<c02f6434>] (blk_mq_run_work_fn) from [<c0047890>] (process_one_work+0x288/0x474)
> [<c0047890>] (process_one_work) from [<c0047ab4>] (process_scheduled_works+0x38/0x3c)
> [<c0047ab4>] (process_scheduled_works) from [<c00486a8>] (rescuer_thread+0x1f8/0x35c)
> [<c00486a8>] (rescuer_thread) from [<c004d948>] (kthread+0x158/0x174)
> [<c004d948>] (kthread) from [<c00090e4>] (ret_from_fork+0x14/0x30)
> Exception stack(0xe1dd3fb0 to 0xe1dd3ff8)
> 3fa0: 00000000 00000000 00000000 00000000
> 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> INFO: task kworker/1:1H:91 blocked for more than 120 seconds.
> Tainted: P O 4.17.19-yocto-standard-edf324cbd3b997d05686954a2e8e5d27 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/1:1H D 0 91 2 0x00000000
> Workqueue: kblockd blk_mq_run_work_fn
> [<c064382c>] (__schedule) from [<c0643a6c>] (schedule+0xa4/0xd0)
> [<c0643a6c>] (schedule) from [<c04ffac0>] (__mmc_claim_host+0x12c/0x238)
> [<c04ffac0>] (__mmc_claim_host) from [<c04ffc04>] (mmc_get_card+0x38/0x3c)
> [<c04ffc04>] (mmc_get_card) from [<c0513e44>] (mmc_mq_queue_rq+0x104/0x1fc)
> [<c0513e44>] (mmc_mq_queue_rq) from [<c02f8378>] (blk_mq_dispatch_rq_list+0x380/0x4b0)
> [<c02f8378>] (blk_mq_dispatch_rq_list) from [<c02fc2cc>] (blk_mq_do_dispatch_sched+0xf8/0x110)
> [<c02fc2cc>] (blk_mq_do_dispatch_sched) from [<c02fca38>] (blk_mq_sched_dispatch_requests+0x160/0x1d0)
> [<c02fca38>] (blk_mq_sched_dispatch_requests) from [<c02f63b4>] (__blk_mq_run_hw_queue+0x120/0x168)
> [<c02f63b4>] (__blk_mq_run_hw_queue) from [<c02f6434>] (blk_mq_run_work_fn+0x38/0x3c)
> [<c02f6434>] (blk_mq_run_work_fn) from [<c0047890>] (process_one_work+0x288/0x474)
> [<c0047890>] (process_one_work) from [<c0048abc>] (worker_thread+0x2b0/0x428)
> [<c0048abc>] (worker_thread) from [<c004d948>] (kthread+0x158/0x174)
> [<c004d948>] (kthread) from [<c00090e4>] (ret_from_fork+0x14/0x30)
> Exception stack(0xc19cffb0 to 0xc19cfff8)
> ffa0: 00000000 00000000 00000000 00000000
> ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
>
> After bisecting through the commits, I've found the hangs started
> after this commit:
>
> 81196976ed94 Adrian Hunter Wed Nov 29 15:41:03 2017 +0200 mmc: block: Add blk-mq support
>
> I'm not sure, however, what about this particular commit is the source
> of the problem.
>
> It appears that multiple tasks are trying to claim the host, but
> whatever task is responsible for releasing it isn't getting triggered.
> If I dump the blocked tasks, I don't see any other mmc-related tasks
> other than the two above.
>
> Has anyone run into this issue before? If not, does anyone have any
> ideas what might be causing the problem?
>
> Thanks,
> Zak Hays

In particular, our tracing shows that mmc_blk_mq_req_done is calling
kblockd_schedule_work, which should cause mmc_blk_mq_complete_work to
run, which in turn would do an mmc_put_card() and unblock the tasks in
mmc_get_card(). However, we do not see mmc_blk_mq_complete_work run,
possibly because kblockd and kworker/1:1H are themselves blocked in
mmc_get_card().
-- 
Steven Walter <steven.walter@xxxxxxxxxxx>
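
For reference, here is a condensed sketch of the completion path we expect
to run, paraphrased from our reading of drivers/mmc/core/block.c in this
kernel. The function and field names (mmc_blk_mq_req_done,
mmc_blk_mq_complete_work, mq->complete_work, mq->complete_req,
mmc_put_card) come from the trace and the driver, but the bodies are
heavily abbreviated: error handling, the recovery path, and the case where
the host can complete in the done context are omitted, so treat this as an
illustration of the chain described above rather than the actual kernel
code.

/*
 * Condensed sketch (not verbatim kernel code) of the expected
 * completion path in the mmc blk-mq conversion.
 */

/* The host driver calls this when the hardware finishes an mmc_request. */
static void mmc_blk_mq_req_done(struct mmc_request *mrq)
{
	struct mmc_queue_req *mqrq = container_of(mrq, struct mmc_queue_req,
						  brq.mrq);
	struct request *req = mmc_queue_req_to_req(mqrq);
	struct mmc_queue *mq = req->q->queuedata;

	/*
	 * The request cannot be finished in this context, so record it and
	 * punt the completion to the kblockd workqueue.
	 */
	mq->complete_req = req;
	kblockd_schedule_work(&mq->complete_work);
}

/*
 * Runs from kblockd: completes the recorded request and drops the host
 * claim, which is what allows tasks sleeping in __mmc_claim_host() (via
 * mmc_get_card()) to make progress.  In the hang above this work item
 * apparently never runs, so the claim is never released.
 */
static void mmc_blk_mq_complete_work(struct work_struct *work)
{
	struct mmc_queue *mq = container_of(work, struct mmc_queue,
					    complete_work);

	blk_mq_complete_request(mq->complete_req);	/* finish the I/O */
	mmc_put_card(mq->card, &mq->ctx);		/* release the host claim */
}

In other words, the work that is supposed to release the host claim is
queued on the same kblockd workqueue whose workers are shown above stuck
waiting for that claim in mmc_get_card(), which would explain why the
completion never runs.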