Hi Avri,

The call stacks look similar, but it doesn't appear that my device is suspended or going into suspend. I added tracing to the mmc suspend routines, but did not see any of them fire when I recreated the failure.

Thanks,
Zak

From: Avri Altman <avri.altman@xxxxxxxxx>
Sent: Friday, January 11, 2019 6:19:05 AM
To: Steven Walter
Cc: Zak Hays; linux-mmc@xxxxxxxxxxxxxxx; Bradley Bolen
Subject: Re: mmc hung tasks at boot

Hi,

An issue that might be related to this one was reported for a ufs platform on the scsi list: https://www.spinics.net/lists/linux-scsi/msg126868.html. There, they suspected that blk_mq_requeue_work fires while the platform is in system suspend. In ufs, system suspend sends SSU to the device, and only another SSU wakes it up. Can you trace your platform to see whether this is happening while the device is suspended?

Thanks,
Avri

On Fri, Jan 11, 2019 at 12:12 AM Steven Walter <steven.walter@xxxxxxxxxxx> wrote:

On Thu, 2019-01-10 at 21:40 +0000, Zak Hays wrote:
> Hello all,
>
> After upgrading to kernel version v4.17, I see hangs one out of every
> 200 boots or so. I then see the following hung tasks:
>
> INFO: task kblockd:30 blocked for more than 120 seconds.
> Tainted: P O 4.17.19-yocto-standard-edf324cbd3b997d05686954a2e8e5d27 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kblockd D 0 30 2 0x00000000
> Workqueue: kblockd blk_mq_run_work_fn
> [<c064382c>] (__schedule) from [<c0643a6c>] (schedule+0xa4/0xd0)
> [<c0643a6c>] (schedule) from [<c04ffac0>] (__mmc_claim_host+0x12c/0x238)
> [<c04ffac0>] (__mmc_claim_host) from [<c04ffc04>] (mmc_get_card+0x38/0x3c)
> [<c04ffc04>] (mmc_get_card) from [<c0513e44>] (mmc_mq_queue_rq+0x104/0x1fc)
> [<c0513e44>] (mmc_mq_queue_rq) from [<c02f8378>] (blk_mq_dispatch_rq_list+0x380/0x4b0)
> [<c02f8378>] (blk_mq_dispatch_rq_list) from [<c02fc2cc>] (blk_mq_do_dispatch_sched+0xf8/0x110)
> [<c02fc2cc>] (blk_mq_do_dispatch_sched) from [<c02fca38>] (blk_mq_sched_dispatch_requests+0x160/0x1d0)
> [<c02fca38>] (blk_mq_sched_dispatch_requests) from [<c02f63b4>] (__blk_mq_run_hw_queue+0x120/0x168)
> [<c02f63b4>] (__blk_mq_run_hw_queue) from [<c02f6434>] (blk_mq_run_work_fn+0x38/0x3c)
> [<c02f6434>] (blk_mq_run_work_fn) from [<c0047890>] (process_one_work+0x288/0x474)
> [<c0047890>] (process_one_work) from [<c0047ab4>] (process_scheduled_works+0x38/0x3c)
> [<c0047ab4>] (process_scheduled_works) from [<c00486a8>] (rescuer_thread+0x1f8/0x35c)
> [<c00486a8>] (rescuer_thread) from [<c004d948>] (kthread+0x158/0x174)
> [<c004d948>] (kthread) from [<c00090e4>] (ret_from_fork+0x14/0x30)
> Exception stack(0xe1dd3fb0 to 0xe1dd3ff8)
> 3fa0: 00000000 00000000 00000000 00000000
> 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> INFO: task kworker/1:1H:91 blocked for more than 120 seconds.
> Tainted: P O 4.17.19-yocto-standard-edf324cbd3b997d05686954a2e8e5d27 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/1:1H D 0 91 2 0x00000000
> Workqueue: kblockd blk_mq_run_work_fn
> [<c064382c>] (__schedule) from [<c0643a6c>] (schedule+0xa4/0xd0)
> [<c0643a6c>] (schedule) from [<c04ffac0>] (__mmc_claim_host+0x12c/0x238)
> [<c04ffac0>] (__mmc_claim_host) from [<c04ffc04>] (mmc_get_card+0x38/0x3c)
> [<c04ffc04>] (mmc_get_card) from [<c0513e44>] (mmc_mq_queue_rq+0x104/0x1fc)
> [<c0513e44>] (mmc_mq_queue_rq) from [<c02f8378>] (blk_mq_dispatch_rq_list+0x380/0x4b0)
> [<c02f8378>] (blk_mq_dispatch_rq_list) from [<c02fc2cc>] (blk_mq_do_dispatch_sched+0xf8/0x110)
> [<c02fc2cc>] (blk_mq_do_dispatch_sched) from [<c02fca38>] (blk_mq_sched_dispatch_requests+0x160/0x1d0)
> [<c02fca38>] (blk_mq_sched_dispatch_requests) from [<c02f63b4>] (__blk_mq_run_hw_queue+0x120/0x168)
> [<c02f63b4>] (__blk_mq_run_hw_queue) from [<c02f6434>] (blk_mq_run_work_fn+0x38/0x3c)
> [<c02f6434>] (blk_mq_run_work_fn) from [<c0047890>] (process_one_work+0x288/0x474)
> [<c0047890>] (process_one_work) from [<c0048abc>] (worker_thread+0x2b0/0x428)
> [<c0048abc>] (worker_thread) from [<c004d948>] (kthread+0x158/0x174)
> [<c004d948>] (kthread) from [<c00090e4>] (ret_from_fork+0x14/0x30)
> Exception stack(0xc19cffb0 to 0xc19cfff8)
> ffa0: 00000000 00000000 00000000 00000000
> ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
>
> After bisecting through the commits, I've found the hangs started
> after this commit:
>
> 81196976ed94 Adrian Hunter Wed Nov 29 15:41:03 2017 +0200 mmc: block: Add blk-mq support
>
> I'm not sure however what about this particular commit is the source
> of the problem.
>
> It appears like multiple tasks are trying to claim the host but
> whatever task is responsible for releasing it isn't getting triggered.
> If I dump the blocked tasks, I don't see any other mmc-related tasks
> other than the two above.
>
> Has anyone run into this issue before?
> If not, does anyone have any ideas what might be causing the problem?
>
> Thanks,
> Zak Hays

In particular, our tracing shows that mmc_blk_mq_req_done is calling kblockd_schedule_work, which should cause mmc_blk_mq_complete_work to run, which will do an mmc_put_card() and unblock the tasks in mmc_get_card(). However, we do not see mmc_blk_mq_complete_work run, possibly because kblockd and kworker/1:1H are blocked in mmc_get_card().

--
Steven Walter <steven.walter@xxxxxxxxxxx>
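
The suspected starvation can be illustrated with a toy model in Python. This is only a sketch under loud assumptions: a fixed-size thread pool stands in for the kblockd workers, an event stands in for the claimed host, and the names simulate, dispatch_work and complete_work are hypothetical. It is not the kernel's workqueue implementation (real workqueues can create additional workers), and it does not establish that this is what happens on the failing boots.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def simulate(num_workers, observe_for=0.5):
    """Toy model: return True if the completion work got to run.

    Assumption: the host is already claimed by an in-flight request,
    and only the completion work can release it.
    """
    host_free = threading.Event()        # set once the host is released
    completion_ran = threading.Event()   # set when the completion work runs

    def dispatch_work():
        # Stands in for blk_mq_run_work_fn -> mmc_get_card():
        # block until the host becomes free.
        host_free.wait()

    def complete_work():
        # Stands in for mmc_blk_mq_complete_work -> mmc_put_card():
        # release the host so the blocked dispatchers can continue.
        completion_ran.set()
        host_free.set()

    pool = ThreadPoolExecutor(max_workers=num_workers)
    try:
        # Two dispatch items are queued first, like the two hung tasks.
        for _ in range(2):
            pool.submit(dispatch_work)
        # The completion is queued behind them on the same pool.
        pool.submit(complete_work)
        # Observe for a while: did the completion get a worker?
        ran = completion_ran.wait(timeout=observe_for)
    finally:
        # Force the host free so the demo always terminates.
        host_free.set()
        pool.shutdown(wait=True)
    return ran


if __name__ == "__main__":
    print("2 workers: completion ran =", simulate(2))
    print("3 workers: completion ran =", simulate(3))
```

In this model, with two workers both are parked waiting for the host, so the queued completion never gets a worker during the observation window. With a third worker the completion runs and releases the host.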