My response below didn't make it to the list for some reason, resending... Hello all, After adding some tracing, it appears the hung tasks are due to a dead lock in the kblockd workqueue's rescuer thread. In a normal mmc operation, the code flows as follows: <...>-57 [000] d..1 2.867001: __mmc_claim_host: comm=kworker/0:1H udevd-65 [001] ..s. 2.867972: mmc_blk_mq_req_done: comm=udevd <= schedules complete_work on kblockd workqueue udevd-65 [001] d.s1 2.867980: workqueue_queue_work: work struct=cdd588be function=mmc_blk_mq_complete_work workqueue=6ae417d9 req_cpu=4 cpu=1 udevd-65 [001] d.s1 2.867982: workqueue_activate_work: work struct cdd588be <...>-93 [001] .... 2.868025: workqueue_execute_start: work struct cdd588be: function mmc_blk_mq_complete_work <...>-93 [001] .... 2.868033: mmc_blk_mq_complete_work: comm=kworker/1:2H <...>-93 [001] .... 2.868070: mmc_release_host: comm=kworker/1:2H In a typical failure case, the following scenario occurs: <...>-57 [000] .... 2.868122: __mmc_claim_host: comm=kworker/0:1H <= host is claimed <...>-4 [000] d..1 2.880279: __mmc_claim_host: comm=kworker/0:0H <= another task attempts to claim host <...>-57 [000] ..s1 2.891434: mmc_blk_mq_req_done: comm=kworker/0:1H <= schedules complete_work <...>-57 [000] d.s2 2.891442: workqueue_queue_work: work struct=55f8575e function=mmc_blk_mq_complete_work workqueue=6ae417d9 req_cpu=4 cpu=0 <...>-57 [000] d.s2 2.891444: workqueue_activate_work: work struct 55f8575e **** workqueue_execute_start and mmc_blk_mq_complete_work are never called so host is never released! **** <...>-29 [000] d..1 2.901553: __mmc_claim_host: comm=kblockd <= another attempt to claim host, also the last attempt in the trace After digging further into the kernel workqueue, it was found that because the kblockd workqueue is getting set as WQ_MEM_RECLAIM a rescuer_thread is created for kblockd. In cases where a new kworker would need to be created to complete the work items, a mayday timeout is sometimes getting triggered when the CPUs are under heavy load. When the rescuer thread runs, it steals all the outstanding work and runs it serially in the order of creation: <...>-94 [000] .... 2.880300: maybe_create_worker comm=kworker/0:2H <= starts mayday timeout **** mayday timeout triggers, loops through the worklist which has the following 2 tasks: **** udevd-81 [000] d.s2 2.901384: pool_mayday_timeout: comm=udevd work=e07e7440 work->func=blk_mq_run_work_fn <= the work trying to claim the host udevd-81 [000] d.s2 2.901407: pool_mayday_timeout: comm=udevd work=e07a90f4 work->func=mmc_blk_mq_complete_work <= the work trying to release the host **** send_mayday is called which wakes up the rescuer_thread **** **** the rescuer thread then loops through and collects all the work on the kblockd workqueue **** <...>-29 [000] d..1 2.901517: rescuer_thread: comm=kblockd pid=29 work=e07e7440 work->func=blk_mq_run_work_fn <...>-29 [000] d..1 2.901521: rescuer_thread: comm=kblockd pid=29 work=e07a90f4 work->func=mmc_blk_mq_complete_work The kblockd rescuer thread then attempts to complete the work in order, starting with blk_mq_run_work_fn. This is seen above in the __mmc_claim_host call at 2.901553 which is repeating the attempt made at 2.880279 by kworker/0:0H. This task will not be able to claim the host though until after mmc_blk_mq_complete_work is called and releases the host. The task to call mc_blk_mq_complete_work is after the claim_host task so a deadlock has now been created. Two workarounds have been verified that prevent running on work queues that have a rescuer thread: 1) Don't alloc the kblockd workqueue as WQ_MEM_RECLAIM: --- a/block/blk-core.c +++ b/block/blk-core.c @@ -3810,7 +3810,7 @@ int __init blk_dev_init(void) /* used for unplugging and affects IO latency/throughput - HIGHPRI */ kblockd_workqueue = alloc_workqueue("kblockd", - WQ_MEM_RECLAIM | WQ_HIGHPRI, 0); + WQ_HIGHPRI, 0); if (!kblockd_workqueue) panic("Failed to create kblockd\n"); This bypasses the mayday timeout and rescuer thread and allows a new kworker to handle mmc_blk_mq_complete_work. 2) Don't queue complete_work on the kblockd workqueue: --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -2106,7 +2106,7 @@ static void mmc_blk_mq_req_done(struct mmc_request *mrq) if (waiting) wake_up(&mq->wait); else - kblockd_schedule_work(&mq->complete_work); + queue_work(system_highpri_wq, &mq-complete_work); return; This prevents the possible deadlock in the rescuer_thread by keeping completion on a separate workqueue. Both of these options appear to resolve the hung tasks. Is one of these the preferred fix? Is it appropriate to change the workqueue client code in this way or is the expectation that the workqueues will run concurrently valid and the serial behavior of the rescuer thread is what really needs to be addressed? Thanks, Zak Hays TL;DR: The hung tasks are due to a deadlock in the kblockd rescuer thread. Two possible workarounds have been verified: don't declare kblockd workqueue with WQ_MEM_RECLAIM set or handle the mmc complete_work on a different workqueue. Is one of these options the preferred fix or should the issue actually be resolved in the rescuer thread? From: linux-mmc-owner@xxxxxxxxxxxxxxx <linux-mmc-owner@xxxxxxxxxxxxxxx> on behalf of Zak Hays <zak.hays@xxxxxxxxxxx> Sent: Friday, January 11, 2019 5:38:42 PM To: Avri Altman; Steven Walter Cc: linux-mmc@xxxxxxxxxxxxxxx; Bradley Bolen Subject: Re: mmc hung tasks at boot Hi Avri, The call stacks look similar but it doesn't appear that my device is suspended / going into suspend. I added tracing to the mmc suspend routines, but did not see any of them fire when I recreated the failure. Thanks, Zak From: Avri Altman <avri.altman@xxxxxxxxx> Sent: Friday, January 11, 2019 6:19:05 AM To: Steven Walter Cc: Zak Hays; linux-mmc@xxxxxxxxxxxxxxx; Bradley Bolen Subject: Re: mmc hung tasks at boot Hi, an issue that might have to do with this one, reported for a ufs platform in scsi LKML - https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Flinux-scsi%2Fmsg126868.html&data=02%7C01%7Czak.hays%40lexmark.com%7C1e0efc74a87b4f4a5e8908d678158e5d%7C127090656e6c41c99e4dfb0a436969ce%7C1%7C0%7C636828431328146659&sdata=euVUVytYTXEI%2B4f3byJflrjxrzuRJt3QzpLHgbcT0dM%3D&reserved=0. There, they suspected that blk_mq_requeue_work fires while the platform is in system suspend. In ufs, system suspend sends SSU to the device, and just another SSU wakes it up. Can you trace your platform if this is happening while the device has suspended? Thanks, Avri On Fri, Jan 11, 2019 at 12:12 AM Steven Walter <steven.walter@xxxxxxxxxxx> wrote: On Thu, 2019-01-10 at 21:40 +0000, Zak Hays wrote: > Hello all, > > After upgrading to kernel version v4.17, I see hangs one out of every > 200 boots or so. I then see the following hung tasks: > > INFO: task kblockd:30 blocked for more than 120 seconds. > Tainted: P O 4.17.19-yocto-standard- > edf324cbd3b997d05686954a2e8e5d27 #1 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this > message. > kblockd D 0 30 2 0x00000000 > Workqueue: kblockd blk_mq_run_work_fn > [<c064382c>] (__schedule) from [<c0643a6c>] (schedule+0xa4/0xd0) > [<c0643a6c>] (schedule) from [<c04ffac0>] > (__mmc_claim_host+0x12c/0x238) > [<c04ffac0>] (__mmc_claim_host) from [<c04ffc04>] > (mmc_get_card+0x38/0x3c) > [<c04ffc04>] (mmc_get_card) from [<c0513e44>] > (mmc_mq_queue_rq+0x104/0x1fc) > [<c0513e44>] (mmc_mq_queue_rq) from [<c02f8378>] > (blk_mq_dispatch_rq_list+0x380/0x4b0) > [<c02f8378>] (blk_mq_dispatch_rq_list) from [<c02fc2cc>] > (blk_mq_do_dispatch_sched+0xf8/0x110) > [<c02fc2cc>] (blk_mq_do_dispatch_sched) from [<c02fca38>] > (blk_mq_sched_dispatch_requests+0x160/0x1d0) > [<c02fca38>] (blk_mq_sched_dispatch_requests) from [<c02f63b4>] > (__blk_mq_run_hw_queue+0x120/0x168) > [<c02f63b4>] (__blk_mq_run_hw_queue) from [<c02f6434>] > (blk_mq_run_work_fn+0x38/0x3c) > [<c02f6434>] (blk_mq_run_work_fn) from [<c0047890>] > (process_one_work+0x288/0x474) > [<c0047890>] (process_one_work) from [<c0047ab4>] > (process_scheduled_works+0x38/0x3c) > [<c0047ab4>] (process_scheduled_works) from [<c00486a8>] > (rescuer_thread+0x1f8/0x35c) > [<c00486a8>] (rescuer_thread) from [<c004d948>] (kthread+0x158/0x174) > [<c004d948>] (kthread) from [<c00090e4>] (ret_from_fork+0x14/0x30) > Exception stack(0xe1dd3fb0 to 0xe1dd3ff8) > 3fa0: 00000000 00000000 00000000 > 00000000 > 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > 00000000 > 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > INFO: task kworker/1:1H:91 blocked for more than 120 seconds. > Tainted: P O 4.17.19-yocto-standard- > edf324cbd3b997d05686954a2e8e5d27 #1 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this > message. > kworker/1:1H D 0 91 2 0x00000000 > Workqueue: kblockd blk_mq_run_work_fn > [<c064382c>] (__schedule) from [<c0643a6c>] (schedule+0xa4/0xd0) > [<c0643a6c>] (schedule) from [<c04ffac0>] > (__mmc_claim_host+0x12c/0x238) > [<c04ffac0>] (__mmc_claim_host) from [<c04ffc04>] > (mmc_get_card+0x38/0x3c) > [<c04ffc04>] (mmc_get_card) from [<c0513e44>] > (mmc_mq_queue_rq+0x104/0x1fc) > [<c0513e44>] (mmc_mq_queue_rq) from [<c02f8378>] > (blk_mq_dispatch_rq_list+0x380/0x4b0) > [<c02f8378>] (blk_mq_dispatch_rq_list) from [<c02fc2cc>] > (blk_mq_do_dispatch_sched+0xf8/0x110) > [<c02fc2cc>] (blk_mq_do_dispatch_sched) from [<c02fca38>] > (blk_mq_sched_dispatch_requests+0x160/0x1d0) > [<c02fca38>] (blk_mq_sched_dispatch_requests) from [<c02f63b4>] > (__blk_mq_run_hw_queue+0x120/0x168) > [<c02f63b4>] (__blk_mq_run_hw_queue) from [<c02f6434>] > (blk_mq_run_work_fn+0x38/0x3c) > [<c02f6434>] (blk_mq_run_work_fn) from [<c0047890>] > (process_one_work+0x288/0x474) > [<c0047890>] (process_one_work) from [<c0048abc>] > (worker_thread+0x2b0/0x428) > [<c0048abc>] (worker_thread) from [<c004d948>] (kthread+0x158/0x174) > [<c004d948>] (kthread) from [<c00090e4>] (ret_from_fork+0x14/0x30) > Exception stack(0xc19cffb0 to 0xc19cfff8) > ffa0: 00000000 00000000 00000000 > 00000000 > ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > 00000000 > ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 > > After bisecting through the commits, I've found the hangs started > after this commit: > > 81196976ed94 Adrian Hunter Wed Nov 29 15:41:03 2017 +0200 mmc: > block: Add blk-mq support > > I'm not sure however what about this particular commit is the source > of the problem. > > It appears like multiple tasks are trying to claim the host but > whatever task is responsible for releasing it isn't getting triggered. > If I dump the blocked tasks, I don't see any other mmc-related tasks > other than the two above. > > Has anyone run into this issue before? If not, does anyone have any > ideas what might be causing the problem? > > Thanks, > Zak Hays In particular, our tracing shows that mmc_blk_mq_req_done is calling kblockd_schedule_work, which should cause mmc_blk_mq_complete_work to run, which will do an mmc_put_card() and unblock the tasks in mmc_get_card(). However, we do not see mmc_blk_mq_complete_work run, possibly because kblockd and kworker/1:1H are blocked in mmc_get_card() -- Steven Walter <steven.walter@xxxxxxxxxxx> -- https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.sigbee.com%2Fsignatures%2F8f599119d5579f88613d9ed76b4e4be089057c3e%2F5834cb998768feb9.gif&data=02%7C01%7Czak.hays%40lexmark.com%7C1e0efc74a87b4f4a5e8908d678158e5d%7C127090656e6c41c99e4dfb0a436969ce%7C1%7C0%7C636828431328146659&sdata=mCEU0TFq%2FZzoUUKiSzVsf4KnFKDhUoUviPIsEXp8wVA%3D&reserved=0 ;