On 4/02/19 8:24 AM, Ulf Hansson wrote:
> + Jens, Christoph, Adrian, Linus
>
> On Thu, 31 Jan 2019 at 21:16, Zachary Hays <zhays@xxxxxxxxxxx> wrote:
>>
>> The kblockd workqueue is created with the WQ_MEM_RECLAIM flag set.
>> This generates a rescuer thread for that queue that will trigger when
>> the CPU is under heavy load and collect the uncompleted work.
>>
>> In the case of mmc, this creates the possibility of a deadlock as
>> other blk-mq work is also run on the same queue. For example:
>>
>> - worker 0 claims the mmc host
>> - worker 1 attempts to claim the host
>> - worker 0 schedules complete_work to release the host
>> - rescuer thread is triggered after time-out and collects the dangling
>>   work
>> - rescuer thread attempts to complete the work in order starting with
>>   claim host
>> - the task to release host is now blocked by a task to claim it and
>>   will never be called
>>
>> The above results in multiple hung tasks that lead to failures to boot.
>>
>> Switching complete_work to the system_highpri queue avoids this
>> because system_highpri is not flagged with WQ_MEM_RECLAIM. This allows
>> the host to be released without getting blocked by other claim tasks.
>>
>
> Thanks for the fix and the detailed description of the problem!
>
>> Signed-off-by: Zachary Hays <zhays@xxxxxxxxxxx>
>> ---
>>  drivers/mmc/core/block.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
>> index aef1185f383d..59b6b41b84c6 100644
>> --- a/drivers/mmc/core/block.c
>> +++ b/drivers/mmc/core/block.c
>> @@ -2112,7 +2112,7 @@ static void mmc_blk_mq_req_done(struct mmc_request *mrq)
>>  	if (waiting)
>>  		wake_up(&mq->wait);
>>  	else
>> -		kblockd_schedule_work(&mq->complete_work);
>> +		queue_work(system_highpri_wq, &mq->complete_work);
>
> Even if this solves the problem, I think we need some input from some
> of the block experts/maintainers to understand if this is the correct
> way to fix the problem.
>
> So, I have looped them in.
>
> I am guessing MMC is not the only block device driver that has this
> kind of locking issue. Or perhaps it is..

WRT kblockd_workqueue, there is also still this issue outstanding:

https://lore.kernel.org/lkml/20170921140729.GA17333@xxxxxx/
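To make the ordering problem in the patch description concrete, here is a toy, single-threaded user-space model. It is not kernel code, and every name in it (simulate(), wq_step(), struct work_queue, and so on) is hypothetical. One FIFO executor stands in for the kblockd rescuer: it runs items strictly in order and stalls if the item at its head would block. The sketch assumes, as described above, that the rescuer works through the collected items one at a time; it does not model the real workqueue code, where the rescuer is summoned when a pool cannot create a new worker in time (typically under memory pressure).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the ordering problem; this is not kernel code. */
enum work_kind { WORK_CLAIM_HOST, WORK_RELEASE_HOST };

/* A FIFO of work items run strictly in order by one executor. */
struct work_queue {
	enum work_kind items[8];
	size_t head, tail;
};

static void wq_push(struct work_queue *q, enum work_kind k)
{
	assert(q->tail < sizeof(q->items) / sizeof(q->items[0]));
	q->items[q->tail++] = k;
}

/*
 * Try to run the item at the head of q. Returns true if it ran; returns
 * false if q is empty or the head item would block (a claim while the host
 * is held), in which case everything queued behind it is stuck as well.
 */
static bool wq_step(struct work_queue *q, bool *host_claimed)
{
	if (q->head == q->tail)
		return false;

	if (q->items[q->head] == WORK_CLAIM_HOST) {
		if (*host_claimed)
			return false;
		*host_claimed = true;
	} else {
		*host_claimed = false;
	}
	q->head++;
	return true;
}

/*
 * Returns true if all queued work completes, false if it deadlocks.
 * release_on_separate_queue == false: the release is queued behind the
 * blocked claim on one executor (like kblockd_schedule_work()).
 * release_on_separate_queue == true: the release runs on an independent
 * executor (loosely like queue_work(system_highpri_wq, ...)).
 */
bool simulate(bool release_on_separate_queue)
{
	struct work_queue rescuer = { 0 }, other = { 0 };
	bool host_claimed = true;	/* worker 0 already holds the host */
	bool progress = true;

	wq_push(&rescuer, WORK_CLAIM_HOST);	/* worker 1's pending claim */
	wq_push(release_on_separate_queue ? &other : &rescuer,
		WORK_RELEASE_HOST);		/* worker 0's complete_work */

	/* Keep stepping both executors until neither can make progress. */
	while (progress) {
		progress = wq_step(&other, &host_claimed);
		progress = wq_step(&rescuer, &host_claimed) || progress;
	}
	return rescuer.head == rescuer.tail && other.head == other.tail;
}
```

With the release queued behind the claim on the same executor, simulate(false) returns false: the claim can never get the host, so the release never runs. With the release on a separate executor, simulate(true) returns true, which is the property the patch relies on by moving complete_work off kblockd.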