On Thu, 7 Feb 2019 at 16:55, Zachary Hays <zhays@xxxxxxxxxxx> wrote: > > The kblockd workqueue is created with the WQ_MEM_RECLAIM flag set. > This generates a rescuer thread for that queue that will trigger when > the CPU is under heavy load and collect the uncompleted work. > > In the case of mmc, this creates the possibility of a deadlock when > there are multiple partitions on the device as other blk-mq work is > also run on the same queue. For example: > > - worker 0 claims the mmc host to work on partition 1 > - worker 1 attempts to claim the host for partition 2 but has to wait > for worker 0 to finish > - worker 0 schedules complete_work to release the host > - rescuer thread is triggered after time-out and collects the dangling > work > - rescuer thread attempts to complete the work in order starting with > claim host > - the task to release host is now blocked by a task to claim it and > will never be called > > The above results in multiple hung tasks that lead to failures to > mount partitions. > > Handling complete_work on a separate workqueue avoids this by keeping > the work completion tasks separate from the other blk-mq work. This > allows the host to be released without getting blocked by other tasks > attempting to claim the host. > > Signed-off-by: Zachary Hays <zhays@xxxxxxxxxxx> Applied for fixes and by adding a fixes+stable tag. Thanks Zachary for fixing this non-trivial problem! And thanks also to Adrian/Christoph for your valuable input to this (feel free to reply with your reviewed-by tag). Kind regards Uffe > --- > drivers/mmc/core/block.c | 10 +++++++++- > include/linux/mmc/card.h | 1 + > 2 files changed, 10 insertions(+), 1 deletion(-) > > diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c > index aef1185f383d..14f3fdb8c6bb 100644 > --- a/drivers/mmc/core/block.c > +++ b/drivers/mmc/core/block.c > @@ -2112,7 +2112,7 @@ static void mmc_blk_mq_req_done(struct mmc_request *mrq) > if (waiting) > wake_up(&mq->wait); > else > - kblockd_schedule_work(&mq->complete_work); > + queue_work(mq->card->complete_wq, &mq->complete_work); > > return; > } > @@ -2924,6 +2924,13 @@ static int mmc_blk_probe(struct mmc_card *card) > > mmc_fixup_device(card, mmc_blk_fixups); > > + card->complete_wq = alloc_workqueue("mmc_complete", > + WQ_MEM_RECLAIM | WQ_HIGHPRI, 0); > + if (unlikely(!card->complete_wq)) { > + pr_err("Failed to create mmc completion workqueue"); > + return -ENOMEM; > + } > + > md = mmc_blk_alloc(card); > if (IS_ERR(md)) > return PTR_ERR(md); > @@ -2987,6 +2994,7 @@ static void mmc_blk_remove(struct mmc_card *card) > pm_runtime_put_noidle(&card->dev); > mmc_blk_remove_req(md); > dev_set_drvdata(&card->dev, NULL); > + destroy_workqueue(card->complete_wq); > } > > static int _mmc_blk_suspend(struct mmc_card *card) > diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h > index de7377815b6b..8ef330027b13 100644 > --- a/include/linux/mmc/card.h > +++ b/include/linux/mmc/card.h > @@ -308,6 +308,7 @@ struct mmc_card { > unsigned int nr_parts; > > unsigned int bouncesz; /* Bounce buffer size */ > + struct workqueue_struct *complete_wq; /* Private workqueue */ > }; > > static inline bool mmc_large_sector(struct mmc_card *card) > -- > 2.7.4 >