+ Adrian, Linus

On Tue, 5 Feb 2019 at 17:53, Zachary Hays <zhays@xxxxxxxxxxx> wrote:
>
> The kblockd workqueue is created with the WQ_MEM_RECLAIM flag set.
> This generates a rescuer thread for that queue that will trigger when
> the CPU is under heavy load and collect the uncompleted work.
>
> In the case of mmc, this creates the possibility of a deadlock as
> other blk-mq work is also run on the same queue. For example:
>
> - worker 0 claims the mmc host
> - worker 1 attempts to claim the host
> - worker 0 schedules complete_work to release the host
> - rescuer thread is triggered after time-out and collects the dangling
>   work
> - rescuer thread attempts to complete the work in order starting with
>   claim host
> - the task to release host is now blocked by a task to claim it and
>   will never be called
>

Adrian, I need your help to understand more of this. The above
description doesn't add up to me.

In principle, already when "worker 1 attempts to claim the host" as
described above, it should succeed and should *not* need to wait for
the host to be released. Right?

The whole point with commit 6c0cedd1ef95 ("mmc: core: Introduce host
claiming by context") was to allow the mmc host to be re-claimable for
blk I/O requests, no matter from what worker/thread the claim/release
is done. Do you think it is not working as expected? What am I missing
here?

> The above results in multiple hung tasks that lead to failures to boot.

Of course, during boot there is also a card detect work running in
parallel with blk I/O requests. This work is being punted to the
"system_freezable_wq" and it claims/releases the host as well.
Perhaps that could have something to do with the problem....

>
> Handling complete_work on a separate workqueue avoids this by keeping
> the work completion tasks separate from the other blk-mq work. This
> allows the host to be released without getting blocked by other tasks
> attempting to claim the host.
>
> Signed-off-by: Zachary Hays <zhays@xxxxxxxxxxx>
> ---
>  drivers/mmc/core/block.c | 10 +++++++++-
>  include/linux/mmc/card.h |  1 +
>  2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index aef1185f383d..14f3fdb8c6bb 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -2112,7 +2112,7 @@ static void mmc_blk_mq_req_done(struct mmc_request *mrq)
>  		if (waiting)
>  			wake_up(&mq->wait);
>  		else
> -			kblockd_schedule_work(&mq->complete_work);
> +			queue_work(mq->card->complete_wq, &mq->complete_work);
>
>  		return;
>  	}
> @@ -2924,6 +2924,13 @@ static int mmc_blk_probe(struct mmc_card *card)
>
>  	mmc_fixup_device(card, mmc_blk_fixups);
>
> +	card->complete_wq = alloc_workqueue("mmc_complete",
> +					WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
> +	if (unlikely(!card->complete_wq)) {
> +		pr_err("Failed to create mmc completion workqueue");
> +		return -ENOMEM;
> +	}
> +
>  	md = mmc_blk_alloc(card);
>  	if (IS_ERR(md))
>  		return PTR_ERR(md);
> @@ -2987,6 +2994,7 @@ static void mmc_blk_remove(struct mmc_card *card)
>  	pm_runtime_put_noidle(&card->dev);
>  	mmc_blk_remove_req(md);
>  	dev_set_drvdata(&card->dev, NULL);
> +	destroy_workqueue(card->complete_wq);
>  }
>
>  static int _mmc_blk_suspend(struct mmc_card *card)
> diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h
> index de7377815b6b..8ef330027b13 100644
> --- a/include/linux/mmc/card.h
> +++ b/include/linux/mmc/card.h
> @@ -308,6 +308,7 @@ struct mmc_card {
>  	unsigned int	nr_parts;
>
>  	unsigned int	bouncesz;	/* Bounce buffer size */
> +	struct workqueue_struct *complete_wq;	/* Private workqueue */
>  };
>
>  static inline bool mmc_large_sector(struct mmc_card *card)
> --
> 2.7.4
>

So this change seems to solve the problem, which is really great.
However, I think we really need to understand what goes wrong before
we apply this as a fix.

Unfortunately, I am also in a busy period, so I need a couple more
days before I can help out running tests.
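For what it's worth, the ordering problem that the commit message
describes can be modeled in user space. This is only a sketch, not
kernel code: the single rescuer thread is modeled as one FIFO drained
strictly in order, the claimed host as a flag, and the task names
(drain, host_busy, etc.) are purely illustrative. It shows why a
"release host" item queued behind a "claim host" item can never run on
one queue, and why a separate completion queue breaks the dependency:

#include <stdbool.h>
#include <stdio.h>

/* Illustrative model: one flag stands in for the claimed mmc host,
 * and an array of task tags stands in for the rescuer's FIFO. */
enum task { CLAIM_HOST, RELEASE_HOST };

static bool host_busy = true;	/* worker 0 already holds the host */

/* Drain one FIFO strictly in order, like a single rescuer thread
 * would. Returns the number of tasks that could actually run. */
static int drain(const enum task *q, int n)
{
	int done = 0;

	for (int i = 0; i < n; i++) {
		if (q[i] == CLAIM_HOST) {
			if (host_busy)
				return done;	/* head blocks; nothing behind it runs */
			host_busy = true;
		} else {			/* RELEASE_HOST */
			host_busy = false;
		}
		done++;
	}
	return done;
}

int main(void)
{
	/* Single queue: the claim sits ahead of the release it depends on. */
	enum task one_queue[] = { CLAIM_HOST, RELEASE_HOST };

	printf("single queue: %d of 2 tasks ran\n", drain(one_queue, 2));

	/* Separate completion queue: the release runs independently
	 * first, then the claim on the original queue can proceed. */
	host_busy = true;
	enum task complete_q[] = { RELEASE_HOST };
	enum task blk_q[] = { CLAIM_HOST };
	int done = drain(complete_q, 1) + drain(blk_q, 1);

	printf("separate queues: %d of 2 tasks ran\n", done);
	return 0;
}

Of course, this model simply assumes the claim really does block, which
is exactly the point I am questioning above, given the context-based
claiming from commit 6c0cedd1ef95.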
Kind regards
Uffe