Hi,

On Mon, Apr 20, 2020 at 1:23 AM John Garry <john.garry@xxxxxxxxxx> wrote:
>
> On 18/04/2020 03:43, Bart Van Assche wrote:
> > On 2020-04-16 04:18, John Garry wrote:
> >> If in blk_mq_dispatch_rq_list() we find no budget, then we break of the
> >> dispatch loop, but the request may keep the driver tag, evaulated
> >> in 'nxt' in the previous loop iteration.
> >>
> >> Fix by putting the driver tag for that request.
> >>
> >> Signed-off-by: John Garry <john.garry@xxxxxxxxxx>
> >>
> >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >> index 8e56884fd2e9..a7785df2c944 100644
> >> --- a/block/blk-mq.c
> >> +++ b/block/blk-mq.c
> >> @@ -1222,8 +1222,10 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
> >>  		rq = list_first_entry(list, struct request, queuelist);
> >>
> >>  		hctx = rq->mq_hctx;
> >> -		if (!got_budget && !blk_mq_get_dispatch_budget(hctx))
> >> +		if (!got_budget && !blk_mq_get_dispatch_budget(hctx)) {
> >> +			blk_mq_put_driver_tag(rq);
> >>  			break;
> >> +		}
> >>
> >>  		if (!blk_mq_get_driver_tag(rq)) {
> >>  			/*
> >
> > Is this something that can only happen if q->mq_ops->queue_rq(hctx, &bd)
> > returns another value than BLK_STS_OK, BLK_STS_RESOURCE and
> > BLK_STS_DEV_RESOURCE?
>
> Right, as that case is handled in blk_mq_handle_dev_resource()
> > If so, please add a comment in the source code
> > that explains this.
>
> So important that we should now do this in an extra patch?
>
> >
> > Is this perhaps a bug fix for 0bca799b9280 ("blk-mq: order getting
> > budget and driver tag")? If so, please mention this and add Cc tags for
> > the people who were Cc-ed on that patch.
>
> So it looks like 0bca799b9280 had a flaw, but I am not sure if anything
> got broken there and worthy of stable backport.
>
> I found this issue while debugging Ming's blk-mq cpu hotplug patchset,
> which I feel is ready to merge.
>
> Having said that, this nasty issue did take > 1 day for me to debug...
> so let me know.
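
To make the leak described in the quoted patch concrete, here is a toy
userspace model of the dispatch loop. It is only a sketch and not kernel
code: every name in it (toy_rq, toy_state, toy_dispatch, toy_leaked_tags)
is hypothetical, tags and budget are plain counters, and "issuing" a
request is assumed to complete it immediately. It only shows how a tag
pre-allocated for the next request can stay held when the following
iteration finds no budget, and how putting the tag before the break
avoids that.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a request and for per-queue resources. */
struct toy_rq {
	bool has_tag;
};

struct toy_state {
	int free_tags;	/* driver tags still available */
	int budget;	/* dispatch budget still available */
};

static bool toy_get_tag(struct toy_state *s, struct toy_rq *rq)
{
	if (rq->has_tag)
		return true;
	if (s->free_tags == 0)
		return false;
	s->free_tags--;
	rq->has_tag = true;
	return true;
}

static void toy_put_tag(struct toy_state *s, struct toy_rq *rq)
{
	if (!rq->has_tag)
		return;
	rq->has_tag = false;
	s->free_tags++;
}

static bool toy_get_budget(struct toy_state *s)
{
	if (s->budget == 0)
		return false;
	s->budget--;
	return true;
}

/*
 * Toy dispatch loop. Returns how many undispatched requests still hold
 * a tag when the loop stops. 'fixed' selects whether the tag is put
 * before breaking out on a budget failure.
 */
static int toy_dispatch(struct toy_state *s, struct toy_rq *rqs, int n,
			bool fixed)
{
	int i, held = 0;

	for (i = 0; i < n; i++) {
		struct toy_rq *rq = &rqs[i];

		if (!toy_get_budget(s)) {
			if (fixed)
				toy_put_tag(s, rq);
			break;
		}
		if (!toy_get_tag(s, rq)) {
			s->budget++;	/* give the budget back */
			break;
		}
		/* Pre-allocate a tag for the next request, like 'nxt'. */
		if (i + 1 < n)
			toy_get_tag(s, &rqs[i + 1]);
		/* "Issue" rq: assume it completes and frees its tag. */
		toy_put_tag(s, rq);
	}

	for (; i < n; i++)
		if (rqs[i].has_tag)
			held++;
	return held;
}

/* Three requests, two tags, budget for only one dispatch. */
int toy_leaked_tags(bool fixed)
{
	struct toy_state s = { .free_tags = 2, .budget = 1 };
	struct toy_rq rqs[3] = { { false }, { false }, { false } };

	return toy_dispatch(&s, rqs, 3, fixed);
}
```

In this model toy_leaked_tags(false) returns 1, because the second
request keeps its pre-allocated tag after the budget runs out, while
toy_leaked_tags(true) returns 0.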

As per the above conversation, presumably this should go to stable
then for any kernel that has commit 0bca799b9280 ("blk-mq: order
getting budget and driver tag")?  For instance, I think 4.19 would be
affected?

When I picked it there I got a conflict due to not having commit
ea4f995ee8b8 ("blk-mq: cache request hardware queue mapping") but I
think it's just a context collision and easy to resolve.  I'm no
expert in the block code, but I posted my backport to 4.19 at
<https://crrev.com/c/2163313>.  I'm happy to send an email as a patch
to the list too or double-check that someone else's conflict
resolution matches mine.

-Doug