On 12/20/18 11:01 AM, Bart Van Assche wrote: > On Thu, 2018-12-20 at 06:07 -0700, Jens Axboe wrote: >> On 12/20/18 6:02 AM, Jens Axboe wrote: >>>> I'm afraid this cannot work. >>>> >>>> The 'tags' here could be the hctx->sched_tags, but what we need to >>>> clear is hctx->tags->rqs[]. >>> >>> You are right, of course, a bit too quick on the trigger. This one >>> should work better, and also avoids that silly quadratic loop. I don't >>> think we need the tag == -1 check, but probably best to be safe. >> >> Sent out the wrong one, here it is. Bart, if you can reproduce, can you >> give it a whirl? >> >> >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index 2de972857496..fc04bb648f18 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -2025,7 +2025,7 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags, >> { >> struct page *page; >> >> - if (tags->rqs && set->ops->exit_request) { >> + if (tags->rqs) { >> int i; >> >> for (i = 0; i < tags->nr_tags; i++) { >> @@ -2033,8 +2033,14 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags, >> >> if (!rq) >> continue; >> - set->ops->exit_request(set, rq, hctx_idx); >> + if (set->ops->exit_request) >> + set->ops->exit_request(set, rq, hctx_idx); >> tags->static_rqs[i] = NULL; >> + >> + if (rq->tag == -1) >> + continue; >> + if (set->tags[hctx_idx]->rqs[rq->tag] == rq) >> + set->tags[hctx_idx]->rqs[rq->tag] = NULL; >> } >> } >> >> @@ -2113,6 +2119,7 @@ static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq, >> return ret; >> } >> >> + rq->tag = -1; >> WRITE_ONCE(rq->state, MQ_RQ_IDLE); >> return 0; >> } > > Hi Jens, > > Are you sure this is sufficient? No, I don't think it is. > My understanding is that > blk_mq_queue_tag_busy_iter() iterates over all tags in the tag set. So if the > request queue on which part_in_flight() is called and the request queue for > which blk_mq_free_rqs() is called share their tag set then part_in_flight() > and blk_mq_free_rqs() can run concurrently. That can cause ugly race > conditions. Do you think it would be a good idea to modify the inflight > accounting code such that it only considers the requests of a single request > queue instead of all requests for a given tag set? That would of course solve it, the question is how to do it. One option would be to have ->rqs[] be: struct rq_entry { struct request_queue *q; struct request *rq; }; instead of just a request, since then you could check the queue without having to dereference the request. The current race is inherent in that we set ->rqs[] AFTER having acquired the tag, so there's a window where you could find a stale entry. That's not normally an issue since requests are persistent, but for shared tag maps and queues disappearing it can pose a problem. -- Jens Axboe