On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote:
>>> On 11/2/21 8:21 PM, Yi Zhang wrote:
>>>>>
>>>>> Can either one of you try with this patch? Won't fix anything, but it'll
>>>>> hopefully shine a bit of light on the issue.
>>>>>
>>> Hi Jens
>>>
>>> Here is the full log:
>>
>> Thanks! I think I see what it could be - can you try this one as well,
>> would like to confirm that the condition I think is triggering is what
>> is triggering.
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 07eb1412760b..81dede885231 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio)
>>  	if (plug && plug->cached_rq) {
>>  		rq = rq_list_pop(&plug->cached_rq);
>>  		INIT_LIST_HEAD(&rq->queuelist);
>> +		WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV));
>> +		WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV));
>>  	} else {
>>  		struct blk_mq_alloc_data data = {
>>  			.q		= q,
>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio)
>>  			bio_wouldblock_error(bio);
>>  			goto queue_exit;
>>  		}
>> +		WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV));
>> +		WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV));
>
> Hello Jens,
>
> I guess the issue could be the following code run without grabbing
> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx().
>
> 	.rq_flags = q->elevator ? RQF_ELV : 0,
>
> then elevator is switched to real one from none, and check on q->elevator
> becomes not consistent.

Indeed, that’s where I was going with this. I have a patch, testing it
locally but it’s getting late. Will send it out tomorrow. The nice benefit
is that it allows dropping the weird ref get on plug flush, and batches
getting the refs as well.
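
For anyone following along, here is a rough user-space sketch of the ordering suspected above. It is not kernel code and not the pending patch: the `model_*` names, the structs and the `RQF_ELV` bit value are stand-ins invented for illustration. It only shows how a flag sampled from `q->elevator` at allocation time can disagree with `q->elevator` later if a scheduler switch is allowed to happen in between.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in bit value; the real kernel definition may differ. */
#define RQF_ELV (1u << 0)

/* Hypothetical, heavily reduced model of a request queue. */
struct model_queue {
	void *elevator;		/* NULL models the "none" scheduler */
};

/* Hypothetical, heavily reduced model of a request. */
struct model_request {
	unsigned int rq_flags;
};

/* Mirrors the quoted `.rq_flags = q->elevator ? RQF_ELV : 0` assignment. */
static void model_alloc(const struct model_queue *q, struct model_request *rq)
{
	rq->rq_flags = q->elevator ? RQF_ELV : 0;
}

/* The two WARN_ON_ONCE() conditions from the debug patch, as one predicate. */
static bool model_flags_inconsistent(const struct model_queue *q,
				     const struct model_request *rq)
{
	bool has_elevator = q->elevator != NULL;
	bool flagged = (rq->rq_flags & RQF_ELV) != 0;

	return has_elevator != flagged;
}

/*
 * Replays the suspected ordering: allocate while the scheduler is "none",
 * switch to a real scheduler, then run the consistency check.
 * Nothing here pins the queue between the two steps, which models the
 * missing q_usage_counter reference described in the thread.
 */
static bool model_replay_switch_after_alloc(void)
{
	static int fake_scheduler;	/* any non-NULL address will do */
	struct model_queue q = { .elevator = NULL };
	struct model_request rq;

	model_alloc(&q, &rq);		/* rq_flags sampled as 0 */
	q.elevator = &fake_scheduler;	/* elevator switched afterwards */

	return model_flags_inconsistent(&q, &rq);
}
```

In this model `model_replay_switch_after_alloc()` reports a mismatch, while checking a request right after `model_alloc()` with no switch in between does not. The model is single-threaded and only replays the ordering, so it says nothing about how likely the window is in practice.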