On 11/3/21 9:09 AM, Ming Lei wrote:
> On Wed, Nov 03, 2021 at 09:03:02AM -0600, Jens Axboe wrote:
>> On 11/3/21 8:57 AM, Ming Lei wrote:
>>> On Wed, Nov 03, 2021 at 09:59:02PM +0800, Yi Zhang wrote:
>>>> On Wed, Nov 3, 2021 at 7:59 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>>
>>>>> On 11/2/21 9:54 PM, Jens Axboe wrote:
>>>>>> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote:
>>>>>>>>> On 11/2/21 8:21 PM, Yi Zhang wrote:
>>>>>>>>>>>
>>>>>>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll
>>>>>>>>>>> hopefully shine a bit of light on the issue.
>>>>>>>>>>>
>>>>>>>>> Hi Jens
>>>>>>>>>
>>>>>>>>> Here is the full log:
>>>>>>>>
>>>>>>>> Thanks! I think I see what it could be - can you try this one as well,
>>>>>>>> would like to confirm that the condition I think is triggering is what
>>>>>>>> is triggering.
>>>>>>>>
>>>>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>>>>> index 07eb1412760b..81dede885231 100644
>>>>>>>> --- a/block/blk-mq.c
>>>>>>>> +++ b/block/blk-mq.c
>>>>>>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio)
>>>>>>>>  	if (plug && plug->cached_rq) {
>>>>>>>>  		rq = rq_list_pop(&plug->cached_rq);
>>>>>>>>  		INIT_LIST_HEAD(&rq->queuelist);
>>>>>>>> +		WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV));
>>>>>>>> +		WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV));
>>>>>>>>  	} else {
>>>>>>>>  		struct blk_mq_alloc_data data = {
>>>>>>>>  			.q		= q,
>>>>>>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio)
>>>>>>>>  				bio_wouldblock_error(bio);
>>>>>>>>  			goto queue_exit;
>>>>>>>>  		}
>>>>>>>> +		WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV));
>>>>>>>> +		WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV));
>>>>>>>
>>>>>>> Hello Jens,
>>>>>>>
>>>>>>> I guess the issue could be the following code run without grabbing
>>>>>>> ->q_usage_counter from blk_mq_alloc_request() and blk_mq_alloc_request_hctx().
>>>>>>>
>>>>>>> 	.rq_flags	= q->elevator ? RQF_ELV : 0,
>>>>>>>
>>>>>>> then elevator is switched to real one from none, and check on q->elevator
>>>>>>> becomes not consistent.
>>>>>>
>>>>>> Indeed, that’s where I was going with this. I have a patch, testing it
>>>>>> locally but it’s getting late. Will send it out tomorrow. The nice
>>>>>> benefit is that it allows dropping the weird ref get on plug flush,
>>>>>> and batches getting the refs as well.
>>>>>
>>>>> Yi/Steffen, can you try pulling this into your test kernel:
>>>>>
>>>>> git://git.kernel.dk/linux-block for-next
>>>>>
>>>>> and see if it fixes the issue for you. Thanks!
>>>>
>>>> It still can be reproduced with the latest linux-block/for-next, here is the log
>>>>
>>>> fab2914e46eb (HEAD, new/for-next) Merge branch 'for-5.16/drivers' into for-next
>>>
>>> Hi Yi,
>>>
>>> Please try the following change:
>>>
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index e1e64964a31b..eb634a9c61ff 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -494,7 +494,6 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
>>>  		.q		= q,
>>>  		.flags		= flags,
>>>  		.cmd_flags	= op,
>>> -		.rq_flags	= q->elevator ? RQF_ELV : 0,
>>>  		.nr_tags	= 1,
>>>  	};
>>>  	struct request *rq;
>>> @@ -504,6 +503,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
>>>  	if (ret)
>>>  		return ERR_PTR(ret);
>>>
>>> +	data.rq_flags = q->elevator ? RQF_ELV : 0,
>>>  	rq = __blk_mq_alloc_requests(&data);
>>>  	if (!rq)
>>>  		goto out_queue_exit;
>>> @@ -524,7 +524,6 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>>>  		.q		= q,
>>>  		.flags		= flags,
>>>  		.cmd_flags	= op,
>>> -		.rq_flags	= q->elevator ? RQF_ELV : 0,
>>>  		.nr_tags	= 1,
>>>  	};
>>>  	u64 alloc_time_ns = 0;
>>> @@ -551,6 +550,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>>>  	ret = blk_queue_enter(q, flags);
>>>  	if (ret)
>>>  		return ERR_PTR(ret);
>>> +	data.rq_flags = q->elevator ? RQF_ELV : 0,
>>
>> Don't think that will compile, but I guess the point is that we can't do
>
> It can compile.

s/,/; for the new assignments.

>> this assignment before queue enter, in case we're in the midst of
>> switching schedulers. Which is indeed a valid concern.
>
> Yeah, for scsi, real io sched is switched when adding disk, before
> that, the passthrough command need to see consistent q->elevator.

Yeah, I agree that the problem is most certainly there. Guess I'm just
surprised the timing works out reliably, but it sure looks like it.

-- 
Jens Axboe
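
A minimal, self-contained C sketch of the ordering question discussed above,
for illustration only. It is not kernel code and not part of the thread:
toy_queue, toy_enter(), toy_exit() and TOYF_ELV are invented stand-ins for the
request queue, blk_queue_enter()/blk_queue_exit() and RQF_ELV, and the sketch
only models where the q->elevator snapshot is taken relative to entering the
queue, not the actual concurrency.

/*
 * Stand-alone sketch of the ordering issue discussed in the thread.
 * NOT kernel code: all names are invented and nothing here actually
 * races; it just shows where the q->elevator snapshot is taken
 * relative to "entering" the queue.
 */
#include <stdbool.h>
#include <stdio.h>

#define TOYF_ELV (1u << 0)		/* models RQF_ELV */

struct toy_queue {
	bool elevator;			/* models q->elevator */
	int usage_counter;		/* models q->q_usage_counter */
};

/* Models blk_queue_enter(): after success, a scheduler switch would
 * have to wait for the reference to be dropped again. */
static int toy_enter(struct toy_queue *q)
{
	q->usage_counter++;
	return 0;
}

static void toy_exit(struct toy_queue *q)
{
	q->usage_counter--;
}

/* Original ordering: the elevator snapshot is taken before the queue
 * is entered, so a switch could slip in between the snapshot and the
 * allocation (not exercised in this single-threaded sketch). */
static unsigned int flags_before_enter(struct toy_queue *q)
{
	unsigned int rq_flags = q->elevator ? TOYF_ELV : 0;

	if (toy_enter(q))
		return 0;
	toy_exit(q);
	return rq_flags;
}

/* Ordering after the proposed change: enter the queue first, then take
 * the snapshot, so it stays consistent with the allocation that
 * follows. */
static unsigned int flags_after_enter(struct toy_queue *q)
{
	unsigned int rq_flags;

	if (toy_enter(q))
		return 0;
	rq_flags = q->elevator ? TOYF_ELV : 0;
	toy_exit(q);
	return rq_flags;
}

int main(void)
{
	struct toy_queue q = { .elevator = true };

	printf("before-enter snapshot: %#x, after-enter snapshot: %#x\n",
	       flags_before_enter(&q), flags_after_enter(&q));
	return 0;
}

Built with any C99 compiler (e.g. cc -std=gnu99 sketch.c), both helpers return
the same value here; the two orderings only differ when a scheduler switch can
land between the snapshot and the queue enter, which is exactly the window the
change under discussion closes.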