Re: [bug report] WARNING: CPU: 1 PID: 1386 at block/blk-mq-sched.c:432 blk_mq_sched_insert_request+0x54/0x178

Jens Axboe <axboe@xxxxxxxxx> · Wed, 3 Nov 2021 05:59:41 -0600

On 11/2/21 9:54 PM, Jens Axboe wrote:
> On Nov 2, 2021, at 9:52 PM, Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>>
>> On Tue, Nov 02, 2021 at 09:21:10PM -0600, Jens Axboe wrote:
>>>> On 11/2/21 8:21 PM, Yi Zhang wrote:
>>>>>>
>>>>>> Can either one of you try with this patch? Won't fix anything, but it'll
>>>>>> hopefully shine a bit of light on the issue.
>>>>>>
>>>> Hi Jens
>>>>
>>>> Here is the full log:
>>>
>>> Thanks! I think I see what it could be - can you try this one as well,
>>> would like to confirm that the condition I think is triggering is what
>>> is triggering.
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 07eb1412760b..81dede885231 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -2515,6 +2515,8 @@ void blk_mq_submit_bio(struct bio *bio)
>>>    if (plug && plug->cached_rq) {
>>>        rq = rq_list_pop(&plug->cached_rq);
>>>        INIT_LIST_HEAD(&rq->queuelist);
>>> +        WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV));
>>> +        WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV));
>>>    } else {
>>>        struct blk_mq_alloc_data data = {
>>>            .q        = q,
>>> @@ -2535,6 +2537,8 @@ void blk_mq_submit_bio(struct bio *bio)
>>>                bio_wouldblock_error(bio);
>>>            goto queue_exit;
>>>        }
>>> +        WARN_ON_ONCE(q->elevator && !(rq->rq_flags & RQF_ELV));
>>> +        WARN_ON_ONCE(!q->elevator && (rq->rq_flags & RQF_ELV));
>>
>> Hello Jens,
>>
>> I guess the issue could be that the following code is run from
>> blk_mq_alloc_request() and blk_mq_alloc_request_hctx() without grabbing
>> ->q_usage_counter:
>>
>> .rq_flags       = q->elevator ? RQF_ELV : 0,
>>
>> Then the elevator is switched from none to a real one, and the check on
>> q->elevator is no longer consistent with the flag set at allocation time.
> 
> Indeed, that’s where I was going with this. I have a patch, testing it
> locally but it’s getting late. Will send it out tomorrow. The nice
> benefit is that it allows dropping the weird ref get on plug flush,
> and batches getting the refs as well. 
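
To make the suspected race above concrete, here is a minimal userspace
sketch. It is not kernel code: every sim_* name and SIM_RQF_ELV are
hypothetical stand-ins, and the real elevator switch waits for
->q_usage_counter to drain rather than failing as this model does. It
only shows why a flag sampled from q->elevator without holding a
reference can disagree with q->elevator later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's RQF_ELV flag. */
#define SIM_RQF_ELV 0x1u

/* Simplified model of a request queue (assumption, not the real struct). */
struct sim_queue {
	const void *elevator;	/* NULL models the "none" scheduler */
	int usage_counter;	/* models ->q_usage_counter */
};

struct sim_request {
	unsigned int rq_flags;
};

/*
 * Model of an elevator switch: refused while anyone holds a usage
 * reference. The real kernel blocks until the counter drains; this
 * sketch returns false instead to stay single-threaded.
 */
static bool sim_elevator_switch(struct sim_queue *q, const void *new_elv)
{
	assert(q != NULL);
	if (q->usage_counter > 0)
		return false;
	q->elevator = new_elv;
	return true;
}

/* Samples q->elevator WITHOUT taking a reference: the suspected bug. */
static void sim_alloc_unprotected(struct sim_queue *q, struct sim_request *rq)
{
	rq->rq_flags = q->elevator ? SIM_RQF_ELV : 0;
}

/* Takes the reference first, then samples q->elevator. */
static void sim_alloc_protected(struct sim_queue *q, struct sim_request *rq)
{
	q->usage_counter++;
	rq->rq_flags = q->elevator ? SIM_RQF_ELV : 0;
}

/* Drops the reference taken by sim_alloc_protected(). */
static void sim_put(struct sim_queue *q)
{
	assert(q->usage_counter > 0);
	q->usage_counter--;
}

/* Mirrors the condition checked by the two WARN_ON_ONCE() lines. */
static bool sim_flags_consistent(const struct sim_queue *q,
				 const struct sim_request *rq)
{
	bool has_elv = q->elevator != NULL;
	bool flagged = (rq->rq_flags & SIM_RQF_ELV) != 0;

	return has_elv == flagged;
}
```

With the unprotected variant, a switch can land between allocation and
the later check, so sim_flags_consistent() reports a mismatch. With the
protected variant the switch is held off until sim_put(), so the flag
and q->elevator stay in agreement for the lifetime of the request.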

Yi/Steffen, can you try pulling this into your test kernel:

git://git.kernel.dk/linux-block for-next

and see if it fixes the issue for you. Thanks!

-- 
Jens Axboe