Re: [PATCH v13] blk-mq: Rework blk-mq timeout handling again

On 5/22/18 2:33 PM, Bart Van Assche wrote:
> On Tue, 2018-05-22 at 13:38 -0600, Jens Axboe wrote:
>> On 5/22/18 1:03 PM, Jens Axboe wrote:
>>> On 5/22/18 12:47 PM, Jens Axboe wrote:
>>>> Ran into this, running block/014 from blktests:
>>>>
>>>> [ 5744.949839] run blktests block/014 at 2018-05-22 12:41:25
>>>> [ 5750.723000] null: rq 00000000ff68f103 timed out
>>>> [ 5750.728181] WARNING: CPU: 45 PID: 2480 at block/blk-mq.c:585 __blk_mq_complete_request+0xa6/0x0
>>>> [ 5750.738187] Modules linked in: null_blk(+) configfs nvme nvme_core sb_edac x86_pkg_temp_therma]
>>>> [ 5750.765509] CPU: 45 PID: 2480 Comm: kworker/45:1H Not tainted 4.17.0-rc6+ #712
>>>> [ 5750.774087] Hardware name: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.3.4 11/09/2016
>>>> [ 5750.783369] Workqueue: kblockd blk_mq_timeout_work
>>>> [ 5750.789223] RIP: 0010:__blk_mq_complete_request+0xa6/0x110
>>>> [ 5750.795850] RSP: 0018:ffff883ffb417d68 EFLAGS: 00010202
>>>> [ 5750.802187] RAX: 0000000000000003 RBX: ffff881ff100d800 RCX: 0000000000000000
>>>> [ 5750.810649] RDX: ffff88407fd9e040 RSI: ffff88407fd956b8 RDI: ffff881ff100d800
>>>> [ 5750.819119] RBP: ffffe8ffffd91800 R08: 0000000000000000 R09: ffffffff82066eb8
>>>> [ 5750.827588] R10: ffff883ffa386138 R11: ffff883ffa385900 R12: 0000000000000001
>>>> [ 5750.836050] R13: ffff881fe7da6000 R14: 0000000000000020 R15: 0000000000000002
>>>> [ 5750.844529] FS:  0000000000000000(0000) GS:ffff88407fd80000(0000) knlGS:0000000000000000
>>>> [ 5750.854482] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 5750.861397] CR2: 00007ffc92f97f68 CR3: 000000000201d005 CR4: 00000000003606e0
>>>> [ 5750.869861] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [ 5750.878333] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [ 5750.886805] Call Trace:
>>>> [ 5750.890033]  bt_iter+0x42/0x50
>>>> [ 5750.894000]  blk_mq_queue_tag_busy_iter+0x12b/0x220
>>>> [ 5750.899941]  ? blk_mq_tag_to_rq+0x20/0x20
>>>> [ 5750.904913]  ? __rcu_read_unlock+0x50/0x50
>>>> [ 5750.909978]  ? blk_mq_tag_to_rq+0x20/0x20
>>>> [ 5750.914948]  blk_mq_timeout_work+0x14b/0x240
>>>> [ 5750.920220]  process_one_work+0x21b/0x510
>>>> [ 5750.925197]  worker_thread+0x3a/0x390
>>>> [ 5750.929781]  ? process_one_work+0x510/0x510
>>>> [ 5750.934944]  kthread+0x11c/0x140
>>>> [ 5750.939028]  ? kthread_create_worker_on_cpu+0x50/0x50
>>>> [ 5750.945169]  ret_from_fork+0x1f/0x30
>>>> [ 5750.949656] Code: 48 02 00 00 80 e6 80 74 29 8b 95 80 00 00 00 44 39 e2 75 3b 48 89 df ff 90 2 
>>>> [ 5750.972139] ---[ end trace 40065cb1764bf500 ]---
>>>>
>>>> which is this:
>>>>
>>>>         WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_COMPLETE);
>>>
>>> That check looks wrong, since TIMED_OUT -> COMPLETE is also a valid
>>> state transition. So that check should be:
>>>
>>>         WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_COMPLETE &&
>>>                      blk_mq_rq_state(rq) != MQ_RQ_TIMED_OUT);
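
For reference, a minimal sketch of how that relaxed check could sit in
__blk_mq_complete_request(), assuming the MQ_RQ_* states and the
blk_mq_rq_state() helper from this patch series (illustrative only, not
the exact code at block/blk-mq.c:585):

	static void __blk_mq_complete_request(struct request *rq)
	{
		enum mq_rq_state state = blk_mq_rq_state(rq);

		/*
		 * Both a regular completion (MQ_RQ_COMPLETE) and a completion
		 * issued from the timeout handler (MQ_RQ_TIMED_OUT) are valid
		 * here.
		 */
		WARN_ON_ONCE(state != MQ_RQ_COMPLETE &&
			     state != MQ_RQ_TIMED_OUT);

		/* ... actual completion work ... */
	}
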
>>
>> I guess it would be cleaner to actually do the transition, in
>> blk_mq_rq_timed_out():
>>
>>         case BLK_EH_HANDLED:                                                    
>>                 if (blk_mq_change_rq_state(req, MQ_RQ_TIMED_OUT,                
>>                                                 MQ_RQ_COMPLETE))                
>>                         __blk_mq_complete_request(req);                         
>>                 break;        
>>
>> This works for me.
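
The point of gating the completion on blk_mq_change_rq_state(req,
MQ_RQ_TIMED_OUT, MQ_RQ_COMPLETE) is that the completion only runs if the
request is still in MQ_RQ_TIMED_OUT. A minimal sketch of that idea,
assuming a hypothetical atomic state field (the real helper in the
series may fold the state together with other data, such as the
deadline, into a single atomic word):

	/* Illustrative only: conditional request state transition. */
	static bool blk_mq_change_rq_state(struct request *rq,
					   enum mq_rq_state old_state,
					   enum mq_rq_state new_state)
	{
		/* rq->__rq_state is a hypothetical atomic_t for this sketch. */
		return atomic_cmpxchg(&rq->__rq_state, old_state,
				      new_state) == old_state;
	}

If the request has already left MQ_RQ_TIMED_OUT (for example because a
completion or requeue changed its state in the meantime), the timeout
path backs off instead of completing it a second time.
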
> 
> Hello Jens,
> 
> Thanks for reporting this. How about using the following change to suppress
> that warning:
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index bb99c03e7a34..84e55ea55baf 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -844,6 +844,7 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
>  
>  	switch (ret) {
>  	case BLK_EH_HANDLED:
> +		blk_mq_change_rq_state(req, MQ_RQ_TIMED_OUT, MQ_RQ_COMPLETE);
>  		__blk_mq_complete_request(req);
>  		break;
>  	case BLK_EH_RESET_TIMER:
> 
> I think this will work better than what was proposed in your last e-mail. I'm afraid
> that with that change a completion that occurs while the timeout handler is
> running can be ignored.

What if that races with, e.g., a requeue? We get the completion from IRQ and
decide we need to requeue/restart the request rather than complete it.
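
For concreteness, the interleaving in question would look roughly like
this (illustrative timeline only, not actual code):

	/*
	 *   timeout worker                      IRQ / completion path
	 *   --------------                      ---------------------
	 *   state: IN_FLIGHT -> TIMED_OUT
	 *   ->timeout() returns BLK_EH_HANDLED
	 *                                       completion arrives, driver
	 *                                       decides to requeue/restart
	 *                                       the request
	 *   unconditional change to COMPLETE
	 *   __blk_mq_complete_request()
	 *     -> may collide with the requeue decision just made
	 *
	 * A conditional TIMED_OUT -> COMPLETE transition would fail in that
	 * window and leave the request alone.
	 */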

-- 
Jens Axboe



