Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook

Jens Axboe <axboe@xxxxxxxxx> · Wed, 14 Feb 2018 08:19:07 -0700

On 2/14/18 1:56 AM, Paolo Valente wrote:
> 
> 
>> On 14 Feb 2018, at 08:15, Mike Galbraith <efault@xxxxxx> wrote:
>>
>> On Wed, 2018-02-14 at 08:04 +0100, Mike Galbraith wrote:
>>>
>>> And _of course_, roughly two minutes later, IO stalled.
>>
>> P.S.
>>
>> crash> bt 19117
>> PID: 19117  TASK: ffff8803d2dcd280  CPU: 7   COMMAND: "kworker/7:2"
>> #0 [ffff8803f7207bb8] __schedule at ffffffff81595e18
>> #1 [ffff8803f7207c40] schedule at ffffffff81596422
>> #2 [ffff8803f7207c50] io_schedule at ffffffff8108a832
>> #3 [ffff8803f7207c60] blk_mq_get_tag at ffffffff8129cd1e
>> #4 [ffff8803f7207cc0] blk_mq_get_request at ffffffff812987cc
>> #5 [ffff8803f7207d00] blk_mq_alloc_request at ffffffff81298a9a
>> #6 [ffff8803f7207d38] blk_get_request_flags at ffffffff8128e674
>> #7 [ffff8803f7207d60] scsi_execute at ffffffffa0025b58 [scsi_mod]
>> #8 [ffff8803f7207d98] scsi_test_unit_ready at ffffffffa002611c [scsi_mod]
>> #9 [ffff8803f7207df8] sd_check_events at ffffffffa0212747 [sd_mod]
>> #10 [ffff8803f7207e20] disk_check_events at ffffffff812a0f85
>> #11 [ffff8803f7207e78] process_one_work at ffffffff81079867
>> #12 [ffff8803f7207eb8] worker_thread at ffffffff8107a127
>> #13 [ffff8803f7207f10] kthread at ffffffff8107ef48
>> #14 [ffff8803f7207f50] ret_from_fork at ffffffff816001a5
>> crash>
> 
> This evidently has to do with tag pressure.  I've looked for a way to
> easily reduce the number of tags online, so as to put your system in
> the bad spot deterministically.  But to no avail.  Does anyone know a
> way to do it?

The key here might be that it's not a regular file system request,
which bfq probably handles differently. So it's possible that you
are slowly leaking those tags, and we end up in this miserable
situation after a while.

-- 
Jens Axboe