Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests

Damien Le Moal <Damien.LeMoal@xxxxxxx> · Mon, 30 Aug 2021 21:42:54 +0000

On 2021/08/31 2:14, Bart Van Assche wrote:
> On 8/29/21 8:07 PM, Damien Le Moal wrote:
>> On 2021/08/30 11:40, Bart Van Assche wrote:
>>> On 8/29/21 16:02, Damien Le Moal wrote:
>>>> On 2021/08/27 23:34, Bart Van Assche wrote:
>>>>> On 8/26/21 9:49 PM, Damien Le Moal wrote:
>>>>>> So the mq-deadline priority patch reduces performance by nearly half at high QD.
>>>>>> (*) Note: in all cases using the mq-deadline scheduler, for the first run at
>>>>>> QD=1, I get this splat 100% of the time.
>>>>>>
>>>>>> [   95.173889] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:1H:757]
>>>>>> [   95.292994] CPU: 0 PID: 757 Comm: kworker/0:1H Not tainted 5.14.0-rc7+ #1334
>>>>>> [   95.307504] Workqueue: kblockd blk_mq_run_work_fn
>>>>>> [   95.312243] RIP: 0010:_raw_spin_unlock_irqrestore+0x35/0x40
>>>>>> [   95.415904] Call Trace:
>>>>>> [   95.418373]  try_to_wake_up+0x268/0x7c0
>>>>>> [   95.422238]  blk_update_request+0x25b/0x420
>>>>>> [   95.426452]  blk_mq_end_request+0x1c/0x120
>>>>>> [   95.430576]  null_handle_cmd+0x12d/0x270 [null_blk]
>>>>>> [   95.435485]  blk_mq_dispatch_rq_list+0x13c/0x7f0
>>>>>> [   95.443826]  __blk_mq_do_dispatch_sched+0xb5/0x2f0
>>>>>> [   95.448653]  __blk_mq_sched_dispatch_requests+0xf4/0x140
>>>>>> [   95.453998]  blk_mq_sched_dispatch_requests+0x30/0x60
>>>>>> [   95.459083]  __blk_mq_run_hw_queue+0x49/0x90
>>>>>> [   95.463377]  process_one_work+0x26c/0x570
>>>>>> [   95.467421]  worker_thread+0x55/0x3c0
>>>>>> [   95.475313]  kthread+0x140/0x160
>>>>>> [   95.482774]  ret_from_fork+0x1f/0x30
>>>>>
>>>>> I don't see any function names in the above call stack that refer to the
>>>>> mq-deadline scheduler? Did I perhaps overlook something? Anyway, if you can
>>>>> tell me how to reproduce this (kernel commit + kernel config) I will take a
>>>>> look.
>>>>
>>>> Indeed, the stack trace does not show any mq-deadline function. But the
>>>> workqueue is stuck on _raw_spin_unlock_irqrestore() in the blk_mq_run_work_fn()
>>>> function. I suspect that the spinlock is dd->lock, so the CPU may be stuck on
>>>> entry to mq-deadline dispatch or finish request methods. Not entirely sure.
>>>>
>>>> I got this splat with 5.4.0-rc7 (Linus tag patch) with the attached config.
>>>
>>> Hi Damien,
>>>
>>> Thank you for having shared the kernel configuration used in your test.
>>> So far I have not yet been able to reproduce the above call trace in a
>>> VM. Could the above call trace have been triggered by the mpt3sas driver
>>> instead of the mq-deadline I/O scheduler?
>>
>> The above was triggered using nullblk with the test script you sent. I was not
>> using drives on the HBA or AHCI when it happens. And I can reproduce this 100%
>> of the time by running your script with QD=1.
> 
> Hi Damien,
> 
> I rebuilt kernel v5.14-rc7 (tag v5.14-rc7) after having run git clean -f -d -x
> and reran my nullb iops test with the mq-deadline scheduler. No kernel complaints
> appeared in the kernel log. Next I enabled lockdep (CONFIG_PROVE_LOCKING=y) and
> reran the nullb iops test with mq-deadline as scheduler. Again zero complaints
> appeared in the kernel log. Next I ran a subset of the blktests test
> (./check -q block). All tests passed and no complaints appeared in the kernel log.
> 
> Please help with root-causing this issue by rerunning the test on your setup after
> having enabled lockdep.

OK. Will have a look again.

> 
> Thanks,
> 
> Bart.
> 

-- 
Damien Le Moal
Western Digital Research