On 2021/08/30 11:40, Bart Van Assche wrote:
> On 8/29/21 16:02, Damien Le Moal wrote:
>> On 2021/08/27 23:34, Bart Van Assche wrote:
>>> On 8/26/21 9:49 PM, Damien Le Moal wrote:
>>>> So the mq-deadline priority patch reduces performance by nearly half at high QD.
>>>> (*) Note: in all cases using the mq-deadline scheduler, for the first run at
>>>> QD=1, I get this splat 100% of the time.
>>>>
>>>> [   95.173889] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:1H:757]
>>>> [   95.292994] CPU: 0 PID: 757 Comm: kworker/0:1H Not tainted 5.14.0-rc7+ #1334
>>>> [   95.307504] Workqueue: kblockd blk_mq_run_work_fn
>>>> [   95.312243] RIP: 0010:_raw_spin_unlock_irqrestore+0x35/0x40
>>>> [   95.415904] Call Trace:
>>>> [   95.418373]  try_to_wake_up+0x268/0x7c0
>>>> [   95.422238]  blk_update_request+0x25b/0x420
>>>> [   95.426452]  blk_mq_end_request+0x1c/0x120
>>>> [   95.430576]  null_handle_cmd+0x12d/0x270 [null_blk]
>>>> [   95.435485]  blk_mq_dispatch_rq_list+0x13c/0x7f0
>>>> [   95.443826]  __blk_mq_do_dispatch_sched+0xb5/0x2f0
>>>> [   95.448653]  __blk_mq_sched_dispatch_requests+0xf4/0x140
>>>> [   95.453998]  blk_mq_sched_dispatch_requests+0x30/0x60
>>>> [   95.459083]  __blk_mq_run_hw_queue+0x49/0x90
>>>> [   95.463377]  process_one_work+0x26c/0x570
>>>> [   95.467421]  worker_thread+0x55/0x3c0
>>>> [   95.475313]  kthread+0x140/0x160
>>>> [   95.482774]  ret_from_fork+0x1f/0x30
>>>
>>> I don't see any function names in the above call stack that refer to the
>>> mq-deadline scheduler? Did I perhaps overlook something? Anyway, if you can
>>> tell me how to reproduce this (kernel commit + kernel config) I will take a
>>> look.
>>
>> Indeed, the stack trace does not show any mq-deadline function. But the
>> workqueue is stuck on _raw_spin_unlock_irqrestore() in the blk_mq_run_work_fn()
>> function. I suspect that the spinlock is dd->lock, so the CPU may be stuck on
>> entry to the mq-deadline dispatch or finish request methods. Not entirely sure.
>>
>> I got this splat with 5.14.0-rc7 (Linus' tag + patch) and the attached config.
>
> Hi Damien,
>
> Thank you for having shared the kernel configuration used in your test.
> So far I have not yet been able to reproduce the above call trace in a
> VM. Could the above call trace have been triggered by the mpt3sas driver
> instead of the mq-deadline I/O scheduler?

The above was triggered using null_blk with the test script you sent. The drives
on the HBA (mpt3sas) and AHCI were not in use when this happened, and I can
reproduce the lockup 100% of the time by running your script at QD=1 (a rough
sketch of an equivalent null_blk + fio run is appended at the end of this email).

>
> Thanks,
>
> Bart.

-- 
Damien Le Moal
Western Digital Research
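
For reference, here is a rough sketch of the kind of QD=1 run that triggers the
splat for me. This is not your script: the null_blk module parameters, the device
name (nullb0) and the fio options below are placeholders I picked for
illustration, so adjust them to match the real test setup.

#!/bin/bash
# Hypothetical reproduction sketch - NOT the original test script.
# Assumes fio and the null_blk module are available; all parameters are guesses.
set -e

# Create a single blk-mq null_blk device (shows up as /dev/nullb0).
modprobe -r null_blk 2>/dev/null || true
modprobe null_blk nr_devices=1 queue_mode=2 submit_queues=1

# Switch the test device to the mq-deadline I/O scheduler.
echo mq-deadline > /sys/block/nullb0/queue/scheduler

# Single-job, QD=1 random read run, similar in spirit to the failing case.
fio --name=qd1 --filename=/dev/nullb0 --direct=1 --ioengine=libaio \
    --rw=randread --bs=4k --iodepth=1 --numjobs=1 \
    --time_based --runtime=30 --group_reporting

# Check the kernel log for the soft lockup splat.
dmesg | grep -i "soft lockup" || echo "no soft lockup reported"

As noted above, it is the first run at QD=1 that hits the splat for me, so the
fio job run immediately after setting the scheduler is the interesting case.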