VM Boot Hangs with Commit "Revert "scsi: core: run queue if SCSI device queue isn't ready and queue is idle"" on linux-5.4.y

Sherry Yang <sherry.yang@xxxxxxxxxx> · Mon, 24 Jul 2023 05:27:46 +0000

Hi Douglas,

We observed linux-stable v5.4 VM boot hangs, but probably only 1 in thousands of boots (less than 10,000 boots).  We started 16 VMs on a Bare Metal with loop reboots, I chose 10,000 boots as a threshold, and bisected it. After a painful bisection, I found the culprit commit 578c8f09c04b (“Revert scsi: core: run queue if SCSI device queue isnt ready and queue is idle”). This commit actually was merged to v5.8 the 1st time. It's a series of patch set (https://www.spinics.net/lists/linux-block/msg51866.html). Actually, in the 4-patch series, 2 of them have already been backported to linux-stable v5.4, but not at the same time:

1) ab3cee3762e5 (“blk-mq: In blk_mq_dispatch_rq_list() no budget is a reason to kick”) in tag v5.4.86
2) 578c8f09c04b (“Revert scsi: core: run queue if SCSI device queue isnt ready and queue is idle”) in tag v5.4.235, it’s backported as stable dependency for another commit 

	Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
	Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>
	Acked-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
	Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
	Stable-dep-of: c31e76bcc379 ("blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx”)
	Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

And I tried backporting the other 2 patches to v5.4, the issue is still reproducible.

I tested multiple kernels, the issue is not reproducible within 10,000 boots in the following kernels:
1) Linux v5.9
2) Linux v5.4.249 + revert of 578c8f09c04b (“Revert scsi: core: run queue if SCSI device queue isnt ready and queue is idle”)

Not exactly sure how this commit is affecting linux-stable v5.4, but I suspect some prerequisite commits are missing which lead to boot hangs on linux-stable v5.4 but not on higher releases. Could you take a look at this issue and share your insight? 

Here is the call trace:
[  369.850681] INFO: task systemd-udevd:313 blocked for more than 122 seconds.
[  369.852180]       Not tainted 5.4.248-master.20230608.el8.dev.x86_64 #1
[  369.853631] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.855406] systemd-udevd   D    0   313    282 0x80004106
[  369.856662] Call Trace:
[  369.857276]  __schedule+0x213/0x570
[  369.858017]  schedule+0x42/0xb0
[  369.858699]  io_schedule+0x45/0x70
[  369.859671]  wait_on_page_bit_common+0x131/0x380
[  369.860884]  ? find_get_entries+0x1a9/0x260
[  369.862011]  ? __filemap_set_wb_err+0x70/0x70
[  369.863239]  __lock_page+0x44/0x50
[  369.864122]  truncate_inode_pages_range+0x463/0x8a0
[  369.865516]  ? pagevec_lookup_range_tag+0x28/0x40
[  369.866891]  ? free_cpumask_var+0x9/0x10
[  369.867909]  ? mark_buffer_async_write+0x30/0x30
[  369.869019]  ? get_ksm_page+0xf6/0x210
[  369.869936]  ? free_cpumask_var+0x9/0x10
[  369.871020]  ? on_each_cpu_cond_mask+0xb1/0x130
[  369.872062]  truncate_inode_pages+0x15/0x20
[  369.872739]  __blkdev_put+0xa7/0x220
[  369.873408]  ? exit_mmap+0x121/0x1b0
[  369.874092]  blkdev_put+0x4e/0xe0
[  369.874674]  blkdev_close+0x26/0x30
[  369.875281]  __fput+0xcc/0x260
[  369.875787]  ____fput+0xe/0x10
[  369.876298]  task_work_run+0x8b/0xb0
[  369.876892]  do_exit+0x1ff/0x420
[  369.877414]  do_group_exit+0x3b/0xb0
[  369.877989]  get_signal+0x169/0x8b0
[  369.878549]  do_signal+0x2a/0x100
[  369.879130]  ? __vfs_read+0x29/0x40
[  369.879713]  ? vfs_read+0xaa/0x160
[  369.880281]  ? ksys_read+0x67/0xe0
[  369.880858]  prepare_exit_to_usermode+0x12b/0x1a0
[  369.881614]  do_syscall_64+0x8e/0x100
[  369.882199]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[  369.883056] RIP: 0033:0x7f6d466de8c2
[  369.883664] Code: Bad RIP value.
[  369.884217] RSP: 002b:00007fffffadb558 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  369.885433] RAX: fffffffffffffffc RBX: 000055b0b3c40ab0 RCX: 00007f6d466de8c2
[  369.886552] RDX: 0000000000000200 RSI: 000055b0b3c40ad8 RDI: 000000000000000f
[  369.887692] RBP: 000055b0b3c3dd20 R08: 000055b0b3c40ab0 R09: 0000000000000001
[  369.888842] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000004000
[  369.889963] R13: 0000000000000200 R14: 000055b0b3c3dd70 R15: 000055b0b3c40ac8
[  369.891529] INFO: task systemd-udevd:315 blocked for more than 122 seconds.
[  369.893182]       Not tainted 5.4.248-master.20230608.el8.dev.x86_64 #1
[  369.894557] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.896318] systemd-udevd   D    0   315    282 0x80004106
[  369.897596] Call Trace:
[  369.898197]  __schedule+0x213/0x570
[  369.899068]  schedule+0x42/0xb0
[  369.899813]  io_schedule+0x45/0x70
[  369.900640]  wait_on_page_bit_common+0x131/0x380
[  369.901711]  ? find_get_entries+0x1a9/0x260
[  369.902688]  ? __filemap_set_wb_err+0x70/0x70
[  369.903710]  __lock_page+0x44/0x50
[  369.904504]  truncate_inode_pages_range+0x463/0x8a0
[  369.905648]  ? find_get_pages_range_tag+0x7e/0x2d0
[  369.906777]  ? pagevec_lookup_range_tag+0x28/0x40
[  369.907881]  ? free_cpumask_var+0x9/0x10
[  369.908816]  ? mark_buffer_async_write+0x30/0x30
[  369.909836]  ? get_ksm_page+0xf0/0x210
[  369.910738]  ? __x64_sys_fsopen+0x160/0x160
[  369.911743]  ? free_cpumask_var+0x9/0x10
[  369.912694]  ? on_each_cpu_cond_mask+0xb1/0x130
[  369.913763]  truncate_inode_pages+0x15/0x20
[  369.914747]  __blkdev_put+0xa7/0x220
[  369.915565]  ? exit_mmap+0x121/0x1b0
[  369.916407]  blkdev_put+0x4e/0xe0
[  369.917163]  blkdev_close+0x26/0x30
[  369.917960]  __fput+0xcc/0x260
[  369.918669]  ____fput+0xe/0x10
[  369.919362]  task_work_run+0x8b/0xb0
[  369.920185]  do_exit+0x1ff/0x420
[  369.920953]  do_group_exit+0x3b/0xb0
[  369.921776]  get_signal+0x169/0x8b0
[  369.922597]  do_signal+0x2a/0x100
[  369.923466]  ? __vfs_read+0x29/0x40
[  369.924424]  ? vfs_read+0xaa/0x160
[  369.925256]  ? ksys_read+0x67/0xe0
[  369.926085]  prepare_exit_to_usermode+0x12b/0x1a0
[  369.927204]  do_syscall_64+0x8e/0x100
[  369.928063]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[  369.929238] RIP: 0033:0x7f6d466de8c2
[  369.930056] Code: Bad RIP value.
[  369.930799] RSP: 002b:00007fffffadb518 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  369.932496] RAX: fffffffffffffffc RBX: 000055b0b3c3dd30 RCX: 00007f6d466de8c2
[  369.934119] RDX: 0000000000000040 RSI: 000055b0b3c3dd58 RDI: 000000000000000e
[  369.935700] RBP: 000055b0b3c3dbe0 R08: 000055b0b3c3dd30 R09: 0000000000000001
[  369.937300] R10: 0000000000000001 R11: 0000000000000246 R12: 00000002e6df0000
[  369.938873] R13: 0000000000000040 R14: 000055b0b3c3dc30 R15: 000055b0b3c3dd48

Thanks,
Sherry