Hi Douglas, We observed linux-stable v5.4 VM boot hangs, but probably only 1 in thousands of boots (less than 10,000 boots). We started 16 VMs on a Bare Metal with loop reboots, I chose 10,000 boots as a threshold, and bisected it. After a painful bisection, I found the culprit commit 578c8f09c04b (“Revert scsi: core: run queue if SCSI device queue isnt ready and queue is idle”). This commit actually was merged to v5.8 the 1st time. It's a series of patch set (https://www.spinics.net/lists/linux-block/msg51866.html). Actually, in the 4-patch series, 2 of them have already been backported to linux-stable v5.4, but not at the same time: 1) ab3cee3762e5 (“blk-mq: In blk_mq_dispatch_rq_list() no budget is a reason to kick”) in tag v5.4.86 2) 578c8f09c04b (“Revert scsi: core: run queue if SCSI device queue isnt ready and queue is idle”) in tag v5.4.235, it’s backported as stable dependency for another commit Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx> Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx> Acked-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> Stable-dep-of: c31e76bcc379 ("blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx”) Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx> And I tried backporting the other 2 patches to v5.4, the issue is still reproducible. I tested multiple kernels, the issue is not reproducible within 10,000 boots in the following kernels: 1) Linux v5.9 2) Linux v5.4.249 + revert of 578c8f09c04b (“Revert scsi: core: run queue if SCSI device queue isnt ready and queue is idle”) Not exactly sure how this commit is affecting linux-stable v5.4, but I suspect some prerequisite commits are missing which lead to boot hangs on linux-stable v5.4 but not on higher releases. Could you take a look at this issue and share your insight? Here is the call trace: [ 369.850681] INFO: task systemd-udevd:313 blocked for more than 122 seconds. [ 369.852180] Not tainted 5.4.248-master.20230608.el8.dev.x86_64 #1 [ 369.853631] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 369.855406] systemd-udevd D 0 313 282 0x80004106 [ 369.856662] Call Trace: [ 369.857276] __schedule+0x213/0x570 [ 369.858017] schedule+0x42/0xb0 [ 369.858699] io_schedule+0x45/0x70 [ 369.859671] wait_on_page_bit_common+0x131/0x380 [ 369.860884] ? find_get_entries+0x1a9/0x260 [ 369.862011] ? __filemap_set_wb_err+0x70/0x70 [ 369.863239] __lock_page+0x44/0x50 [ 369.864122] truncate_inode_pages_range+0x463/0x8a0 [ 369.865516] ? pagevec_lookup_range_tag+0x28/0x40 [ 369.866891] ? free_cpumask_var+0x9/0x10 [ 369.867909] ? mark_buffer_async_write+0x30/0x30 [ 369.869019] ? get_ksm_page+0xf6/0x210 [ 369.869936] ? free_cpumask_var+0x9/0x10 [ 369.871020] ? on_each_cpu_cond_mask+0xb1/0x130 [ 369.872062] truncate_inode_pages+0x15/0x20 [ 369.872739] __blkdev_put+0xa7/0x220 [ 369.873408] ? exit_mmap+0x121/0x1b0 [ 369.874092] blkdev_put+0x4e/0xe0 [ 369.874674] blkdev_close+0x26/0x30 [ 369.875281] __fput+0xcc/0x260 [ 369.875787] ____fput+0xe/0x10 [ 369.876298] task_work_run+0x8b/0xb0 [ 369.876892] do_exit+0x1ff/0x420 [ 369.877414] do_group_exit+0x3b/0xb0 [ 369.877989] get_signal+0x169/0x8b0 [ 369.878549] do_signal+0x2a/0x100 [ 369.879130] ? __vfs_read+0x29/0x40 [ 369.879713] ? vfs_read+0xaa/0x160 [ 369.880281] ? ksys_read+0x67/0xe0 [ 369.880858] prepare_exit_to_usermode+0x12b/0x1a0 [ 369.881614] do_syscall_64+0x8e/0x100 [ 369.882199] entry_SYSCALL_64_after_hwframe+0x5c/0xc1 [ 369.883056] RIP: 0033:0x7f6d466de8c2 [ 369.883664] Code: Bad RIP value. [ 369.884217] RSP: 002b:00007fffffadb558 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 369.885433] RAX: fffffffffffffffc RBX: 000055b0b3c40ab0 RCX: 00007f6d466de8c2 [ 369.886552] RDX: 0000000000000200 RSI: 000055b0b3c40ad8 RDI: 000000000000000f [ 369.887692] RBP: 000055b0b3c3dd20 R08: 000055b0b3c40ab0 R09: 0000000000000001 [ 369.888842] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000004000 [ 369.889963] R13: 0000000000000200 R14: 000055b0b3c3dd70 R15: 000055b0b3c40ac8 [ 369.891529] INFO: task systemd-udevd:315 blocked for more than 122 seconds. [ 369.893182] Not tainted 5.4.248-master.20230608.el8.dev.x86_64 #1 [ 369.894557] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 369.896318] systemd-udevd D 0 315 282 0x80004106 [ 369.897596] Call Trace: [ 369.898197] __schedule+0x213/0x570 [ 369.899068] schedule+0x42/0xb0 [ 369.899813] io_schedule+0x45/0x70 [ 369.900640] wait_on_page_bit_common+0x131/0x380 [ 369.901711] ? find_get_entries+0x1a9/0x260 [ 369.902688] ? __filemap_set_wb_err+0x70/0x70 [ 369.903710] __lock_page+0x44/0x50 [ 369.904504] truncate_inode_pages_range+0x463/0x8a0 [ 369.905648] ? find_get_pages_range_tag+0x7e/0x2d0 [ 369.906777] ? pagevec_lookup_range_tag+0x28/0x40 [ 369.907881] ? free_cpumask_var+0x9/0x10 [ 369.908816] ? mark_buffer_async_write+0x30/0x30 [ 369.909836] ? get_ksm_page+0xf0/0x210 [ 369.910738] ? __x64_sys_fsopen+0x160/0x160 [ 369.911743] ? free_cpumask_var+0x9/0x10 [ 369.912694] ? on_each_cpu_cond_mask+0xb1/0x130 [ 369.913763] truncate_inode_pages+0x15/0x20 [ 369.914747] __blkdev_put+0xa7/0x220 [ 369.915565] ? exit_mmap+0x121/0x1b0 [ 369.916407] blkdev_put+0x4e/0xe0 [ 369.917163] blkdev_close+0x26/0x30 [ 369.917960] __fput+0xcc/0x260 [ 369.918669] ____fput+0xe/0x10 [ 369.919362] task_work_run+0x8b/0xb0 [ 369.920185] do_exit+0x1ff/0x420 [ 369.920953] do_group_exit+0x3b/0xb0 [ 369.921776] get_signal+0x169/0x8b0 [ 369.922597] do_signal+0x2a/0x100 [ 369.923466] ? __vfs_read+0x29/0x40 [ 369.924424] ? vfs_read+0xaa/0x160 [ 369.925256] ? ksys_read+0x67/0xe0 [ 369.926085] prepare_exit_to_usermode+0x12b/0x1a0 [ 369.927204] do_syscall_64+0x8e/0x100 [ 369.928063] entry_SYSCALL_64_after_hwframe+0x5c/0xc1 [ 369.929238] RIP: 0033:0x7f6d466de8c2 [ 369.930056] Code: Bad RIP value. [ 369.930799] RSP: 002b:00007fffffadb518 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 369.932496] RAX: fffffffffffffffc RBX: 000055b0b3c3dd30 RCX: 00007f6d466de8c2 [ 369.934119] RDX: 0000000000000040 RSI: 000055b0b3c3dd58 RDI: 000000000000000e [ 369.935700] RBP: 000055b0b3c3dbe0 R08: 000055b0b3c3dd30 R09: 0000000000000001 [ 369.937300] R10: 0000000000000001 R11: 0000000000000246 R12: 00000002e6df0000 [ 369.938873] R13: 0000000000000040 R14: 000055b0b3c3dc30 R15: 000055b0b3c3dd48 Thanks, Sherry