On Wed, 2017-11-01 at 08:21 -0600, Jens Axboe wrote:
> Fixed that up, and applied these two patches as well.

Hello Jens,

Recently I noticed that a test system sporadically hangs during boot (Dell
PowerEdge R720 that boots from a hard disk connected to a MegaRAID SAS
adapter) and also that srp-tests systematically hangs. Reverting the two
patches from this series fixes both issues. I'm not sure there is another
solution than reverting the two patches from this series.

Bart.

BTW, the following appeared in the kernel log when I tried to run srp-tests
against a kernel with the two patches from this series applied:

INFO: task kworker/19:1:209 blocked for more than 480 seconds.
      Tainted: G W 4.14.0-rc7-dbg+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/19:1 D 0 209 2 0x80000000
Workqueue: srp_remove srp_remove_work [ib_srp]
Call Trace:
 __schedule+0x2fa/0xbb0
 schedule+0x36/0x90
 async_synchronize_cookie_domain+0x88/0x130
 ? finish_wait+0x90/0x90
 async_synchronize_full_domain+0x18/0x20
 sd_remove+0x4d/0xc0 [sd_mod]
 device_release_driver_internal+0x160/0x210
 device_release_driver+0x12/0x20
 bus_remove_device+0x100/0x180
 device_del+0x1d8/0x340
 __scsi_remove_device+0xfc/0x130
 scsi_forget_host+0x25/0x70
 scsi_remove_host+0x79/0x120
 srp_remove_work+0x90/0x1d0 [ib_srp]
 process_one_work+0x20a/0x660
 worker_thread+0x3d/0x3b0
 kthread+0x13a/0x150
 ? process_one_work+0x660/0x660
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x27/0x40

Showing all locks held in the system:
1 lock held by khungtaskd/170:
 #0: (tasklist_lock){.+.+}, at: [<ffffffff810c125d>] debug_show_all_locks+0x3d/0x1a0
4 locks held by kworker/19:1/209:
 #0: ("%s"("srp_remove")){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
 #1: ((&target->remove_work)){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
 #2: (&shost->scan_mutex){+.+.}, at: [<ffffffff814807bf>] scsi_remove_host+0x1f/0x120
 #3: (&dev->mutex){....}, at: [<ffffffff814501a9>] device_release_driver_internal+0x39/0x210
2 locks held by kworker/u66:0/1927:
 #0: ("events_unbound"){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
 #1: ((&entry->work)){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
2 locks held by kworker/5:0/2047:
 #0: ("kaluad"){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
 #1: ((&(&pg->rtpg_work)->work)){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660

=============================================