Re: [PATCH V2 0/2] block: remove unnecessary RESTART

On Thu, Nov 02, 2017 at 03:57:05PM +0000, Bart Van Assche wrote:
> On Wed, 2017-11-01 at 08:21 -0600, Jens Axboe wrote:
> > Fixed that up, and applied these two patches as well.
> 
> Hello Jens,
> 
> Recently I noticed that a test system sporadically hangs during boot (Dell
> PowerEdge R720 that boots from a hard disk connected to a MegaRAID SAS adapter)
> and also that srp-tests systematically hangs. Reverting the two patches from
> this series fixes both issues. I'm not sure there is another solution than
> reverting the two patches from this series.

Then we need to find the root cause instead of relying on the same sort of
workaround as before.

For SCSI, the restart is always run from scsi_end_request(), so this
kind of restart across all hctxs isn't necessary at all.
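
To make the completion-driven restart idea concrete, here is a toy userspace
model. It is only a sketch and not kernel code: ToyQueue, need_restart and
end_request() are hypothetical names standing in for one hctx, its restart
flag and the completion path, and the real blk-mq/SCSI logic is considerably
more involved.

```python
from collections import deque


class ToyQueue:
    """Toy stand-in for a single hardware queue (hctx). Not kernel code."""

    def __init__(self, depth):
        self.depth = depth            # how many requests the device takes at once
        self.pending = deque()        # requests waiting to be dispatched
        self.in_flight = deque()      # requests currently on the device
        self.need_restart = False     # hypothetical per-queue restart flag
        self.done = []                # completed requests, in completion order

    def submit(self, req):
        self.pending.append(req)
        self.run()

    def run(self):
        # Dispatch until the device is full. If work is left over, remember
        # that this queue has to be rerun once a request completes.
        while self.pending and len(self.in_flight) < self.depth:
            self.in_flight.append(self.pending.popleft())
        self.need_restart = bool(self.pending)

    def end_request(self):
        # Completion path: retire the oldest in-flight request and rerun
        # only this queue if it was marked as needing a restart.
        self.done.append(self.in_flight.popleft())
        if self.need_restart:
            self.need_restart = False
            self.run()


q = ToyQueue(depth=1)
for name in ("a", "b", "c"):
    q.submit(name)

# Every completion restarts the queue, so nothing is left stranded.
while q.in_flight:
    q.end_request()
```

In this model each completion reruns the queue that was starved, so every
pending request is eventually dispatched without touching any other queue.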

> 
> Bart.
> 
> 
> BTW, the following appeared in the kernel log when I tried to run srp-tests
> against a kernel with the two patches from this series applied:
> 
> INFO: task kworker/19:1:209 blocked for more than 480 seconds.
>       Tainted: G        W       4.14.0-rc7-dbg+ #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/19:1    D    0   209      2 0x80000000
> Workqueue: srp_remove srp_remove_work [ib_srp]
> Call Trace:
>  __schedule+0x2fa/0xbb0
>  schedule+0x36/0x90
>  async_synchronize_cookie_domain+0x88/0x130
>  ? finish_wait+0x90/0x90
>  async_synchronize_full_domain+0x18/0x20
>  sd_remove+0x4d/0xc0 [sd_mod]
>  device_release_driver_internal+0x160/0x210
>  device_release_driver+0x12/0x20
>  bus_remove_device+0x100/0x180
>  device_del+0x1d8/0x340
>  __scsi_remove_device+0xfc/0x130
>  scsi_forget_host+0x25/0x70
>  scsi_remove_host+0x79/0x120
>  srp_remove_work+0x90/0x1d0 [ib_srp]
>  process_one_work+0x20a/0x660
>  worker_thread+0x3d/0x3b0
>  kthread+0x13a/0x150
>  ? process_one_work+0x660/0x660
>  ? kthread_create_on_node+0x40/0x40
>  ret_from_fork+0x27/0x40
> 
> Showing all locks held in the system:
> 1 lock held by khungtaskd/170:
>  #0:  (tasklist_lock){.+.+}, at: [<ffffffff810c125d>] debug_show_all_locks+0x3d/0x1a0
> 4 locks held by kworker/19:1/209:
>  #0:  ("%s"("srp_remove")){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
>  #1:  ((&target->remove_work)){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
>  #2:  (&shost->scan_mutex){+.+.}, at: [<ffffffff814807bf>] scsi_remove_host+0x1f/0x120
>  #3:  (&dev->mutex){....}, at: [<ffffffff814501a9>] device_release_driver_internal+0x39/0x210
> 2 locks held by kworker/u66:0/1927:
>  #0:  ("events_unbound"){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
>  #1:  ((&entry->work)){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
> 2 locks held by kworker/5:0/2047:
>  #0:  ("kaluad"){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
>  #1:  ((&(&pg->rtpg_work)->work)){+.+.}, at: [<ffffffff81083b85>] process_one_work+0x195/0x660
> 
> =============================================

-- 
Ming


