Re: fc_remove_host / scsi_host_dev_release deadlock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Joe Eykholt wrote:
Joe Eykholt wrote:
This is different from the 3-way deadlock I posted on 7/25
http://marc.info/?l=linux-scsi&m=124856078500527&w=2

A user process is closing an sd block dev file.
It holds the sd_ref_mutex and is in scsi_host_dev_release()
doing a destroy_workqueue() which needs the cpu_add_remove_lock.

The process that holds cpu_add_remove_lock is doing
fcoe_destroy (deleting an FCoE instance) which calls
fc_remove_host() which does destroy_workqueue() and waiting in
flush_cpu_workqueue() for a work item to finish.

The stuck work item is doing fc_rport_final_delete(), and
that's in device_release_driver() in sd_remove() which
is waiting for the sd_ref_mutex.

I'd like to hear some discussion about the right approach
to fix this.  There should be some clear rules about how
work items can destroy or flush other workqueues.
I'd like to be able to state the clear rule that's being
violated here.

Now it seems to me that that the rule is:

    A workqueue item may not destroy a workqueue if it
    is running on a workqueue that itself may be destroyed.


Rephrasing:

    If a workqueue may be destroyed, it's handlers
    may not destroy any workqueue.

That's because destroy_workqueue takes the cpu_add_remove lock
and, while holding it, does flush_cpu_workqueue().

If a work item on workqueue A destroy workqueue B,
and thread C destroys workqueue A, then  thread C
may get the cpu_add_remove lock first, work item on A
can wait forever for the lock, and thread C can
hang waiting for workqueue A to flush.

So we're violating that rule if we let fc_rport_final_delete()
do the destroy of the device's workqueue.   I still don't
have a good solution.  Maybe fc_host_remove() could flush
the workqueue before destroying it?

Maybe destroy_workqueue() should flush before grabbing
the cpu_add_remove_lock, but maybe that doesn't completely fix it.

    Joe

Sorry for the noise.

The rule above is true, but it's
not really what's causing the problem here, which is:

sd_ref_mutex is held during a destroy_workqueue() for
the shost->work_q.  It means sd_release must have been the
the last one to release the scsi_host shost_gendev.

Since fc_remove_host() is running, but it must be
for the other fcoe instance, not the same host.

sd_ref_mutex is also grabbed from a work item handler on
the fc_host fc_wq.

Since both these workqueues can be destroyed, its wrong
to hold sd_ref mutex while destroying shost->work_q.
Since it may also be grabbed in a work item that needs
to be finished before destroying its work queue.

So my second rule is:

    If a workqueue may be destroyed, it's handlers
    may not take a lock if that lock is
    ever held while destroying any workqueue.

These two rules are caused by the use of cpu_add_remove_lock.
Maybe that's the issue to work on?

The first rule is violated in fc_rport_final_delete()
and the second rule is violated for sd_ref_mutex also
in fc_rport_final_delete().

Would it be OK to run fc_rport_final_delete() on the
general event workqueue (non-destroyable)?  I'm not sure
that fixes it.

Is there a way to fix the violations, or should we manage
to change the rules somehow?  Does anyone find a problem
with the rules I've stated?

	Thanks,
	Joe


Perhaps the way cpu_add_remove_lock is used to in
destroy_workqueue() should be changed, or maybe
fc_rport_final_delete and fc_remove_host need to
be better about using workqueues somehow.

I do see a lock dependency warning about this, so that
system works.  I saw this once on a system based
on 2.6.31-rc4 under the current fcoe-next.git tree.

The following are the stacks I collected via /proc/*/stacks
and the lockdep warnings.

    Thanks,
    Joe


Name:   hald-probe-volu
State:  D (disk sleep)
cmd:    /usr/libexec/hald-probe-volume
wchan:  cpu_maps_update_begin

--- waiting cpu_add_remove_lock
--- has sd_ref_mutex in scsi_disk_put

[<ffffffff81042bd2>] cpu_maps_update_begin+0x12/0x14
[<ffffffff81052ee0>] destroy_workqueue+0x2b/0x9e
[<ffffffff8129ef4f>] scsi_host_dev_release+0x5a/0xbd
[<ffffffff81291941>] device_release+0x49/0x75
[<ffffffff811d39d8>] kobject_release+0x51/0x67
[<ffffffff811d4789>] kref_put+0x43/0x4f
[<ffffffff811d38e1>] kobject_put+0x47/0x4b
[<ffffffff812912f2>] put_device+0x12/0x14
[<ffffffffa004826b>] fc_rport_dev_release+0x18/0x24 [scsi_transport_fc]
[<ffffffff81291941>] device_release+0x49/0x75
[<ffffffff811d39d8>] kobject_release+0x51/0x67
[<ffffffff811d4789>] kref_put+0x43/0x4f
[<ffffffff811d38e1>] kobject_put+0x47/0x4b
[<ffffffff812912f2>] put_device+0x12/0x14
[<ffffffff812a4d44>] scsi_target_dev_release+0x1d/0x21
[<ffffffff81291941>] device_release+0x49/0x75
[<ffffffff811d39d8>] kobject_release+0x51/0x67
[<ffffffff811d4789>] kref_put+0x43/0x4f
[<ffffffff811d38e1>] kobject_put+0x47/0x4b
[<ffffffff812912f2>] put_device+0x12/0x14
[<ffffffff812a7709>] scsi_device_dev_release_usercontext+0x118/0x124
[<ffffffff81053894>] execute_in_process_context+0x2a/0x70
[<ffffffff812a75ef>] scsi_device_dev_release+0x17/0x19
[<ffffffff81291941>] device_release+0x49/0x75
[<ffffffff811d39d8>] kobject_release+0x51/0x67
[<ffffffff811d4789>] kref_put+0x43/0x4f
[<ffffffff811d38e1>] kobject_put+0x47/0x4b
[<ffffffff812912f2>] put_device+0x12/0x14
[<ffffffff8129dfd5>] scsi_device_put+0x3d/0x42
[<ffffffff812acb57>] scsi_disk_put+0x30/0x41
[<ffffffff812ad9c3>] sd_release+0x4d/0x54
[<ffffffff810f9ca0>] __blkdev_put+0xa7/0x16e
[<ffffffff810f9d57>] __blkdev_put+0x15e/0x16e
[<ffffffff810f9d72>] blkdev_put+0xb/0xd
[<ffffffff810f9dab>] blkdev_close+0x37/0x3c
[<ffffffff810d6450>] __fput+0x10d/0x1bb
[<ffffffff810d6516>] fput+0x18/0x1a
[<ffffffff810d35b0>] filp_close+0x67/0x72
[<ffffffff810d3660>] sys_close+0xa5/0xe4
[<ffffffff8100baeb>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff


Name:   fcoeadm
State:  D (disk sleep)
cmd:    fcoeadm -d eth4
wchan:  flush_cpu_workqueue

--- waits on cpu work to finish. -- fc_workqueue 28596
--- holds cpu_add_remove_lock

[<ffffffff81052dbd>] flush_cpu_workqueue+0x7b/0x87
[<ffffffff81052e33>] cleanup_workqueue_thread+0x6a/0xb8
[<ffffffff81052f18>] destroy_workqueue+0x63/0x9e
[<ffffffffa004a5a7>] fc_remove_host+0x148/0x171 [scsi_transport_fc]
[<ffffffffa03e48b2>] fcoe_if_destroy+0x1a6/0x1db [fcoe]
[<ffffffffa03e6001>] fcoe_destroy+0x86/0xad [fcoe]
[<ffffffff81055034>] param_attr_store+0x25/0x35
[<ffffffff81055089>] module_attr_store+0x21/0x25
[<ffffffff811279fe>] sysfs_write_file+0xe4/0x119
[<ffffffff810d5a12>] vfs_write+0xab/0x105
[<ffffffff810d5b30>] sys_write+0x47/0x6f
[<ffffffff8100baeb>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff


Name:   fc_wq_137
State:  D (disk sleep)
cmd:
wchan:  scsi_disk_get_from_dev

-- waits on sd_ref_mutex

[<ffffffff812acaf8>] scsi_disk_get_from_dev+0x1a/0x49
[<ffffffff812acfdd>] sd_shutdown+0x12/0x117
[<ffffffff812ad20f>] sd_remove+0x51/0x8a
[<ffffffff8129443e>] __device_release_driver+0x80/0xc9
[<ffffffff81294552>] device_release_driver+0x1e/0x2b
[<ffffffff81293ae6>] bus_remove_device+0xa8/0xc9
[<ffffffff812921d6>] device_del+0x13f/0x1ac
[<ffffffff812a7a58>] __scsi_remove_device+0x44/0x81
[<ffffffff812a7abb>] scsi_remove_device+0x26/0x33
[<ffffffff812a7b6d>] __scsi_remove_target+0x93/0xd7
[<ffffffff812a7c17>] __remove_child+0x1e/0x25
[<ffffffff81291a22>] device_for_each_child+0x38/0x6f
[<ffffffff812a7bec>] scsi_remove_target+0x3b/0x48
[<ffffffffa0049db7>] fc_starget_delete+0x21/0x25 [scsi_transport_fc]
[<ffffffffa0049eb1>] fc_rport_final_delete+0xf6/0x188 [scsi_transport_fc]
[<ffffffff810527ec>] worker_thread+0x1fa/0x30a
[<ffffffff81056c2d>] kthread+0x88/0x90
[<ffffffff8100cbfa>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff

 [ INFO: possible circular locking dependency detected ]
 2.6.31-rc4-rp9 #5
 -------------------------------------------------------
 hald-probe-volu/23829 is trying to acquire lock:
(cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81042bd2>] cpu_maps_update_begin+0x12/0x14

 but task is already holding lock:
(sd_ref_mutex){+.+.+.}, at: [<ffffffff812acb47>] scsi_disk_put+0x20/0x41

 which lock already depends on the new lock.


 the existing dependency chain (in reverse order) is:

 -> #4 (sd_ref_mutex){+.+.+.}:
        [<ffffffff8106700c>] __lock_acquire+0xa48/0xbd0
        [<ffffffff8106724d>] lock_acquire+0xb9/0xdd
        [<ffffffff8150d536>] __mutex_lock_common+0x48/0x328
        [<ffffffff8150d88c>] mutex_lock_nested+0x24/0x29
        [<ffffffff812acaf8>] scsi_disk_get_from_dev+0x1a/0x49
        [<ffffffff812acfdd>] sd_shutdown+0x12/0x117
        [<ffffffff812ad20f>] sd_remove+0x51/0x8a
        [<ffffffff8129443e>] __device_release_driver+0x80/0xc9
        [<ffffffff81294552>] device_release_driver+0x1e/0x2b
        [<ffffffff81293ae6>] bus_remove_device+0xa8/0xc9
        [<ffffffff812921d6>] device_del+0x13f/0x1ac
        [<ffffffff812a7a58>] __scsi_remove_device+0x44/0x81
        [<ffffffff812a7abb>] scsi_remove_device+0x26/0x33
        [<ffffffff812a7b6d>] __scsi_remove_target+0x93/0xd7
        [<ffffffff812a7c17>] __remove_child+0x1e/0x25
        [<ffffffff81291a22>] device_for_each_child+0x38/0x6f
        [<ffffffff812a7bec>] scsi_remove_target+0x3b/0x48
[<ffffffffa0049db7>] fc_starget_delete+0x21/0x25 [scsi_transport_fc] [<ffffffffa0049eb1>] fc_rport_final_delete+0xf6/0x188 [scsi_transport_fc]
        [<ffffffff810527ec>] worker_thread+0x1fa/0x30a
        [<ffffffff81056c2d>] kthread+0x88/0x90
        [<ffffffff8100cbfa>] child_rip+0xa/0x20
        [<ffffffffffffffff>] 0xffffffffffffffff

 -> #3 (&shost->scan_mutex){+.+.+.}:
        [<ffffffff8106700c>] __lock_acquire+0xa48/0xbd0
        [<ffffffff8106724d>] lock_acquire+0xb9/0xdd
        [<ffffffff8150d536>] __mutex_lock_common+0x48/0x328
        [<ffffffff8150d88c>] mutex_lock_nested+0x24/0x29
        [<ffffffff812a7ab3>] scsi_remove_device+0x1e/0x33
        [<ffffffff812a7b6d>] __scsi_remove_target+0x93/0xd7
        [<ffffffff812a7c17>] __remove_child+0x1e/0x25
        [<ffffffff81291a22>] device_for_each_child+0x38/0x6f
        [<ffffffff812a7bec>] scsi_remove_target+0x3b/0x48
[<ffffffffa0049db7>] fc_starget_delete+0x21/0x25 [scsi_transport_fc] [<ffffffffa0049eb1>] fc_rport_final_delete+0xf6/0x188 [scsi_transport_fc]
        [<ffffffff810527ec>] worker_thread+0x1fa/0x30a
        [<ffffffff81056c2d>] kthread+0x88/0x90
        [<ffffffff8100cbfa>] child_rip+0xa/0x20
        [<ffffffffffffffff>] 0xffffffffffffffff

 -> #2 (&rport->rport_delete_work){+.+.+.}:
        [<ffffffff8106700c>] __lock_acquire+0xa48/0xbd0
        [<ffffffff8106724d>] lock_acquire+0xb9/0xdd
        [<ffffffff810527e3>] worker_thread+0x1f1/0x30a
        [<ffffffff81056c2d>] kthread+0x88/0x90
        [<ffffffff8100cbfa>] child_rip+0xa/0x20
        [<ffffffffffffffff>] 0xffffffffffffffff

 -> #1 ((fc_host->work_q_name)){+.+.+.}:
        [<ffffffff8106700c>] __lock_acquire+0xa48/0xbd0
        [<ffffffff8106724d>] lock_acquire+0xb9/0xdd
        [<ffffffff81052e0e>] cleanup_workqueue_thread+0x45/0xb8
        [<ffffffff81052f18>] destroy_workqueue+0x63/0x9e
[<ffffffffa004a5a7>] fc_remove_host+0x148/0x171 [scsi_transport_fc]
        [<ffffffffa00738b2>] 0xffffffffa00738b2
        [<ffffffffa0075001>] 0xffffffffa0075001
        [<ffffffff81055034>] param_attr_store+0x25/0x35
        [<ffffffff81055089>] module_attr_store+0x21/0x25
        [<ffffffff811279fe>] sysfs_write_file+0xe4/0x119
        [<ffffffff810d5a12>] vfs_write+0xab/0x105
        [<ffffffff810d5b30>] sys_write+0x47/0x6f
        [<ffffffff8100baeb>] system_call_fastpath+0x16/0x1b
        [<ffffffffffffffff>] 0xffffffffffffffff

 -> #0 (cpu_add_remove_lock){+.+.+.}:
        [<ffffffff81066f00>] __lock_acquire+0x93c/0xbd0
        [<ffffffff8106724d>] lock_acquire+0xb9/0xdd
        [<ffffffff8150d536>] __mutex_lock_common+0x48/0x328
        [<ffffffff8150d88c>] mutex_lock_nested+0x24/0x29
        [<ffffffff81042bd2>] cpu_maps_update_begin+0x12/0x14
        [<ffffffff81052ee0>] destroy_workqueue+0x2b/0x9e
        [<ffffffff8129ef4f>] scsi_host_dev_release+0x5a/0xbd
        [<ffffffff81291941>] device_release+0x49/0x75
        [<ffffffff811d39d8>] kobject_release+0x51/0x67
        [<ffffffff811d4789>] kref_put+0x43/0x4f
        [<ffffffff811d38e1>] kobject_put+0x47/0x4b
        [<ffffffff812912f2>] put_device+0x12/0x14
[<ffffffffa004826b>] fc_rport_dev_release+0x18/0x24 [scsi_transport_fc]
        [<ffffffff81291941>] device_release+0x49/0x75
        [<ffffffff811d39d8>] kobject_release+0x51/0x67
        [<ffffffff811d4789>] kref_put+0x43/0x4f
        [<ffffffff811d38e1>] kobject_put+0x47/0x4b
        [<ffffffff812912f2>] put_device+0x12/0x14
        [<ffffffff812a4d44>] scsi_target_dev_release+0x1d/0x21
        [<ffffffff81291941>] device_release+0x49/0x75
        [<ffffffff811d39d8>] kobject_release+0x51/0x67
        [<ffffffff811d4789>] kref_put+0x43/0x4f
        [<ffffffff811d38e1>] kobject_put+0x47/0x4b
        [<ffffffff812912f2>] put_device+0x12/0x14
[<ffffffff812a7709>] scsi_device_dev_release_usercontext+0x118/0x124
        [<ffffffff81053894>] execute_in_process_context+0x2a/0x70
        [<ffffffff812a75ef>] scsi_device_dev_release+0x17/0x19
        [<ffffffff81291941>] device_release+0x49/0x75
        [<ffffffff811d39d8>] kobject_release+0x51/0x67
        [<ffffffff811d4789>] kref_put+0x43/0x4f
        [<ffffffff811d38e1>] kobject_put+0x47/0x4b
        [<ffffffff812912f2>] put_device+0x12/0x14
        [<ffffffff8129dfd5>] scsi_device_put+0x3d/0x42
        [<ffffffff812acb57>] scsi_disk_put+0x30/0x41
        [<ffffffff812ad9c3>] sd_release+0x4d/0x54
        [<ffffffff810f9ca0>] __blkdev_put+0xa7/0x16e
        [<ffffffff810f9d57>] __blkdev_put+0x15e/0x16e
        [<ffffffff810f9d72>] blkdev_put+0xb/0xd
        [<ffffffff810f9dab>] blkdev_close+0x37/0x3c
        [<ffffffff810d6450>] __fput+0x10d/0x1bb
        [<ffffffff810d6516>] fput+0x18/0x1a
        [<ffffffff810d35b0>] filp_close+0x67/0x72
        [<ffffffff810d3660>] sys_close+0xa5/0xe4
        [<ffffffff8100baeb>] system_call_fastpath+0x16/0x1b
        [<ffffffffffffffff>] 0xffffffffffffffff

 other info that might help us debug this:

 2 locks held by hald-probe-volu/23829:
#0: (&bdev->bd_mutex/1){+.+.+.}, at: [<ffffffff810f9c27>] __blkdev_put+0x2e/0x16e #1: (sd_ref_mutex){+.+.+.}, at: [<ffffffff812acb47>] scsi_disk_put+0x20/0x41

 stack backtrace:
 Pid: 23829, comm: hald-probe-volu Not tainted 2.6.31-rc4-rp9 #5
 Call Trace:
  [<ffffffff8106624e>] print_circular_bug_tail+0x71/0x7c
  [<ffffffff81066f00>] __lock_acquire+0x93c/0xbd0
  [<ffffffff8106724d>] lock_acquire+0xb9/0xdd
  [<ffffffff81042bd2>] ? cpu_maps_update_begin+0x12/0x14
  [<ffffffff8150d536>] __mutex_lock_common+0x48/0x328
  [<ffffffff81042bd2>] ? cpu_maps_update_begin+0x12/0x14
  [<ffffffff81065c1d>] ? trace_hardirqs_on+0xd/0xf
  [<ffffffff8150e64c>] ? _spin_unlock_irq+0x2b/0x30
  [<ffffffff81042bd2>] ? cpu_maps_update_begin+0x12/0x14
  [<ffffffff8150ce9e>] ? wait_for_common+0xf7/0x112
  [<ffffffff8103c93f>] ? default_wake_function+0x0/0xf
  [<ffffffff8150d88c>] mutex_lock_nested+0x24/0x29
  [<ffffffff81042bd2>] cpu_maps_update_begin+0x12/0x14
  [<ffffffff81052ee0>] destroy_workqueue+0x2b/0x9e
  [<ffffffff8129ef4f>] scsi_host_dev_release+0x5a/0xbd
  [<ffffffff81291941>] device_release+0x49/0x75
  [<ffffffff811d39d8>] kobject_release+0x51/0x67
  [<ffffffff811d3987>] ? kobject_release+0x0/0x67
  [<ffffffff811d4789>] kref_put+0x43/0x4f
  [<ffffffff811d38e1>] kobject_put+0x47/0x4b
  [<ffffffff812912f2>] put_device+0x12/0x14
  [<ffffffffa004826b>] fc_rport_dev_release+0x18/0x24 [scsi_transport_fc]
  [<ffffffff81291941>] device_release+0x49/0x75
  [<ffffffff811d39d8>] kobject_release+0x51/0x67
  [<ffffffff811d3987>] ? kobject_release+0x0/0x67
  [<ffffffff811d4789>] kref_put+0x43/0x4f
  [<ffffffff811d38e1>] kobject_put+0x47/0x4b
  [<ffffffff812912f2>] put_device+0x12/0x14
  [<ffffffff812a4d44>] scsi_target_dev_release+0x1d/0x21
  [<ffffffff81291941>] device_release+0x49/0x75
  [<ffffffff811d39d8>] kobject_release+0x51/0x67
  [<ffffffff811d3987>] ? kobject_release+0x0/0x67
  [<ffffffff811d4789>] kref_put+0x43/0x4f
  [<ffffffff811d38e1>] kobject_put+0x47/0x4b
  [<ffffffff812912f2>] put_device+0x12/0x14
  [<ffffffff812a7709>] scsi_device_dev_release_usercontext+0x118/0x124
  [<ffffffff812a75f1>] ? scsi_device_dev_release_usercontext+0x0/0x124
  [<ffffffff81053894>] execute_in_process_context+0x2a/0x70
  [<ffffffff812a75ef>] scsi_device_dev_release+0x17/0x19
  [<ffffffff81291941>] device_release+0x49/0x75
  [<ffffffff811d39d8>] kobject_release+0x51/0x67
  [<ffffffff811d3987>] ? kobject_release+0x0/0x67
  [<ffffffff811d4789>] kref_put+0x43/0x4f
  [<ffffffff811d38e1>] kobject_put+0x47/0x4b
  [<ffffffff812912f2>] put_device+0x12/0x14
  [<ffffffff8129dfd5>] scsi_device_put+0x3d/0x42
  [<ffffffff812acb57>] scsi_disk_put+0x30/0x41
  [<ffffffff812ad9c3>] sd_release+0x4d/0x54
  [<ffffffff810f9ca0>] __blkdev_put+0xa7/0x16e
  [<ffffffff810f9d57>] __blkdev_put+0x15e/0x16e
  [<ffffffff810f9d72>] blkdev_put+0xb/0xd
  [<ffffffff810f9dab>] blkdev_close+0x37/0x3c
  [<ffffffff810d6450>] __fput+0x10d/0x1bb
  [<ffffffff810d6516>] fput+0x18/0x1a
  [<ffffffff810d35b0>] filp_close+0x67/0x72
  [<ffffffff810d3660>] sys_close+0xa5/0xe4
  [<ffffffff8100baeb>] system_call_fastpath+0x16/0x1b

-- end
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux