Re: Unplugging of SBP-2 devices still does not work

Stefan Richter <stefanr@xxxxxxxxxxxxxxxxx> · Sun, 31 Jul 2005 20:48:05 +0200

Patrick Mansfield wrote:
Do you have slab poisoning on (CONFIG_DEBUG_SLAB)?

No, not yet...

I reported the following problem. It looks like nodemgr had a similar
patch to change list_for_each_safe to device_for_each_child, but
device_for_each_child is not "safe"; see this thread:

http://marc.theaimsgroup.com/?t=111931541100002&r=1&w=2

With nothing more from Greg ...

I think DEBUG_SLAB will catch any use-after-free there. I haven't tried
to run with *out* DEBUG_SLAB or analyze what might happen, so I don't
know the symptoms for fibre channel removal (the call in
scsi_sysfs.c:scsi_remove_target()).
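
To illustrate what "safe" means in this context, here is a minimal
userspace sketch, not kernel code. It assumes a plain singly linked
list; struct node, push() and remove_all_safe() are hypothetical names
made up for the example. The point is only that the next pointer has
to be saved before the current entry is freed, which is what the
extra cursor in list_for_each_safe is for.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node, standing in for a child device entry. */
struct node {
	int id;
	struct node *next;
};

/* Prepend a new node and return the new head. */
static struct node *push(struct node *head, int id)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->id = id;
	n->next = head;
	return n;
}

/*
 * Free every node and return how many were freed.
 * The successor is cached in tmp *before* pos is freed, so the loop
 * never reads pos->next from memory that has already been released.
 */
static int remove_all_safe(struct node **head)
{
	struct node *pos, *tmp;
	int freed = 0;

	for (pos = *head; pos != NULL; pos = tmp) {
		tmp = pos->next;
		free(pos);
		freed++;
	}
	*head = NULL;
	return freed;
}
```

A loop that advanced with pos = pos->next after free(pos) would be
the use after free that slab poisoning is expected to expose.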

The patch you mention changed nodemgr_remove_host_dev(), which is
called when a FireWire controller is removed, AFAIU. But when a
FireWire device is unplugged or switched off, a different code
path is followed in nodemgr:

static void nodemgr_suspend_ne(struct node_entry *ne)
{
	struct class_device *cdev;
	struct unit_directory *ud;

	HPSB_DEBUG("Node suspended: ID:BUS[" NODE_BUS_FMT "]  GUID[%016Lx]",
		   NODE_BUS_ARGS(ne->host, ne->nodeid), (unsigned long long)ne->guid);

	ne->in_limbo = 1;
	device_create_file(&ne->device, &dev_attr_ne_in_limbo);

	down_write(&ne->device.bus->subsys.rwsem);
	list_for_each_entry(cdev, &nodemgr_ud_class.children, node) {
		ud = container_of(cdev, struct unit_directory, class_dev);

		if (ud->ne != ne)
			continue;

		if (ud->device.driver &&
		    (!ud->device.driver->suspend ||
		      ud->device.driver->suspend(&ud->device, PMSG_SUSPEND, 0)))
			device_release_driver(&ud->device);
	}
	up_write(&ne->device.bus->subsys.rwsem);
}

If I understand it correctly, the call to device_release_driver()
leads to sbp2_remove(), which calls scsi_remove_device(), which, in
the case of RBC disks, seems to hang in sd_shutdown() ->
sd_sync_cache() -> scsi_wait_req().

Since ne->device.bus->subsys.rwsem is held for writing, no other
FireWire device addition or removal can be served until
device_release_driver() has returned, including anything that
happens on a second FireWire adapter. (I have two FireWire adapters,
and the other knodemgrd_# never wakes up while the first knodemgrd_#
is locked up.)

Could ieee1394's rwsem cause a deadlock in SCSI's device removal?
That would surprise me.
--
Stefan Richter
-=====-=-=-= -=== =====
http://arcgraph.de/sr/