Patrick Mansfield wrote:
Do you have slab poisoning on (CONFIG_DEBUG_SLAB)?
No, not yet...
I reported the following problem: it looks like nodemgr had a similar patch to change list_for_each_safe to device_for_each_child, but device_for_each_child is not "safe". See this thread: http://marc.theaimsgroup.com/?t=111931541100002&r=1&w=2

With nothing more from Greg ... I think DEBUG_SLAB will catch any use-after-free there. I haven't tried to run *without* DEBUG_SLAB or to analyze what might happen, so I don't know the symptoms for fibre channel removal (the call in scsi_sysfs.c:scsi_remove_target()).
The patch you mention changed nodemgr_remove_host_dev, which is called when a FireWire controller is removed, AFAIU. But when a FireWire device is unplugged or switched off, a different code path is followed in nodemgr:

static void nodemgr_suspend_ne(struct node_entry *ne)
{
	struct class_device *cdev;
	struct unit_directory *ud;

	HPSB_DEBUG("Node suspended: ID:BUS[" NODE_BUS_FMT "] GUID[%016Lx]",
		   NODE_BUS_ARGS(ne->host, ne->nodeid),
		   (unsigned long long)ne->guid);

	ne->in_limbo = 1;
	device_create_file(&ne->device, &dev_attr_ne_in_limbo);

	down_write(&ne->device.bus->subsys.rwsem);
	list_for_each_entry(cdev, &nodemgr_ud_class.children, node) {
		ud = container_of(cdev, struct unit_directory, class_dev);

		if (ud->ne != ne)
			continue;

		if (ud->device.driver &&
		    (!ud->device.driver->suspend ||
		     ud->device.driver->suspend(&ud->device, PMSG_SUSPEND, 0)))
			device_release_driver(&ud->device);
	}
	up_write(&ne->device.bus->subsys.rwsem);
}

If I understand it correctly, the call of device_release_driver() leads to sbp2_remove(), which calls scsi_remove_device(), which, in the case of RBC disks, seems to hang in sd_shutdown()/ sd_sync_cache()/ scsi_wait_req().

Since ne->device.bus->subsys.rwsem is held for writing, no other FireWire device additions or removals can be served until device_release_driver() returns, not even anything that happens on a second FireWire adapter. (I have two FireWire adapters, and the other knodemgrd_# never wakes up while the first knodemgrd_# is locked up.)

Could ieee1394's rwsem cause a deadlock in scsi's device removals? It would surprise me.
-- 
Stefan Richter
-=====-=-=-= -=== =====
http://arcgraph.de/sr/