On Thu, 2006-08-17 at 10:02 -0400, Salyzyn, Mark wrote:
> None of this leads me to believe there is any kref node corruption, but
> code could expect that if a device existed at the nexus and the
> subsystem acquired another reference to the node based on the nexus
> rather than the scsi_device, thus using scsi_device_lookup, that they
> would get an unexpected NULL pointer and choke. I have not inspected the
> code for such a path (yet), but feel we have risks in any case that need
> to be addressed.
>
> The aacraid driver should stop calling scsi_remove_device when an array
> is deleted ... or ...
>
> I believe what needs to be added is a check for sdev->sdev_state ==
> SDEV_DEL in __scsi_device_lookup_by_target and __scsi_device_lookup in
> scsi.c:

That would solve some of this... but that's not quite the whole of it,
unfortunately.

The error 1 problem is caused by the external namespace visibility,
which the SDEV_DEL check will prevent (the device only goes into DEL
after the namespace entry has been removed, so it's now safe to
repopulate it with a new device).

However, there's a race you probably haven't come across yet where you
do the same thing and get a device in SDEV_CANCEL. This means currently
visible but dying (i.e. the only state it can move to from CANCEL is
DEL).

To fix all of this, we probably need better state model checking in
probe_and_add_lun() plus a parameter that says "wait and create a new
device if this one is being removed" ... I suspect we can cannibalise
the rescan parameter for that.

James

The three relevant states in the model are SDEV_CREATED (this means the device exists b
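
To make the two cases above concrete, here is a rough userspace sketch of
the idea, not the actual scsi.c or scsi_scan.c code. The state names are
the ones mentioned in this thread (only a subset is modelled), while
struct mock_sdev, mock_lookup(), mock_probe_decision(), the probe_action
codes and the wait_for_removal flag are all made up for illustration. In
particular, wait_for_removal only stands in for the "wait and create a
new device if this one is being removed" parameter suggested above; it is
an assumption, not an existing kernel argument.

```c
#include <assert.h>
#include <stddef.h>

/* Subset of the device states named in this thread (userspace model only). */
enum sdev_state { SDEV_CREATED, SDEV_RUNNING, SDEV_CANCEL, SDEV_DEL };

/* Hypothetical stand-in for struct scsi_device: just the nexus and a state. */
struct mock_sdev {
	unsigned int channel, id, lun;
	enum sdev_state state;
};

/* Hypothetical outcomes of the state check a probe path might perform. */
enum probe_action {
	PROBE_CREATE_NEW,        /* nothing visible at this nexus */
	PROBE_REUSE_EXISTING,    /* a live device is already there */
	PROBE_WAIT_THEN_CREATE,  /* device is dying; wait, then create */
	PROBE_BUSY               /* device is dying and caller won't wait */
};

/*
 * Look a device up by nexus, skipping anything already in SDEV_DEL.
 * This mirrors the proposed check: a DEL device has left the external
 * namespace, so the lookup treats its slot as free.
 */
static struct mock_sdev *mock_lookup(struct mock_sdev *devs, size_t n,
				     unsigned int channel, unsigned int id,
				     unsigned int lun)
{
	size_t i;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct mock_sdev *s = &devs[i];

		if (s->state == SDEV_DEL)
			continue;
		if (s->channel == channel && s->id == id && s->lun == lun)
			return s;
	}
	return NULL;
}

/*
 * Decide what a probe should do with the lookup result. SDEV_CANCEL is
 * the racy case: the device is still visible but can only go on to DEL.
 */
static enum probe_action mock_probe_decision(const struct mock_sdev *found,
					     int wait_for_removal)
{
	if (found == NULL)
		return PROBE_CREATE_NEW;
	if (found->state == SDEV_CANCEL)
		return wait_for_removal ? PROBE_WAIT_THEN_CREATE : PROBE_BUSY;
	return PROBE_REUSE_EXISTING;
}
```

With this model, a nexus whose only device is in SDEV_DEL looks empty to
the lookup, so the probe creates a fresh device. A nexus whose device is
in SDEV_CANCEL is still found, and the caller has to choose between
waiting for the removal to finish and backing off.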