Alan Stern [stern@xxxxxxxxxxxxxxxxxxx] wrote: > On Wed, 15 Jun 2005, James Bottomley wrote: > > > On Wed, 2005-06-15 at 17:11 -0400, Alan Stern wrote: > > > This means that scsi_target_reap can be called and the __targets list > > > changed essentially at any time (subject only to the host_lock). Hence it > > > is impossible for scsi_forget_host to iterate through the list of targets > > > belonging to the host: While it is working to remove one target, the next > > > target on the list (stored in the tmp variable) might be removed by > > > another thread. > > > > It's no better nor worse than we already have. As has been said many > > times before, we need a proper host state model. > > > > > In fact there doesn't seem to be any safe way to remove all the targets > > > from a host. And what's to prevent scsi_target_reap being called twice > > > for the same target? > > > > The usage, if you look at the code ... it's alloc/reap or inc > > reap_ref/reap > > Okay, I understand a little better now. > > Would it hurt anything to change scsi_remove_host so that SHOST_DEL gets > set at the start, before scsi_forget_host is called? If that were done, > then we could make every pathway that adds a device acquire the scan_mutex > and fail if SHOST_DEL is already set. Thus no devices would ever be added > after forget_host had removed the existing ones, which can happen as > things stand. > > Furthermore, forget_host could be changed to loop over the __devices list > instead of the __targets list. The net effect would be the same: All the > devices on the host would be removed and the targets would automatically > get reaped by scsi_device_dev_release. There wouldn't be any need to > iterate over the targets at all. > > As a final change, the new loop-over-devices in forget_host and the one in > __scsi_remove_target should be made to skip over devices with SDEV_DEL > already set, and each time they call scsi_remove_device the loops should > restart from the beginning. This will eliminate the problem of the tmp > pointer being pulled out from under the code (even though it has > quadratic behavior). > > Do these changes sound okay? > I have something similar that I was testing since you mentioned the problem the other day. Our build machine went down so I will need to wait until tomorrow to get access to my patches for a post, if you have already rolled a patch we can compare notes when I post. What I did in the patch sequence was: 1.) Recode the scsi_host state model to align with the device state model (i.e. added scsi_host_set_state function and associated changes). 2.) Made shost cancel matched the scsi device state model and set this at the top of scsi_remove_host. Previous cancel code was not doing anything as the list was normally empty do to scsi_forget_host being called first. Also there where possible race conditions that you previously mentioned about canceling commands in this method. 3.) Added choke point checks under the scan_mutex to determine if scanning is allowed (scsi_host_scan_allowed). 4.) Added a target state model to match the scsi device state model. 5.) I use the STGT_CANCEL and STGT_DEL states in scsi_forget_host to skip over entries in the list as I am no longer using the _safe version. This looks like a good starting point with limited testing. I also need to more states to the target state model as I only added a few that I needed for the delete code. Anyway sorry about the delay. -andmike -- Michael Anderson andmike@xxxxxxxxxx - : send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html