On Tue, 2013-06-25 at 11:01 +0200, Bart Van Assche wrote:
> On 06/25/13 00:27, James Bottomley wrote:
> > On Mon, 2013-06-24 at 15:04 -0500, Mike Christie wrote:
> >> On 06/24/2013 02:19 PM, James Bottomley wrote:
> >>> On Wed, 2013-06-12 at 14:55 +0200, Bart Van Assche wrote:
> >>>> A SCSI LLD may start cleaning up host resources as soon as
> >>>> scsi_remove_host() returns. These host resources may be needed by
> >>>> the LLD in an implementation of one of the eh_* functions. So if
> >>>> one of the eh_* functions is in progress when scsi_remove_host()
> >>>> is invoked, wait until the eh_* function has finished. Also, do
> >>>> not invoke any of the eh_* functions after scsi_remove_host() has
> >>>> started.
> >>>
> >>> We already have state guards for this, don't we? That's the
> >>> SHOST_*_RECOVERY ones. When eh functions are active, the host
> >>> transitions to a recovery state, so the wait could just wait on that
> >>> state rather than implement an open coded counting semaphore.
> >>
> >> That seems better. For the sg_reset_provider case we just would have to
> >> also wait on the tmf_in_progress bit.
> >
> > The simplest way may just be to move the kthread_stop() from release
> > to remove. That synchronously waits for the outstanding error handling
> > to complete and the eh thread to stop. Perhaps the eh thread should
> > also wait for tmf in progress before it dies?
>
> Regarding TMF that are in progress: my preference is to leave it to the
> LLD to wait for any TMF in progress if necessary. At least with SRP over
> RDMA it is possible to prevent receiving further TMF completion
> notifications by closing the connection over which these TMF were sent.
>
> There is a difference though between moving the EH kthread_stop() call
> and the patch at the start of this thread: moving the EH kthread_stop()
> call does not prevent that an ioctl like SG_SCSI_RESET triggers an eh_*
> callback after scsi_remove_host() has finished.
> However, the scsi_begin_eh() / scsi_end_eh() functions do prevent that
> an ioctl can cause an eh_* callback to be invoked after
> scsi_remove_device() finished.

OK, but this doesn't tell me what you're trying to achieve. An eh
function is allowable as long as the host hasn't had the release
callback executed. That means you must hold a reference to the
device/host to execute the eh function, which is currently guaranteed
for all invocations.

James
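
For readers following the thread, the begin/end guard being debated (the
"open coded counting semaphore") can be sketched roughly as below. This
is a userspace illustration using pthreads, not the code from the patch:
struct host_guard, guard_init(), begin_eh(), end_eh() and remove_host()
are hypothetical names, and the real scsi_begin_eh() / scsi_end_eh()
may well differ.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for per-host state; this is NOT the
 * kernel's struct Scsi_Host. */
struct host_guard {
    pthread_mutex_t lock;
    pthread_cond_t  idle;      /* signalled when eh_active drops to zero */
    int             eh_active; /* number of eh_* callbacks in progress */
    bool            removing;  /* set once removal has started */
};

static void guard_init(struct host_guard *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->idle, NULL);
    g->eh_active = 0;
    g->removing = false;
}

/* Returns true if the caller may run an eh_* callback, false if removal
 * has already started and the callback must be skipped. */
static bool begin_eh(struct host_guard *g)
{
    bool ok;

    pthread_mutex_lock(&g->lock);
    ok = !g->removing;
    if (ok)
        g->eh_active++;
    pthread_mutex_unlock(&g->lock);
    return ok;
}

/* Must be paired with a begin_eh() call that returned true. */
static void end_eh(struct host_guard *g)
{
    pthread_mutex_lock(&g->lock);
    assert(g->eh_active > 0);
    if (--g->eh_active == 0)
        pthread_cond_broadcast(&g->idle);
    pthread_mutex_unlock(&g->lock);
}

/* Blocks new eh_* callbacks, then waits for the running ones to finish. */
static void remove_host(struct host_guard *g)
{
    pthread_mutex_lock(&g->lock);
    g->removing = true;
    while (g->eh_active > 0)
        pthread_cond_wait(&g->idle, &g->lock);
    pthread_mutex_unlock(&g->lock);
}
```

In this sketch the guard refuses new callbacks as soon as removal
starts, whereas the reference-based rule allows them until the release
callback has run. That difference is the point in dispute above.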