Re: [PATCH v11 6/9] Make scsi_remove_host() wait until error handling finished

Michael Christie <michaelc@xxxxxxxxxxx> · Mon, 24 Jun 2013 21:56:41 -0500

On Jun 24, 2013, at 9:26 PM, Mike Christie <michaelc@xxxxxxxxxxx> wrote:

> On 06/24/2013 05:27 PM, James Bottomley wrote:
>>>> However, what's the reasoning behind wanting to do this?  In theory all
>>>> necessary resources for the eh thread should only be freed in the
>>>> release callback.  That means they aren't freed until all error recovery
>>>> completes.
>>> 
>>> I think it makes it easier to handle cleanup of driver resources needed
>>> for aborts/resets for some drivers. If after host removal, the scsi eh
>>> can call into the driver after scsi_remove_host is called then we have
>>> to set some internal bits to fail future eh callback calls and handle
>>> waiting on or flushing running eh operations. If we know that after
>>> scsi_remove_host is called the eh callbacks will not be running and will
>>> not be called we can just free the driver resources.
>>> 
>>> For iscsi and I think drivers that do scsi_remove_target it would be
>>> helpful to have something similar at the target level.
>> 
>> I'm wary of this because it combines two models: a definite state model
>> (where we move from state to state waiting for the completions) with an
>> event driven one (in theory the current one); such combinations rarely
>> lead to good things because you get a mixture of actions causing state
>> transitions some of which are waited for and some of which have an async
>> transition and that ends up confusing the heck out of everybody no
>> matter how carefully documented.  Can you give me some use cases of what
>> you're trying to achieve?  Could it be as simple as an event that fires
>> on release?
> 
> 
> The problem that we hit in iscsi is that we will call scsi_remove_target
> (we used to call scsi_remove_host when we always did a host per target
> so we hit the problem at that level before). That will complete, but the
> scsi eh might still be trying to abort/reset devices accessed through
> that target. To avoid freeing resources that the iscsi scsi eh might be
> using, we set internal state bits and wait on host_busy to go to zero
> before we tear down the iscsi session, conn and task structs.
> 
> I think Bart was hitting a similar issue but a level up in the host
> removal case, because srp always does a host per target and so it just
> does a scsi_remove_host.

I take this back. I don't think it is an issue anymore, and I think I can remove the iscsi hack. With the blk_cleanup_queue/blk_drain_queue code, I think the removal of the target and its devices will not complete until the scsi eh has completed. The blk_drain_queue code will now wait for the eh to complete because the rq counters will be incremented.
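
For reference, the iscsi hack described in the quoted text boils down to roughly the pattern below: set a state bit so new eh callbacks fail, then hold off freeing until nothing is still running inside the driver. This is only a user-space sketch with C11 atomics, not the libiscsi code. The demo_* names and the eh_busy counter are made up for illustration (the real code waits on host_busy), and the caller is assumed to poll or sleep until demo_teardown_may_free() returns true.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-session driver state; not the iscsi structs. */
struct demo_session {
	atomic_bool removing;	/* set once teardown has started */
	atomic_int eh_busy;	/* eh callbacks currently inside the driver */
};

static void demo_session_init(struct demo_session *s)
{
	atomic_init(&s->removing, false);
	atomic_init(&s->eh_busy, 0);
}

/*
 * Call at the top of an abort/reset callback. Returns false if the
 * callback must fail because teardown has started.
 *
 * The counter is bumped before the flag is checked, so teardown either
 * sees a non-zero count or this caller sees the flag.
 */
static bool demo_eh_enter(struct demo_session *s)
{
	atomic_fetch_add(&s->eh_busy, 1);
	if (atomic_load(&s->removing)) {
		atomic_fetch_sub(&s->eh_busy, 1);
		return false;
	}
	return true;
}

/* Call when a callback that passed demo_eh_enter() is done. */
static void demo_eh_exit(struct demo_session *s)
{
	int prev = atomic_fetch_sub(&s->eh_busy, 1);

	assert(prev > 0);	/* unbalanced exit would be a driver bug */
}

/* Start teardown: from here on, new eh callbacks are refused. */
static void demo_teardown_begin(struct demo_session *s)
{
	atomic_store(&s->removing, true);
}

/* True once teardown has started and no eh callback is still running. */
static bool demo_teardown_may_free(struct demo_session *s)
{
	return atomic_load(&s->removing) && atomic_load(&s->eh_busy) == 0;
}
```

If the blk_drain_queue wait covers the eh as described above, then the removal path already provides this guarantee and none of the driver-side bookkeeping should be needed.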